The data we will use for this assignment consist of two CSV files: Friends and Ratings. The former
contains friendship relationships as tuples of the form UserID1 UserID2, denoting two users who are
also friends. A user can have one or more friends and the friendship relationship is symmetric: if A is a
friend of B, then B is also a friend of A and both tuples (A B and B A) are present in the file. Ratings
contains user ratings as tuples of the form UserID MovieID Rating. For example, tuple 12 3 4
means that “the user with ID 12 gave 4 stars to the movie with ID 3”.
Hint #1: You can use Python’s CSV reader to parse input files.
Hint #2: Consider encapsulating your Python tuples in ATuple objects (see code skeleton).

  1. TASK I: Implement backward tracing (credits: 40/100)
    The first task is to extend the operators you built in Assignment 1 with support for backward tracing. For
    each operator, you will have to implement a new method (in Python 3 syntax):
    lineage(tuples: List[ATuple]) -> List[List[ATuple]]
    that returns the lineage of the given list of tuples.
    As discussed in Lecture 2, the lineage of an output tuple, let t, with respect to a query q(D) is the
    collection of input tuples that contributed to having the tuple t in the output of the query. Let
    recommendation be the output (movie id) of the second query from Assignment 1:
    SELECT R.MID
    FROM ( SELECT R.MID, AVG(R.Rating) as score
    FROM Friends as F, Ratings as R
    WHERE F.UID2 = R.UID
    AND F.UID1 = ‘A’
    GROUPBY R.MID
    ORDERBY score DESC
    LIMIT 1 )
    To successfully complete this task, you must implement a new method for ATuple:
    lineage() -> List[ATuple]
    so that you can retrieve the lineage of any recommendation as follows:
    lineage = recommendation.lineage()
    3
    Calling recommendation.lineage() should internally call:
    operator.lineage(tuples=[recommendation])
    where operator is a handle to the operator that produced the tuple recommendation (i.e. the root
    operator of the query tree).
    Example: Let’s assume that the recommendation query returns a recommendation r=ATuple(10)
    (where 10 is the movie ID) to user 0 and this recommendation was based on the following input tuples:
    (in Friends.csv) (in Ratings.csv)
    0 1 1 10 5
    0 4 4 10 8
    0 18 18 10 2
    In this case, calling r.lineage()should return the following list of tuples as the lineage of r:
    [(0, 1), (0, 4), (0, 18), (1, 10, 5), (4, 10, 8), (18, 10, 2))]
    Hint #1: This task requires modifying each operator so that it maintains some intermediate data in
    memory (input-to-output mappings). Try minimizing the amount of intermediate data by leveraging
    knowledge about the operator logic.
    Hint #2: Use the track_prov flag to enable and disable materialization of intermediate data.
    Hint #3: Make sure the input to the operator.lineage method is a list of tuples all of which
    exist in the output of the respective operator.
    Hint #4: Calling lineage on an operator with a list of tuples (recommendations) as input must
    return the lineage of these tuples as a list of lists:
    [ [lineage_of_tuple_1], [lineage_of_tuple_2], … ]

Sample Solution

This question has been answered.

Get Answer