Task II: Implement Where-provenance query (credits: 10/100)
The second task is to extend your operators with support for Where-provenance (cf. Lecture 2). For each
operator, you will have to implement a new method (in Python 3 syntax):
where(att_index: int, tuples: List[ATuple]) -> List[List[Tuple]]
that returns the Where-provenance of the attribute at index att_index for each tuple in tuples.
Let t[a] be an attribute of a tuple t in the output of a query q(D). As discussed in Lecture 2, the
Where-provenance of t[a] is the list of attributes of the input tuples whose values contributed to t[a]’s
value. Let prediction be the output (average rating) of the first query from Assignment 1:
SELECT AVG(R.Rating)
FROM Friends as F, Ratings as R
WHERE F.UID2 = R.UID
AND F.UID1 = 'A' AND R.MID = 'M'
To successfully complete this task, you must implement a new method for ATuple:
where(att_index: int) -> List[Tuple]
so that you can retrieve the Where-provenance of any ‘likeness’ prediction as follows:
where_from = prediction.where(att_index=0)
Calling prediction.where(att_index=0) should internally call:
operator.where(att_index=0, tuples=[prediction])
where operator is a handle to the operator that produced the tuple prediction (i.e. the root
operator of the query tree). The result where_from should be a list of tuples of the form:
[ (input_filename, line_number, tuple, attribute_value) ]
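The delegation described above can be sketched as follows. This is a minimal illustration, not the required design: the ATuple fields (tuple, metadata, operator), the ToyScan class, and the choice to store a 1-based line number under metadata["line"] are all assumptions made for this example.

```python
from typing import Any, Dict, List, Optional, Tuple


class ATuple:
    """Annotated tuple. The field names here are assumed for illustration."""

    def __init__(self, tuple: Tuple, metadata: Optional[Dict] = None,
                 operator: Any = None):
        self.tuple = tuple
        self.metadata = metadata or {}
        self.operator = operator  # handle to the operator that produced this tuple

    def where(self, att_index: int) -> List[Tuple]:
        # Delegate to the producing operator. The operator returns one list
        # per requested tuple, so take the single entry for this tuple.
        return self.operator.where(att_index, [self])[0]


class ToyScan:
    """Hypothetical leaf operator that remembers each row's source line."""

    def __init__(self, filename: str, rows: List[Tuple]):
        self.filename = filename
        # Assumption: line numbers are 1-based and the file has no header row.
        self.output = [
            ATuple(row, metadata={"line": line}, operator=self)
            for line, row in enumerate(rows, start=1)
        ]

    def where(self, att_index: int, tuples: List[ATuple]) -> List[List[Tuple]]:
        # At a scan, an attribute's Where-provenance is the attribute itself.
        return [
            [(self.filename, t.metadata["line"], t.tuple, t.tuple[att_index])]
            for t in tuples
        ]


scan = ToyScan("Ratings.csv", [(1, 10, 5), (4, 10, 8)])
where_from = scan.output[1].where(att_index=2)
```

Operators higher in the query tree would typically map att_index back to the corresponding attribute of their inputs and forward the call toward the scans, although how you track that mapping is up to your implementation.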
Example: Let’s assume that the prediction query returns a predicted rating r=ATuple(5.0) for movie
10 and that this prediction is the average of the ratings in three input tuples:
1 10 5 (line 4 in Ratings.csv)
4 10 8 (line 12 in Ratings.csv)
18 10 2 (line 122 in Ratings.csv)
In this case, calling r.where(att_index=0) should return the following list of tuples:
[('Ratings.csv', 4, (1, 10, 5), 5), ('Ratings.csv', 12, (4, 10, 8),
8), ('Ratings.csv', 122, (18, 10, 2), 2)]
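As a sanity check on this example, the short sketch below rebuilds the expected list from the three contributing rows. Each entry carries the rating stored in its own input tuple, and the three ratings average to the predicted value. The helper where_for_avg and the constant RATING_INDEX are hypothetical names used only here.

```python
from typing import List, Tuple

RATING_INDEX = 2  # assumed position of the rating within a Ratings.csv row

# (line number in Ratings.csv, row) pairs taken from the example above
contributing: List[Tuple[int, Tuple[int, int, int]]] = [
    (4, (1, 10, 5)),
    (12, (4, 10, 8)),
    (122, (18, 10, 2)),
]


def where_for_avg(filename: str,
                  lines_and_rows: List[Tuple[int, Tuple]],
                  att_index: int) -> List[Tuple]:
    # Every input rating that was averaged contributes to the output value.
    return [(filename, line, row, row[att_index])
            for line, row in lines_and_rows]


where_from = where_for_avg("Ratings.csv", contributing, RATING_INDEX)
ratings = [row[RATING_INDEX] for _, row in contributing]
prediction = sum(ratings) / len(ratings)
```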
Sample Solution