Q2) (20 points) Data Science is one of the most popular areas in which Python is widely used. In this question, you will have an opportunity to take your first tiny baby step into this territory. You will also develop your own simple hash table using the existing standard Python data types.

Modify the recursive version of LCS to count the number of recursive calls made for each 6-digit integer string against the string “0123456789”. The function should return a tuple of two elements: (LCS num, the number of recursive calls made to find the LCS).
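The counting step could be sketched as follows. This is a minimal sketch, not the course's reference solution: it assumes the "LCS recursive version" is the usual naive two-branch recursion, that "LCS num" means the length of the longest common subsequence, and that the very first call is included in the count. The name `lcs_with_calls` is hypothetical.

```python
def lcs_with_calls(a, b):
    """Return (LCS length, number of recursive calls) for strings a and b.

    Assumption: naive recursion without memoization, counting every call
    including the initial one.
    """
    calls = 0

    def rec(i, j):
        nonlocal calls
        calls += 1
        # Base case: one of the strings is exhausted.
        if i == len(a) or j == len(b):
            return 0
        # Matching characters extend the common subsequence by one.
        if a[i] == b[j]:
            return 1 + rec(i + 1, j + 1)
        # Otherwise try skipping a character from either string.
        return max(rec(i + 1, j), rec(i, j + 1))

    length = rec(0, 0)
    return length, calls


if __name__ == "__main__":
    print(lcs_with_calls("012345", "0123456789"))
```

Because there is no memoization, the call count varies a great deal from one input string to another, which is the property the question relies on when it reuses the count as a hash value.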
I think I found a pretty good (but very slow) hash function for integer strings: the recursive LCS function. 🙂 Use this function as your hash function to store the integer strings you read in Q1) in your own hash table. Please do not use the Python dict() data type directly. You should develop your own hash table in which the keys are the numbers of recursive calls computed by the recursive LCS function.
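One way to build such a table without dict() is a list of lists (separate chaining). The sketch below is an illustration under assumptions: the class name `SimpleHashTable`, the modulo step that maps a call count to a bucket index, and the default of 10,000 buckets are all choices made here, not requirements from the question. The hash function is passed in as a parameter so the call-counting LCS function can be plugged in; the demo uses a cheap stand-in (`digit_sum`) purely to keep the example fast.

```python
class SimpleHashTable:
    """Separate-chaining hash table built from a list of lists (no dict)."""

    def __init__(self, hash_fn, num_buckets=10_000):
        self.hash_fn = hash_fn          # maps an item to a non-negative int
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, item):
        # Assumption: reduce the raw key to a bucket index with modulo.
        return self.hash_fn(item) % self.num_buckets

    def insert(self, item):
        self.buckets[self._index(item)].append(item)

    def contains(self, item):
        return item in self.buckets[self._index(item)]

    def bucket_sizes(self):
        """Number of stored items in each bucket, in bucket order."""
        return [len(bucket) for bucket in self.buckets]


def digit_sum(s):
    """Stand-in hash function for the demo only (not the LCS hash)."""
    return sum(int(c) for c in s)


table = SimpleHashTable(digit_sum, num_buckets=16)
for s in ["012345", "543210", "999999"]:
    table.insert(s)
```

To follow the question, `hash_fn` would instead return the second element of the tuple produced by the call-counting LCS function.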
Now, to see how good the new hash function is at generating keys uniformly, we want to count the number of collisions for each and every key in the hash table built from 1M integers. Ideally, the average number of collisions would be close to 100 when 1M numbers are spread over 10,000 buckets. We can get an idea of this by computing the average number of collisions, but we may also visualize the distribution of the collisions across all the keys using a Python plotting library.
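The target of 100 is just 1,000,000 / 10,000. A small sketch of the check follows, assuming "collisions for a key" means the number of entries stored under that key and that the per-key counts are already available as a plain list. The function name `collision_summary` and the extra statistics beyond the average are additions for illustration.

```python
from statistics import mean, pstdev


def collision_summary(bucket_sizes):
    """Summarize how evenly entries are spread over the buckets."""
    return {
        "average": mean(bucket_sizes),       # ideal: total items / buckets
        "std_dev": pstdev(bucket_sizes),     # 0 means perfectly uniform
        "largest": max(bucket_sizes),
        "empty": sum(1 for n in bucket_sizes if n == 0),
    }


# A perfectly uniform table: 1M items over 10,000 buckets.
ideal = collision_summary([100] * 10_000)

# A tiny skewed example: 4 items over 4 buckets, two of them unused.
skewed = collision_summary([3, 0, 1, 0])
```

Note that the average over all buckets is always total items divided by bucket count, so the spread (standard deviation, largest bucket, empty buckets) says more about uniformity than the average alone.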
Use a plotting library (https://plot.ly/python/) to visualize the distribution of key collisions of the LCS hash function.
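A possible shape for the plot, sketched with Plotly's `graph_objects` interface. The chart type (a bar chart of how many buckets hold each number of entries), the function names, and the axis labels are assumptions made here; the question does not prescribe them. Plotly is imported inside the plotting function so the counting helper can be used without the package installed.

```python
from collections import Counter


def size_frequencies(bucket_sizes):
    """Return (sizes, counts): how many buckets hold each number of entries."""
    freq = Counter(bucket_sizes)
    sizes = sorted(freq)
    return sizes, [freq[s] for s in sizes]


def build_collision_figure(bucket_sizes):
    """Build a Plotly bar chart of the collision distribution."""
    import plotly.graph_objects as go  # requires the plotly package

    sizes, counts = size_frequencies(bucket_sizes)
    fig = go.Figure(go.Bar(x=sizes, y=counts))
    fig.update_layout(
        title="Distribution of collisions per key",
        xaxis_title="Entries stored under a key",
        yaxis_title="Number of keys",
    )
    return fig


# Usage (opens the chart in a browser or notebook):
# build_collision_figure(per_key_counts).show()
```

A uniform hash function would produce a narrow peak around the average; a long tail to the right indicates a few heavily overloaded keys.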
