PART 1

Using the attached files of around 3200 tweets per person, show a histogram (frequency distribution) of the tweets of both Dave and Julia over time. Use UTC to create the timestamp. Remember that column headers are case-sensitive.
Make a dataframe of word frequencies for each of Dave and Julia. Plot the two sets of frequencies against each other. Include a dividing line in red: words near the line are used with similar frequency by both tweeters, while words farther from it are used more by one tweeter than the other.
Create a stacked chart comparing the odds ratios of the top 15 words used by each tweeter. Remove Twitter handles from the list of words. Calculate the word usage ratios (usage vs. total) and display them on a log scale. Do any differences between the two tweeters stand out?
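The word-frequency and odds-ratio steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not a reference solution. The sample tweets are made up, the whitespace tokenizer is a simplification, and the add-one smoothing in the ratio is an assumption (it keeps words that only one tweeter uses from producing a division by zero). Plotting is left to the charting library of your choice. One natural choice for the red dividing line in the frequency plot is y = x.

```python
import math
import re
from collections import Counter

# Made-up sample tweets for illustration only; the real assignment
# reads around 3200 tweets per person from the attached files.
dave_tweets = ["@julia the new plot looks great", "fitting a model to the data"]
julia_tweets = ["the data package is on CRAN", "great talk about text data"]


def word_counts(tweets):
    """Count lowercase words, skipping Twitter handles (tokens starting with '@')."""
    counts = Counter()
    for text in tweets:
        for token in text.lower().split():
            if token.startswith("@"):
                continue  # drop handles such as @julia
            word = re.sub(r"[^a-z0-9']", "", token)  # strip punctuation
            if word:
                counts[word] += 1
    return counts


def relative_frequencies(counts):
    """Usage vs. total: each word's share of all words used by one person."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def log_odds_ratios(a, b):
    """Log of the smoothed usage ratio of person a to person b for every word.

    Positive values mean the word is relatively more common for a,
    negative values mean it is relatively more common for b.
    """
    total_a, total_b = sum(a.values()), sum(b.values())
    ratios = {}
    for word in set(a) | set(b):
        usage_a = (a[word] + 1) / (total_a + 1)  # add-one smoothing (assumption)
        usage_b = (b[word] + 1) / (total_b + 1)
        ratios[word] = math.log(usage_a / usage_b)
    return ratios


dave = word_counts(dave_tweets)
julia = word_counts(julia_tweets)
dave_freq = relative_frequencies(dave)
julia_freq = relative_frequencies(julia)
ratios = log_odds_ratios(dave, julia)

# Top 15 words leaning toward each tweeter, ready for a stacked bar chart.
ranked = sorted(ratios.items(), key=lambda item: item[1])
top_julia = ranked[:15]           # most negative ratios
top_dave = ranked[::-1][:15]      # most positive ratios
```

Taking the logarithm makes the ratio symmetric: a word used twice as often by Dave and a word used twice as often by Julia sit at the same distance from zero, on opposite sides.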
PART 2

Using the tweet files:

Create time series charts for each tweeter showing how word usage has changed over time. Show the trend for three words. You may have to adjust a parameter to show this. Comment your code, line by line.
Show a graph for each tweeter revealing the ten words with the highest number of retweets. Comment your code, line by line.
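The two Part 2 computations can be sketched as follows, again as a hedged illustration using only the Python standard library. The field names `timestamp`, `text`, and `retweets`, the timestamp format, the monthly bin width, and the sample rows are all assumptions, so check them against the actual column headers in the tweet files. The bin width is one example of a parameter you may need to change to get a readable chart.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Made-up rows for illustration only; field names and timestamp
# format are assumptions about the tweet files.
tweets = [
    {"timestamp": "2016-01-05 14:30:00 +0000", "text": "the data looks good", "retweets": 3},
    {"timestamp": "2016-01-20 09:00:00 +0000", "text": "more data today", "retweets": 1},
    {"timestamp": "2016-02-02 18:45:00 +0000", "text": "a new plot of the data", "retweets": 10},
]


def month_bin(timestamp):
    """Parse a timestamp, convert it to UTC, and return its 'YYYY-MM' bin."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S %z")
    return parsed.astimezone(timezone.utc).strftime("%Y-%m")


def usage_over_time(rows, words):
    """For each chosen word, its share of all words tweeted in each month."""
    totals = Counter()            # total words per month
    hits = defaultdict(Counter)   # per-word counts per month
    for row in rows:
        month = month_bin(row["timestamp"])
        tokens = row["text"].lower().split()
        totals[month] += len(tokens)
        for token in tokens:
            if token in words:
                hits[token][month] += 1
    return {
        word: {month: hits[word][month] / totals[month] for month in sorted(totals)}
        for word in words
    }


def top_retweeted_words(rows, n=10):
    """Total retweets per word, crediting each tweet once per distinct word."""
    totals = Counter()
    for row in rows:
        for word in set(row["text"].lower().split()):
            totals[word] += row["retweets"]
    return totals.most_common(n)


usage = usage_over_time(tweets, {"data", "plot"})
top_words = top_retweeted_words(tweets, n=2)
```

Summing retweets is one reading of "highest number of retweets". A median per word would be less sensitive to a single viral tweet, so it is worth stating which one the graph shows.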
