  1. Create a ggpairs plot that displays all pairwise relationships among the variables except item and type. Comment on the relationships you see, especially those between calories, which will be the outcome variable in our regression analyses, and the other variables.
  2. Create a plot where the x-axis is carb and the y-axis is calories. Color-code the points by type (one color for bakery, another for other). Add a linear trend line for each type as well. Do you think the lines are a good fit? Why or why not? Do you think that the effect of carb on calories depends on type? Why or why not?
  3. Fit a model that predicts calories given all the variables except item and type. Perform backward selection according to adjusted R². What is the adjusted R² of your final model? Which variables are kept? You’ll use the final model you select in this question for the rest of the questions, unless otherwise specified.
  4. Interpret the model you selected in part 3. Specifically, interpret the coefficients of the model and include effect_plots for the numerical variables. Interpret the coefficients in context as much as possible.
  5. Identify the observation with the largest absolute residual. Why is this observation the most poorly predicted?
  6. Using the model you fitted in part 3, predict the calories of these new McDonald’s items:
    • Big Mac: fat = 28, carb = 45, fiber = 3, protein = 24, type = other.
    • Cheeseburger: fat = 11, carb = 33, fiber = 2, protein = 15, type = other.
    • McDouble: fat = 17, carb = 34, fiber = 2, protein = 21, type = other.
    • Chocolate chip cookie: fat = 7, carb = 22, fiber = 1, protein = 2, type = bakery.
    The actual calories of the items are 520, 290, 370, and 160, respectively. What do you think about the model you fitted? Do you think it’s doing a good job? Explain why or why not.
  7. Add type to your model as an additive effect (that is, no interaction yet). Interpret the coefficient for that variable. Did the model improve, based on adjusted R²?
  8. Now add an interaction term between type and one of the numerical variables in the model you selected in part 3 (any variable will do). Interpret the coefficient of the interaction in context and provide an interact_plot. Interpret the plot as well.
  9. A dietitian claims that each gram of fat has 9 calories, each gram of carbs has 4 calories, and each gram of protein has 4 calories. Does your model agree with that statement? Why or why not?
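
As a reference point for part 9, the dietitian's rule can be applied directly to the four items listed in part 6. The following is a minimal sketch in Python, not part of the assignment itself; the names KCAL_PER_GRAM, items, and rule_calories are illustrative assumptions, while the nutrient values and actual calories are copied from part 6. Fiber is ignored because the dietitian's claim does not mention it.

```python
# Illustrative check of the dietitian's 9/4/4 rule (part 9) on the items from part 6.
# All names here are hypothetical helpers, not columns or functions from the dataset.
KCAL_PER_GRAM = {"fat": 9, "carb": 4, "protein": 4}

# Nutrient grams and actual calories as stated in part 6.
items = {
    "Big Mac": {"fat": 28, "carb": 45, "protein": 24, "actual": 520},
    "Cheeseburger": {"fat": 11, "carb": 33, "protein": 15, "actual": 290},
    "McDouble": {"fat": 17, "carb": 34, "protein": 21, "actual": 370},
    "Chocolate chip cookie": {"fat": 7, "carb": 22, "protein": 2, "actual": 160},
}


def rule_calories(item):
    """Calories implied by the rule: 9*fat + 4*carb + 4*protein."""
    return sum(kcal * item[nutrient] for nutrient, kcal in KCAL_PER_GRAM.items())


for name, item in items.items():
    estimate = rule_calories(item)
    print(f"{name}: rule = {estimate}, actual = {item['actual']}, "
          f"difference = {estimate - item['actual']}")
```

Comparing these rule-based values with the predictions from your own model, and the rule's 9, 4, and 4 with your fitted coefficients, is one way to frame the answer.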
