1. (This problem uses the same data as Problem 4 from Assignment 1.) Show all work to receive full credit.
Suppose that a professor is interested in determining how much time her students spend studying. On a Monday morning, she conducts a survey of her class in which she asks students to list the number of hours they spent on their coursework during the past two days. The numbers of hours reported by the students were as follows:
7, 5, 12, 10, 8, 6, 33, 13, 8, 7, 11, 10, 6, 9, 43, 7, 9, 8, 11, 9, 10, 38, 12, 9, 11.
A. Assume that the data collected by the professor constitute a population, and compute the following: mean, median, mode, range, interquartile range, and semi-interquartile range.
B. Consider again what it is the professor ultimately wants to know. Think about how it is being measured. Look at the data that resulted (you may wish to look over your various answers to part A and at your histogram from Assignment 1). Is there any aspect of these data that you find particularly striking? State what you think it is. Now, offer one reasonable explanation for why you think it occurred. You should be able to do this in one paragraph or less.
C. Which measure of central tendency do you think provides the best summary for these data? Explain your answer in a few sentences.
D. Consider whether there is anything that can be done to these data that would bring the mean more in line with the median. In other words, what change can be made to the data so that the new distribution would have a mean that is closer to its median? (HINT: Think about outliers.) Once you think of something, state what you are going to do, then try it out and recalculate both the median and the mean.
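Part A is meant to be worked by hand, but the hand calculations can be cross-checked in Stata. The following is only a minimal sketch under stated assumptions: the variable names hours and hours_mode are made up for illustration, and the data are typed in directly from the problem statement.

```stata
* Enter the 25 reported values from the problem statement.
* "hours" is a hypothetical variable name chosen for this sketch.
clear
input hours
7
5
12
10
8
6
33
13
8
7
11
10
6
9
43
7
9
8
11
9
10
38
12
9
11
end

* Mean, median (50th percentile), 25th and 75th percentiles, min and max
summarize hours, detail

* Range, interquartile range, and semi-interquartile range,
* built from the results that summarize stores in r()
display "Range    = " r(max)-r(min)
display "IQR      = " r(p75)-r(p25)
display "Semi-IQR = " (r(p75)-r(p25))/2

* Mode: egen's mode() function records the most frequent value.
* "hours_mode" is a hypothetical variable name.
egen hours_mode = mode(hours)
display "Mode     = " hours_mode[1]
```

Stata's rule for locating percentiles may not match the quartile method taught in class, so a small difference between the hand-computed interquartile range and the Stata result does not necessarily indicate an error. The shown work should still follow the method from lecture.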
B. Computer Problems
For the following problems, use the data set “2.Asg.attain.dta”. This data set is an extract from the 1998 General Social Survey (GSS). For more information on the GSS (including definitions of the variables), see the following website: http://www.norc.org/GSS+Website/Browse+GSS+Variables/
As you are doing the problems below, copy (as picture) and paste the relevant Stata output into a Word document and type your answers to the questions in that document.
2. Using the “2.Asg.attain.dta” data set, calculate the following summary statistics for both the educ (respondent’s highest year of schooling) and paeduc (father’s highest year of schooling) variables. Use the commands “summarize educ, detail” and “summarize paeduc, detail”. Provide the Stata output and show how you got each answer.
a. Mean
b. Median
c. Range
d. 25th percentile
e. 75th percentile
f. Interquartile range
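The range and interquartile range do not appear directly in the “summarize, detail” output, so they have to be derived from the values it reports. As a rough sketch, assuming the data file sits in Stata's current working directory, the subtraction can be done with the results that summarize leaves behind in r():

```stata
* Load the assignment extract (assumes the file is in the working directory)
use "2.Asg.attain.dta", clear

* Respondent's highest year of schooling
summarize educ, detail
display "Range = " r(max)-r(min)
display "IQR   = " r(p75)-r(p25)

* Father's highest year of schooling
summarize paeduc, detail
display "Range = " r(max)-r(min)
display "IQR   = " r(p75)-r(p25)
```

Each pair of display lines must run immediately after its own summarize command, because r() holds only the results of the most recent command.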
3. The variables educ_o and paeduc_o have been recoded from ratio variables in problem 2 to ordinal variables for this problem. These variables have categories “< HS,” “HS,” “Some Coll,” and “College.”
A. Using the tab command and these two variables (educ_o and paeduc_o), make two contingency tables that display the relationship between father’s education and respondent’s education. Think about which variable should go on each axis. (HINT: Think about temporal order.) In the tab command, list the independent variable first and then the dependent variable (i.e., “tab [independent] [dependent]”). Make one version of the contingency table with row percentages by adding “, row” to the end of the tab command (i.e., “tab [independent] [dependent], row”) and one version with column percentages by adding “, col” to the end of the tab command (i.e., “tab [independent] [dependent], col”). HINT: You will need both of these tables to answer the questions below; different questions require different tables.
B. What proportion of fathers with less than a high school education have children who get a college degree?
C. What proportion of fathers with a college degree have children with less than a college degree? (Less than a college degree includes the “Some Coll” category.)
D. What proportion of all respondents have less than a high school degree?