Regression analysis

Assume that you still work for Ms. Deanna V. Ashun (aka “Dee”), who is now most concerned with finding the set of variables that truly relate to annual salary (e.g., EDUCATION LEVEL is probably correlated with salary, whereas CITIZENSHIP probably is not).

Ms. Ashun has certain suspicions, but is not absolutely sure which variables are the most important in terms of the salaries paid at B&T. She decides to exclude the CEO from all calculations and use the following notation:

y = Annual salary paid in $1000s
x1 = Age in years
x2 = Years of experience prior to B&T
x3 = Level of education; x3 = 1,2,3,4,5
x4 = 0/1 variable for computer usage
x5 = Job classification; x5 = 1,2,3,4,5,6
x6 = Years of experience at B&T
x7 = Gender; 0 = Male; 1 = Female
x8 = Citizenship (but this is nominal data, so will not be used here)
x9 = Salary adjustor for location; x9 = 1,2,3,4,5

STATS250 PROJECT #3 (Regression Analysis) Page 2

(a) After some thought, she decides that PRIOR, EDUC, GRADE, EXPERI and
GENDER are probably those variables which correlate most highly with
SALARY. Assuming that all relationships are linear (i.e., of the form
E(y) = β0 + β1*xi), she asks you to complete the following table (please put your answers on the Yellow Sheet):
(10 points)

Dep. Variable   Ind. Variable   Prediction Equation                        R² value

SALARY          PRIOR           y = __________ + __________ * PRIOR        ________

SALARY          EDUC            y = __________ + __________ * EDUC         ________

SALARY          GRADE           y = __________ + __________ * GRADE        ________

SALARY          EXPERI          y = __________ + __________ * EXPERI       ________

SALARY          GENDER          y = __________ + __________ * GENDER       ________
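
Each row of the table comes from one simple least-squares fit. As a minimal sketch of the arithmetic behind a prediction equation and its R² value, the Python below fits one predictor at a time. The function name and the five data points are hypothetical and chosen only for illustration; they are not the B&T sample, and any statistics package will produce the same quantities.

```python
# Hypothetical illustration only: these numbers are NOT the B&T sample.

def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # intercept
    # Unexplained variation: sum of squared residuals
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / sst
    return b0, b1, r_squared


# Made-up predictor and response values (assumption, for demonstration)
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]

b0, b1, r2 = fit_simple_regression(x_demo, y_demo)
print(f"y = {b0:.3f} + {b1:.3f} * x,  R^2 = {r2:.3f}")
```

Running the same function once per independent variable, with SALARY as y each time, would give the five intercepts, slopes and R² values the table asks for.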

(b) 1. Are the signs of the slopes as expected? 2. Interpret each slope coefficient: (8 points)

PRIOR:

EDUC:

GRADE:

EXPERI:


(c) Of the five variables, which two have the highest R² values?
(1 point)

[ ] PRIOR  [ ] EDUC  [ ] GRADE  [ ] EXPERI  [ ] GENDER

Of the five variables, which two have the lowest R² values?
(1 point)

[ ] PRIOR  [ ] EDUC  [ ] GRADE  [ ] EXPERI  [ ] GENDER

Now aware that GRADE has the single greatest impact on SALARY, Ms. Ashun wonders what variables influence GRADE. She suspects that greater academic credentials are needed to get promoted to the higher ranks at B&T, and further suspects that this relationship is linear. Thus, for all employees in the sample (excluding the CEO), she asks you to investigate the following model:

GRADE = 0 + 1*EDUC
(d) Get the full regression output for this model.

d1. Specify the final prediction equation:
(3 points)

d2. What percent of the variance in GRADE is due to factors other than
EDUCation?
(3 points)

d3. What is the 95% confidence interval for the slope of your model?
(3 points)

d4. Assuming your reader is Mr. Pellsize (intelligent non-statistician),
explain the numeric values found in part (d3) in one or two sentences.
(3 points)
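
For reference, parts (d2) and (d3) rest on two standard simple-regression formulas, sketched below. The symbols are assumptions introduced here rather than notation from the project sheet: b_1 is the estimated slope, n the number of employees in the sample, SSE the sum of squared residuals, SST the total sum of squares of GRADE, and t_{0.025, n-2} the upper 2.5% point of the t distribution with n-2 degrees of freedom.

```latex
% Share of the variance in GRADE not explained by EDUC
\[
R^2 = 1 - \frac{SSE}{SST},
\qquad
\text{percent unexplained} = 100\,(1 - R^2)
\]

% 95% confidence interval for the slope
\[
b_1 \pm t_{0.025,\,n-2}\; s_{b_1},
\qquad
s_{b_1} = \frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
s = \sqrt{\frac{SSE}{n-2}}
\]
```

Most regression output reports R², b_1 and s_{b_1} directly, so the interval can usually be read off or checked by hand from those values.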

Ms. Ashun also knows that using this regression model to make predictions about GRADE requires that at least three assumptions be satisfied:

• The error terms must follow a normal distribution; and
• The error values must be statistically independent; and
• The variance of the error terms must be relatively constant.

(e) Generate both a residual plot and a normal probability plot.
Please attach these two plots to this paper.
(6 points)
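
Both plots are built from the residuals of the fitted line. The sketch below, which assumes made-up data and hypothetical function names, computes the coordinates that each plot displays: (fitted value, residual) pairs for the residual plot, and (theoretical normal quantile, ordered residual) pairs for the normal probability plot. The plotting position (i - 0.5)/n is one common convention; statistical software may use a slightly different one.

```python
from statistics import NormalDist

# Hypothetical illustration only: these numbers are NOT the B&T sample.

def fitted_and_residuals(x, y):
    """Fit y = b0 + b1*x by least squares; return (fitted, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid


def normal_probability_points(resid):
    """Pair each ordered residual with a standard normal quantile."""
    n = len(resid)
    ordered = sorted(resid)
    # Plotting positions (i - 0.5)/n, i = 1..n (an assumed convention)
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


# Made-up predictor and response values (assumption, for demonstration)
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]

fitted, resid = fitted_and_residuals(x_demo, y_demo)
residual_plot_points = list(zip(fitted, resid))       # x-axis: fitted, y-axis: residual
normal_plot_points = normal_probability_points(resid)  # x-axis: quantile, y-axis: residual

for q, r in normal_plot_points:
    print(f"quantile {q:+.3f}   residual {r:+.3f}")
```

In the residual plot, a patternless band of roughly even width around zero is what the independence and constant-variance assumptions would lead one to expect; in the normal probability plot, points lying close to a straight line are consistent with normally distributed errors.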

(f) Based on your plots, do you believe that the error terms follow a normal distribution?
(1 point)

Justify your answer from this part.
(3 points)

(g) Based on your plots, do you believe that the error values are statistically independent?
(1 point)

Justify your answer from this part.
(3 points)

(h) Based on your plots, do you believe that the variance of the error term is constant?
(1 point)

Justify your answer from this part.
(3 points)

(c) Two highest R² values:

[ ] PRIOR  [ ] EDUC  [ ] GRADE  [ ] EXPERI  [ ] GENDER

Two lowest R² values:

[ ] PRIOR  [ ] EDUC  [ ] GRADE  [ ] EXPERI  [ ] GENDER

(d) d1. Prediction equation:

GRADE = __________________ + ____________________*EDUC

STATS250 YELLOW SHEET FOR: PROJECT #3 Page 3

d2. Percent variance due to other factors = _________________________

d3. 95% Confidence Interval for slope = ( ____________, ____________ )

d4. Interpretation of (d3):

_____________________________________________________

_____________________________________________________

(e) Generate both a residual plot and a normal probability plot.
Please attach these two plots to this paper.

(f) Error terms follow a normal distribution?  [ ] YES  [ ] NO

Justification of your answer:

_____________________________________________________

_____________________________________________________

(g) Are error values statistically independent?  [ ] YES  [ ] NO

Justification of your answer:

_____________________________________________________

_____________________________________________________

(h) Error terms show constant variance?  [ ] YES  [ ] NO

 

 

 
