Assume that you still work for Ms. Deanna V. Ashun (aka “Dee”), who is now most concerned with finding the set of variables that truly relate to annual salary (e.g., EDUCATION LEVEL is probably correlated with salary, whereas CITIZENSHIP probably is not).

Ms. Ashun has certain suspicions, but is not absolutely sure which variables are the most important in terms of the salaries paid at B&T. She decides to exclude the CEO from all calculations and use the following notation:

y = Annual salary paid in $1000s

x1 = Age in years

x2 = Years of experience prior to B&T

x3 = Level of education; x3 = 1,2,3,4,5

x4 = 0/1 variable for computer usage

x5 = Job classification; x5 = 1,2,3,4,5,6

x6 = Years of experience at B&T

x7 = Gender; 0 Male; 1 Female

x8 = Citizenship (but this is nominal data, so will not be used here)

x9 = Salary adjustor for location; x9 = 1,2,3,4,5

STATS250 PROJECT #3 (Regression Analysis) Page 2

(a) After some thought, she decides that PRIOR, EDUC, GRADE, EXPERI and GENDER are probably those variables which correlate most highly with SALARY. Assuming that all relationships are linear (i.e., of the form $E(y) = \beta_0 + \beta_1 x_i$), she asks you to complete the following table (please put your answers on the Yellow Sheet):

(10 points)

Dep. Variable   Ind. Variable   Prediction Equation                           R² value
SALARY          PRIOR           y = __________ + __________ * PRIOR           __________
SALARY          EDUC            y = __________ + __________ * EDUC            __________
SALARY          GRADE           y = __________ + __________ * GRADE           __________
SALARY          EXPERI          y = __________ + __________ * EXPERI          __________
SALARY          GENDER          y = __________ + __________ * GENDER          __________
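As a rough illustration of how each row of the table could be filled in, the sketch below fits a one-predictor least-squares line and reports its intercept, slope and R². It uses only the Python standard library. The function name `simple_regression` and every data value are made up for illustration; they are NOT the B&T data, and your course software may present the same quantities differently.

```python
from statistics import mean

def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x. Returns (b0, b1, r_squared)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)                       # spread of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                                                 # slope
    b0 = y_bar - b1 * x_bar                                        # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total
    return b0, b1, 1 - sse / sst

# Hypothetical values for six employees (assumption, not the B&T sample).
salary = [30.0, 34.0, 41.0, 45.0, 52.0, 58.0]   # y, in $1000s
predictors = {
    "PRIOR":  [2, 0, 5, 3, 8, 6],
    "GRADE":  [1, 2, 2, 3, 4, 5],
    "GENDER": [0, 1, 0, 1, 0, 1],
}

for name, x in predictors.items():
    b0, b1, r2 = simple_regression(x, salary)
    print(f"SALARY = {b0:.3f} + {b1:.3f} * {name}    R^2 = {r2:.3f}")
```

Each printed line corresponds to one row of the table: the first number is the intercept, the second is the slope, and the last is the R² value.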

(b) 1. Are the signs of the slopes as expected? 2. Interpret each slope coefficient: (8 points)

PRIOR:

EDUC:

GRADE:

EXPERI:

(c) Of the five variables, which two have the highest R² values?

(1 point)

PRIOR EDUC GRADE EXPERI GENDER

Of the five variables, which two have the lowest R² values?

(1 point)

PRIOR EDUC GRADE EXPERI GENDER

Now aware that GRADE has the single greatest impact on SALARY, Ms. Ashun wonders what variables influence GRADE. She suspects that greater academic credentials are needed to get promoted to the higher ranks at B&T, and further suspects that this relationship is linear. Thus, for all employees in the sample (excluding the CEO), she asks you to investigate the following model:

$\text{GRADE} = \beta_0 + \beta_1 \cdot \text{EDUC}$

(d) Get the full regression output for this model.

d1. Specify the final prediction equation:

(3 points)

d2. What percent of the variance in GRADE is due to factors other than EDUCation?

(3 points)

d3. What is the 95% confidence interval for the slope of your model?

(3 points)

d4. Assuming your reader is Mr. Pellsize (an intelligent non-statistician), explain the numeric values found in part (d3) in one or two sentences.

(3 points)
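For parts (d2) and (d3), a minimal sketch of the arithmetic is given here, using the usual simple-regression formulas: the unexplained share of variance is SSE/SST (that is, 1 − R²), and the slope interval is b1 ± t · s/√Sxx, where s² = SSE/(n − 2). The function name `slope_interval_and_unexplained` and the data are hypothetical. The critical value `t_crit` is assumed to be read from a t-table with n − 2 degrees of freedom, since the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean

def slope_interval_and_unexplained(x, y, t_crit):
    """Return ((lower, upper) for the slope, percent of variance unexplained).

    t_crit is the two-sided critical t value for n - 2 degrees of freedom,
    supplied by the caller (e.g. from a t-table).
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s = sqrt(sse / (n - 2))          # standard error of the estimate
    se_b1 = s / sqrt(sxx)            # standard error of the slope
    interval = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return interval, 100 * sse / sst

# Hypothetical EDUC and GRADE values for six employees (NOT the B&T data).
educ = [1, 2, 2, 3, 4, 5]
grade = [1, 2, 3, 3, 5, 6]

# n = 6, so n - 2 = 4 degrees of freedom; the 95% two-sided t value is 2.776.
(lower, upper), pct_other = slope_interval_and_unexplained(educ, grade, 2.776)
print(f"95% CI for slope: ({lower:.3f}, {upper:.3f})")
print(f"Percent of variance due to other factors: {pct_other:.1f}%")
```

If the interval excludes zero, the data support a nonzero linear relationship between EDUC and GRADE at the 5% significance level.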

Ms. Ashun also knows that using this regression model to make predictions about GRADE means that at least three assumptions must be satisfied:

• The error terms must follow a normal distribution; and

• Error values are statistically independent; and

• The variance of the error terms must be relatively constant.

(e) Generate both a residual plot and a normal probability plot. Please attach these two plots to this paper.

(6 points)
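To make clear what the two plots in part (e) display, the sketch below computes the points behind each one with the Python standard library; any plotting tool can then draw them. The residual plot shows residuals against fitted values, and the normal probability plot pairs each sorted residual with a standard normal quantile. The function names and data are hypothetical, and the plotting position (i − 0.5)/n is only one common convention, so statistical packages may produce slightly different quantiles.

```python
from statistics import NormalDist, mean

def fitted_and_residuals(x, y):
    """Fit y = b0 + b1*x by least squares; return (fitted values, residuals)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid

def normal_probability_points(resid):
    """Pair each sorted residual with a standard normal quantile.

    Uses plotting position (i - 0.5) / n, an assumed convention.
    """
    n = len(resid)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(resid)))

# Hypothetical EDUC and GRADE values for six employees (NOT the B&T data).
educ = [1, 2, 2, 3, 4, 5]
grade = [1, 2, 3, 3, 5, 6]

fitted, resid = fitted_and_residuals(educ, grade)

print("Residual plot points (fitted value, residual):")
for f, r in zip(fitted, resid):
    print(f"  ({f:.3f}, {r:.3f})")

print("Normal probability plot points (normal quantile, sorted residual):")
for q, r in normal_probability_points(resid):
    print(f"  ({q:.3f}, {r:.3f})")
```

When reading the plots, points that fall roughly along a straight line in the normal probability plot are consistent with normal errors, while a residual plot with no pattern and a roughly even vertical spread is consistent with independent errors of constant variance.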

(f) Based on your plots, do you believe that the error terms follow a normal distribution?

(1 point)

Justify your answer from this part.

(3 points)

(g) Based on your plots, do you believe that the error values are statistically independent?

(1 point)

Justify your answer from this part.

(3 points)

(h) Based on your plots, do you believe that the variance of the error term is constant?

(1 point)

Justify your answer from this part.

(3 points)

(c) Two highest R² values:

PRIOR EDUC GRADE EXPERI GENDER

Two lowest R² values:

PRIOR EDUC GRADE EXPERI GENDER

(d) d1. Prediction equation:

GRADE = __________________ + ____________________*EDUC

STATS250 YELLOW SHEET FOR: PROJECT #3 Page 3

d2: Percent variance due to other factors = _________________________

d3. 95% Confidence Interval for slope = ( ____________, ____________ )

d4. Interpretation of (d3):

_____________________________________________________

_____________________________________________________

(e) Generate both a residual plot and a normal probability plot. Please attach these two plots to this paper.

(f) Error terms follow a normal distribution? YES NO

Justification of your answer:

_____________________________________________________

_____________________________________________________

(g) Are error values statistically independent? YES NO

Justification of your answer:

_____________________________________________________

_____________________________________________________

(h) Error terms show constant variance? YES NO