Real and synthetic data using your own code

Work with the real and synthetic data using your own code, unless otherwise specified. The hope is that, by reinventing the wheel, you will gain a thorough understanding of analysis of variance.

You may discuss the problems with your classmates or use others' code in tackling them, but you must write up the solution independently. If you use code from the instructor, the TA, your classmates, or any other source, you must acknowledge it in the Acknowledgement section. For this homework, your report is eligible to be nominated as an outstanding homework only if you write your own functions.

Failing to acknowledge the use of others' code will be counted as plagiarism. The assignment(s) will not be graded, and the incident may be reported to Student Judicial Affairs.

You must show your work for credit. Partial credit can only be given if your thoughts can be followed.

The instructor and TA will nominate outstanding reports that are carefully crafted and of high quality. There is no upper or lower bound on the number of outstanding reports.

QUESTIONS
Unless otherwise specified, we follow the notation used in lecture. The one-way ANOVA model with r factor levels is assumed for all questions in this homework.
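For reference, here is a sketch of the model and the standard interval formulas the questions rely on. It assumes the cell-means form of the one-way ANOVA model and textbook notation ($n_i$ observations at level $i$, $n_T$ in total, multiplier $m$); the lecture notation may differ slightly.

```latex
\begin{aligned}
& Y_{ij} = \mu_i + \varepsilon_{ij}, \quad \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2), \quad i = 1,\dots,r,\ j = 1,\dots,n_i, \\
& n_T = \sum_{i=1}^{r} n_i, \qquad \bar{Y}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}, \qquad
  \mathrm{MSE} = \frac{1}{n_T - r}\sum_{i=1}^{r}\sum_{j=1}^{n_i}\left(Y_{ij} - \bar{Y}_{i\cdot}\right)^2, \\
& \text{interval for } \mu_i:\quad \bar{Y}_{i\cdot} \pm t\!\left(1-\tfrac{\alpha}{2};\, n_T - r\right)\sqrt{\mathrm{MSE}/n_i}, \\
& \text{interval for } L = \textstyle\sum_i c_i\mu_i:\quad \hat{L} \pm m\sqrt{\mathrm{MSE}\textstyle\sum_{i} c_i^2/n_i},
  \qquad \hat{L} = \textstyle\sum_i c_i\bar{Y}_{i\cdot}, \\
& m_{\text{Bonf}} = t\!\left(1-\tfrac{\alpha}{2g};\, n_T - r\right), \quad
  m_{\text{Tukey}} = \tfrac{1}{\sqrt{2}}\, q(1-\alpha;\, r,\, n_T - r), \quad
  m_{\text{Scheff\'e}} = \sqrt{(r-1)\,F(1-\alpha;\, r-1,\, n_T - r)}.
\end{aligned}
```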

PART I: (20 POINTS) PRACTICE WRITING R FUNCTIONS.
Some advantages of using functions: they avoid repeated code, increase program readability, and reduce the chance of errors.

The functions in this part must be written without using any existing R function that performs ANOVA, such as 'aov()'.

1. Write a function to construct a (1-α) confidence interval for a factor level mean. Use the function to construct a 95% confidence interval for the population mean response rate of the blue questionnaires in the paper color data set used in Homework assignments 3 and 4.
mean.CI = function(whichlevel, alpha, Y, mylevel, level.names){
  # example code: find the Y values at whichlevel
  whatevername = Y[which(mylevel==whichlevel)]

  # modify the code for computing L and U (do not use the current code; it is wrong!)
  L = 1
  U = 3

  # output the interval in a vector
  return(c(L,U))
}

Try the function on the paper color data set from Homework 3:

mydata = read.table("ColorStudy.txt", header=TRUE)
color = mydata$Color
response = mydata$ResponseRate
level.names = levels(as.factor(color))
level.names

[1] "blue"   "green"  "orange" "white"

mean.CI(whichlevel="blue", alpha=0.05, Y=response, mylevel=color, level.names=level.names)

[1] 1 3
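A minimal sketch of the standard t-interval for a single level mean, assuming the MSE is pooled over all r levels with n_T - r degrees of freedom. The name `mean.CI.sketch` is hypothetical and deliberately separate from the `mean.CI` template above; it is one way to compute the interval, not the required solution.

```r
# Hypothetical helper (not the assignment's mean.CI): (1 - alpha) t-interval
# for one factor level mean mu_i, using the MSE pooled over all r levels.
mean.CI.sketch = function(whichlevel, alpha, Y, mylevel) {
  groups = split(Y, mylevel)                  # observations grouped by level
  n.T = length(Y)                             # total sample size n_T
  r = length(groups)                          # number of factor levels
  # SSE sums squared deviations from each level's own sample mean
  SSE = sum(sapply(groups, function(y) sum((y - mean(y))^2)))
  MSE = SSE / (n.T - r)
  y.i = groups[[whichlevel]]                  # observations at the chosen level
  center = mean(y.i)
  half = qt(1 - alpha / 2, df = n.T - r) * sqrt(MSE / length(y.i))
  c(center - half, center + half)
}
```

With the objects defined above, a call would look like `mean.CI.sketch("blue", 0.05, Y = response, mylevel = color)`. Pooling the MSE (rather than using only the blue observations' variance) is what the equal-variance ANOVA model justifies, and it gives n_T - r degrees of freedom.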

2. Write a function to construct a (1-α) confidence interval for a contrast $L=\sum_{i=1}^{r} c_i\mu_i$.
contrast.CI = function(coefficients, alpha, method, Bonferroni.g=1, Y, mylevel, level.names){
  # coefficients is a vector containing all c_i's.
  # method takes the value "Bonferroni", "Tukey", or "Scheffe"
  # Bonferroni.g is an argument used only when method is "Bonferroni"

  # modify the code for computing multiplier, L, and U (do not use the current code; it is wrong!)

  if (method=="Bonferroni"){
    multiplier = Bonferroni.g
  }
  if (method=="Tukey"){
    multiplier = 2
  }
  if (method=="Scheffe"){
    multiplier = 3
  }

  L = multiplier+1
  U = multiplier+3

  # output the interval in a vector
  return(c(L,U))
}
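A minimal sketch of the usual contrast interval $\hat L \pm m\, s(\hat L)$, with $s^2(\hat L) = \mathrm{MSE}\sum_i c_i^2/n_i$ and the Bonferroni, Tukey, or Scheffé multiplier $m$. The name `contrast.CI.sketch` is hypothetical; it assumes `coefficients[i]` corresponds to `level.names[i]` and uses only `qt`, `qtukey`, and `qf` from base R's stats package.

```r
# Hypothetical helper (separate from the contrast.CI template): a
# (1 - alpha) CI for L = sum_i c_i * mu_i, computed as L.hat +/- m * s(L.hat).
contrast.CI.sketch = function(coefficients, alpha, method, Bonferroni.g = 1,
                              Y, mylevel, level.names) {
  # group observations so that coefficients[i] matches level.names[i]
  groups = split(Y, factor(mylevel, levels = level.names))
  n.i = sapply(groups, length)
  ybar.i = sapply(groups, mean)
  n.T = sum(n.i)
  r = length(level.names)
  MSE = sum(sapply(groups, function(y) sum((y - mean(y))^2))) / (n.T - r)
  L.hat = sum(coefficients * ybar.i)                 # point estimate of L
  s.L = sqrt(MSE * sum(coefficients^2 / n.i))        # estimated standard error
  multiplier = switch(method,
    Bonferroni = qt(1 - alpha / (2 * Bonferroni.g), df = n.T - r),
    # Tukey's multiplier is meant for pairwise comparisons
    Tukey = qtukey(1 - alpha, nmeans = r, df = n.T - r) / sqrt(2),
    Scheffe = sqrt((r - 1) * qf(1 - alpha, df1 = r - 1, df2 = n.T - r)),
    stop("method must be \"Bonferroni\", \"Tukey\", or \"Scheffe\""))
  c(L.hat - multiplier * s.L, L.hat + multiplier * s.L)
}
```

For example, an interval for $\mu_1-\mu_3$ in the paper color data could be requested with `contrast.CI.sketch(c(1, 0, -1, 0), 0.05, "Tukey", Y = response, mylevel = color, level.names = c("blue", "green", "orange", "white"))`. With only two levels, all three multipliers coincide, which is a handy sanity check.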

PART II: (40 POINTS) DATA ANALYSIS.
We continue to analyze the paper color data set from Homework assignments 3 and 4. Index the four levels in the order "blue", "green", "orange", and "white", so that $\mu_1$ is the mean response rate for blue and $\mu_4$ the mean response rate for white. Use the function from Part I #2 to compute the confidence intervals in the following questions.

As in Homework 4, you could write a second function that does what contrast.CI does but uses the paper color data set as the default input, which makes calls shorter.

contrast.CI.pc = function(coefficients, alpha, method, Bonferroni.g=1, Y=response, mylevel=color,
                          level.names=c("blue","green","orange","white")){
  return(contrast.CI(coefficients, alpha, method, Bonferroni.g, Y, mylevel, level.names))
}
1. (25 pt) For each of the following questions, state which procedure you would use (choose the legitimate procedure with the shortest interval length), explain why you chose it, and compute the confidence interval(s). The (family) confidence level is 0.95 for each question. The following eight contrasts will be used:
$L_1=\mu_1-\mu_3$, $L_2=\mu_2-\mu_4$,
$L_3=\frac{\mu_1+\mu_2}{2}-\frac{\mu_3+\mu_4}{2}$, $L_4=\frac{\mu_1+\mu_3}{2}-\frac{\mu_2+\mu_4}{2}$,
$L_5=\mu_1-\frac{\mu_1+\mu_2+\mu_3+\mu_4}{4}$, $L_6=\mu_2-\frac{\mu_1+\mu_3+\mu_4}{3}$,
$L_7=\mu_3-\frac{\mu_1+\mu_2+\mu_4}{3}$, $L_8=\mu_4-\frac{\mu_1+\mu_2+\mu_3}{3}$.
(a) Suppose you are interested in all possible contrasts; however, you cannot enumerate all of them, so you decide to construct confidence intervals for the 8 contrasts $L_1,\dots,L_8$.
(b) Suppose that, before seeing the results, you want to construct confidence intervals only for the 8 contrasts $L_1,\dots,L_8$.
(c) Suppose you are interested in all possible pairwise comparisons; however, after seeing the results, you decide to construct confidence intervals only for $L_1$ and $L_2$.
(d) Suppose that, before seeing the results, you want to construct confidence intervals only for $L_1$ and $L_2$.
(e) Suppose that, before seeing the results, you want to construct only the confidence interval for the difference between the mean response rate of the color with the highest response rate and that of the color with the lowest response rate.
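A sketch of two pieces of bookkeeping for these questions: the eight contrasts as coefficient vectors (levels ordered blue, green, orange, white), and a side-by-side comparison of the three multipliers, since for a fixed standard error the shortest interval comes from the smallest legitimate multiplier. The names `contrast.coef` and `compare.multipliers` are hypothetical. Which multipliers are legitimate depends on the scenario: Bonferroni requires the family to be fixed before seeing the data, Tukey is designed for pairwise comparisons, and Scheffé covers any set of contrasts.

```r
# Contrast coefficients for L1..L8, levels in the order blue, green, orange, white
contrast.coef = rbind(
  L1 = c(1, 0, -1, 0),
  L2 = c(0, 1, 0, -1),
  L3 = c(1, 1, -1, -1) / 2,
  L4 = c(1, -1, 1, -1) / 2,
  L5 = c(1, 0, 0, 0) - rep(1 / 4, 4),
  L6 = c(0, 1, 0, 0) - c(1, 0, 1, 1) / 3,
  L7 = c(0, 0, 1, 0) - c(1, 1, 0, 1) / 3,
  L8 = c(0, 0, 0, 1) - c(1, 1, 1, 0) / 3
)
colnames(contrast.coef) = c("blue", "green", "orange", "white")

# Multipliers for a (1 - alpha) family level with g intervals, r levels,
# and error degrees of freedom df = n_T - r
compare.multipliers = function(alpha, g, r, df) {
  c(Bonferroni = qt(1 - alpha / (2 * g), df),
    Tukey      = qtukey(1 - alpha, nmeans = r, df = df) / sqrt(2),
    Scheffe    = sqrt((r - 1) * qf(1 - alpha, r - 1, df)))
}
```

Once the data are loaded, a call such as `compare.multipliers(0.05, g = 8, r = 4, df = length(response) - 4)` puts the three multipliers next to each other, and each row of `contrast.coef` can be passed as `coefficients` to a contrast interval function.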
2. (15 pt) Model diagnostics.
(a) Draw the residual plot and comment on it.
(b) Conduct the Brown-Forsythe test to check whether the equal variance assumption holds. Compute the test statistic, state the decision rule, and draw a conclusion. Use a significance level of 0.01.
(c) Draw the normal Q-Q plot and comment on it.
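A minimal sketch of the Brown-Forsythe test without `aov()`: compute $d_{ij} = |Y_{ij} - \tilde{Y}_i|$, where $\tilde{Y}_i$ is the median of level $i$, then compute the one-way ANOVA F statistic on the $d_{ij}$ by hand and reject equal variances when $F_{BF} > F(1-\alpha;\, r-1,\, n_T-r)$. The helper names `bf.test.sketch` and `anova.residuals` are hypothetical; `ave()` is base R's group-wise summary function.

```r
# Hypothetical helper: Brown-Forsythe test written without aov().
# d_ij = |Y_ij - median of level i|; the statistic is the one-way ANOVA
# F statistic on the d_ij, compared with F(1 - alpha; r - 1, n_T - r).
bf.test.sketch = function(Y, mylevel, alpha = 0.01) {
  mylevel = factor(mylevel)
  d = abs(Y - ave(Y, mylevel, FUN = median))         # absolute deviations d_ij
  groups = split(d, mylevel)
  n.i = sapply(groups, length)
  dbar.i = sapply(groups, mean)
  r = nlevels(mylevel)
  n.T = length(d)
  SSTR = sum(n.i * (dbar.i - mean(d))^2)             # treatment SS of the d_ij
  SSE = sum(sapply(groups, function(x) sum((x - mean(x))^2)))
  F.BF = (SSTR / (r - 1)) / (SSE / (n.T - r))
  crit = qf(1 - alpha, r - 1, n.T - r)
  list(statistic = F.BF, critical.value = crit,
       reject.equal.variance = F.BF > crit,
       p.value = pf(F.BF, r - 1, n.T - r, lower.tail = FALSE))
}

# Residuals e_ij = Y_ij - Ybar_i, for the residual plot and the normal Q-Q plot
anova.residuals = function(Y, mylevel) Y - ave(Y, mylevel)
```

On the paper color data this would be called as `bf.test.sketch(response, color, alpha = 0.01)`. For the plots, `e = anova.residuals(response, color)` can be passed to `plot(ave(response, color), e)` for residuals against fitted values and to `qqnorm(e); qqline(e)` for the normal Q-Q plot.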
