1.1) For the data in the following table, construct a bar chart and a pie chart showing the voting shares for the candidates.
Candidate Votes

A 6000
B 3000
C 800
D 1200
E 1000
total 12000
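
Both charts in question 1.1 are built from percentage shares rather than raw counts. A minimal sketch of that conversion follows; the variable names are illustrative and not part of the exercise.

```python
# Vote counts copied from the table in question 1.1.
votes = {"A": 6000, "B": 3000, "C": 800, "D": 1200, "E": 1000}

total = sum(votes.values())  # should match the table's total row
shares = {name: 100 * count / total for name, count in votes.items()}

for name, share in shares.items():
    print(f"Candidate {name}: {share:.2f}% of the vote")
```

The resulting shares are the bar heights; multiplying each share by 3.6 gives the pie-slice angle in degrees.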

1.2) For the data in the following table, construct a histogram and the frequency polygon.

Class Frequency Frequency Density
200 – 240 40
240 – 260 10
260 – 280 20
280 – 320 20
320 – 340 40
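
The empty Frequency Density column in question 1.2 matters because the classes have unequal widths. The sketch below assumes the common definition "frequency divided by class width"; some textbooks instead scale to a standard class width, so treat this as one possible convention.

```python
# (lower bound, upper bound, frequency) for each class in question 1.2.
classes = [(200, 240, 40), (240, 260, 10), (260, 280, 20),
           (280, 320, 20), (320, 340, 40)]

# Assumed definition: density = frequency per unit of class width.
densities = [freq / (upper - lower) for lower, upper, freq in classes]

# Class midpoints, which serve as x-positions for the frequency polygon.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]

for (lower, upper, freq), dens in zip(classes, densities):
    print(f"{lower} - {upper}: frequency {freq}, density {dens}")
```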
1.3) Using the example of temperature (or another suitable variable of your choosing), explain the difference between the three quantitative levels of measurement. (1)

1.4) In a local election, Candidate A received 5400 votes, Candidate B received 3600 votes, Candidate C 7000 votes, and Candidate D 4000 votes. Show the corresponding bar chart for the candidates’ respective percentage share of the total vote. (0.5)

1.5) Car sales in the metropolitan region for the month of April were as follows: Model Alpha 600, Model Beta 1200, Model Gamma 1000, Model Delta 900, Model Epsilon 1300. Show the corresponding pie chart for this data, for the percentage share of sales for each model. (0.5)

1.6) Construct a frequency polygon for the following data set. (1)

Class Frequency
10 – 20 10
20 – 30 5
30 – 50 12
50 – 55 10
55 – 60 10
2.1) Determine the mean, median, and mode of the following data set. Also, use two different ways to calculate the values of the first and third quartile. (1)
11, 9, 16, 11, 12, 16, 11, 11, 7, 12, 19, 14, 9
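
Answers to question 2.1 can be checked with Python's standard `statistics` module. The sketch assumes that the "two different ways" correspond to the module's "exclusive" and "inclusive" quantile methods; a textbook may define the two approaches differently.

```python
import statistics

data = [11, 9, 16, 11, 12, 16, 11, 11, 7, 12, 19, 14, 9]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Two quartile conventions; each call returns [Q1, Q2, Q3].
q_exclusive = statistics.quantiles(data, n=4, method="exclusive")
q_inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(mean, median, mode)
print("exclusive:", q_exclusive)
print("inclusive:", q_inclusive)
```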

2.2) Calculate the mean and the median for the data in the following table. (1)
Class Frequency
11 – 15 3
16 – 20 7
21 – 25 10
26 – 30 6
31 – 35 5
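
For grouped data as in question 2.2, the mean is usually approximated from class midpoints and the median by interpolating within the median class. The sketch below assumes class boundaries halfway between adjacent classes (for example 20.5 and 25.5 for the class 21 – 25); other boundary conventions give slightly different medians.

```python
# (lower limit, upper limit, frequency) for each class in question 2.2.
classes = [(11, 15, 3), (16, 20, 7), (21, 25, 10), (26, 30, 6), (31, 35, 5)]

n = sum(freq for _, _, freq in classes)

# Grouped mean: weight each class midpoint by its frequency.
grouped_mean = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes) / n

# Grouped median: find the class containing observation n/2, then interpolate.
half = n / 2
cumulative = 0
for lower, upper, freq in classes:
    if cumulative + freq >= half:
        lower_boundary = lower - 0.5   # assumed boundary convention
        width = upper - lower + 1
        grouped_median = lower_boundary + (half - cumulative) / freq * width
        break
    cumulative += freq

print(round(grouped_mean, 2), grouped_median)
```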

2.3) For the following data set: 3, 7, 9, 7, 11, 4, 9, calculate variance and standard deviation. What is the coefficient of variation? (1)
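
Question 2.3 does not say whether the seven values are a sample or a whole population, so the sketch below reports both variances. It assumes the coefficient of variation is the sample standard deviation divided by the mean, expressed in percent.

```python
import statistics

data = [3, 7, 9, 7, 11, 4, 9]

mean = statistics.mean(data)
sample_var = statistics.variance(data)       # divides by n - 1
population_var = statistics.pvariance(data)  # divides by n
sample_sd = statistics.stdev(data)

# Coefficient of variation in percent (assumed: based on the sample SD).
cv_percent = 100 * sample_sd / mean

print(sample_var, population_var, sample_sd, cv_percent)
```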

2.4) Calculate variance and standard deviation for the following data set: 162, 726, 188, 656, 165, 547, 175, 806, 190, 670, 145, 810, 169, 682, 149, 304, 197, 847. What is the coefficient of variation? (1)

2.5) Based on the following five values describing a data set, can you make a tentative statement about the skewness of the observations? (0.5)
Minimum: 7, Maximum: 90, first quartile Q1: 13, third quartile Q3: 63, Median: 32

2.6) Determine the mean, median, and mode of the following data set. (0.5)
5, 7, 11, 12, 13, 15, 7, 7, 5, 11, 7

2.7) Determine the mean, median, and mode of the following data set. (0.5)
9, 9, 6, 4, 4, 6, 9, 6, 4, 9, 6

2.8) The average test scores for five different classes are 85 (class 1, 20 students), 80 (class 2, 40 students), 83 (class 3, 30 students), 87 (class 4, 20 students) and 90 (class 5, 40 students). What is the average test score of all the students in those five classes? (0.5)
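
Because the classes in question 2.8 differ in size, the overall average is a weighted mean, not the plain mean of the five class averages. A small sketch, with illustrative variable names:

```python
# (class average, number of students) from question 2.8.
class_results = [(85, 20), (80, 40), (83, 30), (87, 20), (90, 40)]

total_points = sum(avg * size for avg, size in class_results)
total_students = sum(size for _, size in class_results)
weighted_mean = total_points / total_students

# For comparison: the unweighted mean of the five class averages.
unweighted_mean = sum(avg for avg, _ in class_results) / len(class_results)

print(round(weighted_mean, 2), unweighted_mean)
```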

2.9) Fill in the table. What is the average for the variable observed? Why can we expect that the average calculated in this way, and an average calculated based on the individual sample observations will differ slightly? Also calculate the median for this data set. (1)
Class Frequency (f) Midpoint (M) f*M
18 – 22 5
23 – 27 5
28 – 32 2
33 – 37 3

2.10) Fill in the table. What is the average for the variable observed? Why can we expect that the average calculated in this way, and an average calculated based on the individual sample observations will differ slightly? Also calculate the median for this data set. (1)
Class Frequency (f) Midpoint (M) f*M
18 – 22 7
23 – 27 12
28 – 32 8
33 – 37 5

2.11) For the following data, calculate the first quartile and third quartile, respectively, in two different ways. What is the median value for this data set? (1)
1, 3, 5, 9, 13, 15, 19, 23, 27, 33, 35, 37, 41

2.12) Fill in the table. What is the average for the variable observed? What is the median? (1)
Class Frequency (f) Midpoint (M) f*M
1 – 5 8
6 – 10 14
11 – 15 20
16 – 20 16
21 – 25 10
26 – 30 4

2.13) What is the range for this data set? What is the interquartile range? (Choose one possible way to calculate the interquartile range.) (1)
5, 5, 7, 9, 11, 13, 17, 23, 24, 24, 27, 29, 33

3.1) Consider the following situation: Draw marbles randomly three times, without replacing them after a draw. There are 20 marbles overall, 8 of which are yellow, and 12 of which are green. What does the probability tree diagram to capture this setting look like? What are the respective probabilities for the events (y, y, gr) and (gr, gr, y) if the order in which the marbles are drawn does not matter? What are the respective probabilities if the order does matter? (1)
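
The branch probabilities of the tree in question 3.1 multiply along a path. The sketch below uses exact fractions and reads "order does not matter" as "two yellow and one green in any order" (and likewise for two green and one yellow); that reading is an assumption.

```python
from fractions import Fraction
from math import comb

yellow, green = 8, 12
total = yellow + green

# Order matters: multiply the conditional probabilities along one path.
p_yyg_ordered = (Fraction(yellow, total) * Fraction(yellow - 1, total - 1)
                 * Fraction(green, total - 2))
p_ggy_ordered = (Fraction(green, total) * Fraction(green - 1, total - 1)
                 * Fraction(yellow, total - 2))

# Order does not matter: count combinations (hypergeometric probability).
p_two_yellow_one_green = Fraction(comb(yellow, 2) * comb(green, 1), comb(total, 3))
p_two_green_one_yellow = Fraction(comb(green, 2) * comb(yellow, 1), comb(total, 3))

print(p_yyg_ordered, p_two_yellow_one_green)
print(p_ggy_ordered, p_two_green_one_yellow)
```

Each unordered probability is three times the ordered one, since three paths of the tree lead to the same colour combination.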

3.2) Fill in the blanks in the table. Over the last year, enrollment numbers, by gender, at a small school are as shown in the table below. For a randomly drawn student, what is the probability that the student is, (1)
a) female
b) a sophomore
c) a male sophomore
d) a female sophomore or junior
e) limiting the population to the male students, a sophomore?

Freshman Sophomore Junior Senior Total
Female 150 90 93 433
Male 105 85
Total 279 168

3.3) An unbiased coin and a fair die are tossed together. (0.5)
a) What is the probability of obtaining heads and a six?
b) What is the probability of heads or tails and a three?
c) What is the probability that the coin shows tails?
d) Throwing the die twice, what is the probability of heads and the sum of the die casts being seven?

3.4) Calculate the following probabilities for two dice being cast: (0.5)
a) a 3 on the first die and a 5 on the second one,
b) a 3 on the first one or a 5 on the second one,
c) a sum of 8,
d) a sum of 7 or 8, if one of the dice shows a 3.

3.5) Show a diagram to represent the following situation: (0.5)
Event A contains all even integers between 1 and 10 (both included) and Event B contains all integers larger than 5.

3.6) Drawing from a 52-card deck, what are the odds that: (0.5)
a) a card is red and a king,
b) a card is black or a queen,
c) neither black nor a queen,
d) a specific suit, say, spades?

3.7) A bowl contains four yellow chips and five black ones. Drawing twice, without replacing the chips after drawing, what are the probabilities of (0.5)
a) the first chip being yellow and the second one being black,
b) the first chip being black and the second one yellow,
c) both chips being yellow,
d) both chips being black.

3.8) A bag contains four blue chips and six pink ones. As one experiment, three chips are drawn without replacement. For X taking the values of 0, 1, 2, 3 blue chips, show the probability distribution of X. Out of 75 repetitions of the experiment, how often do you expect two or more blue chips to be drawn? (1)
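
Drawing without replacement, as in question 3.8, leads to a hypergeometric distribution. The sketch below computes it with exact fractions; the names are illustrative.

```python
from fractions import Fraction
from math import comb

blue, pink, draws = 4, 6, 3
total = blue + pink

# P(X = k): choose k blue chips and (draws - k) pink chips.
distribution = {
    k: Fraction(comb(blue, k) * comb(pink, draws - k), comb(total, draws))
    for k in range(draws + 1)
}

p_two_or_more = distribution[2] + distribution[3]
expected_count = 75 * p_two_or_more  # expected occurrences in 75 repetitions

for k, prob in distribution.items():
    print(k, prob)
print(p_two_or_more, expected_count)
```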

3.9) Why can we say that statistically independent and mutually exclusive events represent different situations? (1)

3.10) You are given the following information: Events A and B cover the entire sample space. The probability of A and B occurring at the same time is 1/4. If B occurs, the probability that A also occurs is 1/3; put differently, given that B has occurred, the probability that A occurs is 1/3. (1.5)
a) What is the probability that only B occurs?
b) What is the probability that only A occurs?
c) What is the probability that B occurs, given that A has occurred?

4.1) For X ~ N(6, 4), give P(x ≤ 7) and P(x ≥ 7). (1)
P(x ≤ 7) =
P(x ≥ 7) =

4.2) What is the Z value that corresponds to the critical value in question 4.1)? (0.5)
Z =

4.3) What are the values of the mean and the standard deviation of a standard normal distribution? (0.5)

4.4) For a likelihood of success p = 0.4, what is the likelihood of two or fewer successes out of five attempts? What is the likelihood of between two and four successes (both included)? (1)
Two or fewer:
Between two and four:
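
Binomial probabilities such as those in question 4.4 are sums of individual terms of the binomial formula. A minimal sketch, where `binom_pmf` is a helper name chosen for illustration:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.4

# Two or fewer successes: k = 0, 1, 2.
p_two_or_fewer = sum(binom_pmf(k, n, p) for k in range(0, 3))

# Between two and four successes, both included: k = 2, 3, 4.
p_two_to_four = sum(binom_pmf(k, n, p) for k in range(2, 5))

print(round(p_two_or_fewer, 5), round(p_two_to_four, 5))
```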

4.5) Give the complete distribution for a binomial distribution, for five repetitions. (0.5)

4.6) What is the likelihood of exactly 2 trains arriving in a train station in a timespan of five minutes, when the average number of trains arriving over such an interval is 3? What is the likelihood of 2 or fewer trains arriving? (1)
2 trains:
2 or fewer:
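
Question 4.6 fits a Poisson model with an average of 3 arrivals per interval. The sketch below evaluates the Poisson formula directly; `poisson_pmf` is an illustrative helper name.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average count is lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3  # average number of trains per five-minute interval

p_exactly_two = poisson_pmf(2, lam)
p_two_or_fewer = sum(poisson_pmf(k, lam) for k in range(0, 3))

print(round(p_exactly_two, 4), round(p_two_or_fewer, 4))
```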

4.7) For X ~ N(72, 16), give a) P ( x=70), b) P ( x=65), and c) P(62 ≤ x ≤ 73). (1)
a) P ( x=70) =
b) P ( x=65) =
c) P(62 ≤ x ≤ 73) =

4.8) For X ~ N(59, 25), which Z values correspond to a) x = 54, b) x = 61.5, and c) x = 69? (1)
a) Z =
b) Z =
c) Z =

4.9) For X ~ N(15, 9), give a) P(10 ≤ x ≤ 17) and b) P(8 ≤ x ≤ 20). (1)
a) P(10 ≤ x ≤ 17) =
b) P(8 ≤ x ≤ 20) =

4.10) The lifespan of a certain type of car battery is normally distributed with a mean of 1248 days and a standard deviation of 185 days. If the supplier guarantees them for 1080 days, what proportion of batteries will be replaced under the guarantee? If the supplier wants to replace no more than 10% of the batteries under the guarantee, for how many days will they extend their guarantee? (1.5)
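
Both parts of question 4.10 can be checked with `statistics.NormalDist` from the Python standard library. The sketch reads the second part as asking for the guarantee length at which exactly 10% of batteries fail early; that reading is an assumption.

```python
from statistics import NormalDist

lifespan = NormalDist(mu=1248, sigma=185)

# Share of batteries failing before the 1080-day guarantee expires.
p_replaced = lifespan.cdf(1080)

# Guarantee length (in days) at which only 10% of batteries fail early.
guarantee_days = lifespan.inv_cdf(0.10)

print(round(p_replaced, 4), round(guarantee_days, 1))
```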

4.11) For a likelihood of success p = 0.6, what is the likelihood of three or fewer successes out of five attempts? What is the likelihood of between three and six successes (both included) out of seven attempts? (1)
a) Three or fewer out of five:
b) Between three and six out of seven:

4.12) For a likelihood of success p = 0.3, what is the likelihood of four or more successes out of six attempts? What is the likelihood of two successes? (1)
a) Four or more:
b) Two:

4.13) Give the complete distribution for a binomial distribution, for six repetitions. (0.5)

4.14) What are the binomial coefficients in the complete distribution for 7 repetitions? (0.5)

4.15) Attendance at a cinema has been analyzed and shows that audiences consist of 60% men and 40% women. If a random sample of six people was selected from the audience during a performance, find the following probabilities: a) The sample consists of six women; b) There are more than three men in the sample; c) There are fewer than three women in the sample. (1)
a) Six women:
b) More than three men:
c) Fewer than three women:

4.16) A quality control system selects a sample of three items from a production line. If one or more is defective, a second sample is taken (also of size three), and if one or more of these are defective, the whole production line is stopped. Given that the probability of a defective item is 0.05%, what is the probability that the second sample is taken? What is the probability that the production line is stopped? (1)

4.17) What is the likelihood of 5 trains arriving in a train station in a timespan of ten minutes, when the average number of trains arriving over such an interval is 8? What is the likelihood of 3 or fewer trains arriving? (1)
a) 5 trains:
b) 3 or fewer:

4.18) If a two-yard piece of carpet shows three weaving errors on average, what is the likelihood that there will be between 3 and 5 (both included) errors in a two-yard stretch? More than 4 errors? (1)
Between 3 and 5:
More than 4:

4.19) For a Poisson distribution, calculate the distribution for λ = 3, for values up to x = 7. (1)

4.20) A factory estimates that 0.25% of its production of small components is defective. These are sold in packets of 200. Calculate the percentage of the packets containing one or more defective units. (1)
5.1) The historical output by employees has a mean of 120 units per hour with a standard deviation of 28 units. A new employee is tested on 30 different random occasions and is found to have a mean output of 107 units per hour. Does this indicate that the new employee’s output is significantly different from the population mean output? (alpha = 0.05) (1)
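
Question 5.1 can be approached with a two-sided z-test. The sketch below assumes the historical standard deviation of 28 is treated as the known population value; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 120, 28        # historical mean and standard deviation
n, sample_mean = 30, 107    # new employee's test results
alpha = 0.05

standard_error = sigma / sqrt(n)
z = (sample_mean - mu0) / standard_error

# Two-sided critical value and p-value from the standard normal distribution.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * NormalDist().cdf(-abs(z))
reject_h0 = abs(z) > z_critical

print(round(z, 3), round(z_critical, 3), round(p_value, 4), reject_h0)
```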

5.2) A mobile phone company is concerned about the lifetime of phone batteries supplied by a new supplier. Based upon historical data, this type of battery should last for 900 days, with a standard deviation of 150 days. A random sample of 40 batteries was recently selected, and the mean battery life in the sample was found to be 942 days. Is the sample battery life significantly different from 900 days (at the 5% significance level)? (1)

5.3) Boys of a certain age are known to have a mean weight of µ = 85 pounds. A complaint is made that the boys living in a municipal children’s home are underfed. As one bit of evidence, n = 25 boys (of the same age) are weighed and found to have a mean weight of 80.94 pounds. It is known that the population standard deviation σ is 11.6 pounds. Based on the available data, what should be concluded concerning the complaint? (For an alpha of 0.05.) (1)

5.4) A school principal claims that the students in her school are of above average intelligence. A sample of 24 students has scored an average of 107.5 IQ points. The population average is 100, with a standard deviation of 15. Do we have sufficient evidence to support her claim? (At a 5% significance level.) (1)

5.5) In a vocabulary test, the mean score was 68 with a standard deviation of 13. A class of 19 has a mean score of 65. At a significance level of 5%, can we reject the notion that this particular class is a typical one? (1)

5.6) If you know that the average weight of the people in a population is 160 lbs, and the variance is 25, what is the probability that the average weight in a sample of 16 people is 165 lbs or higher? What is the probability that the average weight in a sample of 36 is 162.5 lbs or higher? What is the probability that the average weight will be between 158 and 161 lbs? (1)
a) ≥ 165 lbs average (sample of 16):
b) ≥ 162.5 lbs average (sample of 36):
c) P(158 ≤ x ≤ 161):

5.7) For the population in question 5.6), which is the lower bound for the interval that contains the highest 5% of values? The upper bound for the interval that contains the lowest 10% of values? (1)

5.8) If we know that 30% of students at DU own a bike, what is the likelihood that at least 33% in a random sample of 50 own a bike? What is the likelihood that at least 40% in a sample of 30 own a bike? (1)
33% (sample of 50):
40% (sample of 30):

5.9) What is an unbiased estimator? Also, give an example of one. (0.5)

5.10) When you estimate the confidence interval for the population mean, which cases do you have to distinguish, and how do they influence your approach? (1)

5.11) For the sample data 1.01, 1.012, 1.008, 1.015, 1.013, 1.011, and 1.009, give an estimate for the population mean, population standard deviation, and standard error of the mean. (1)

5.12) For the following sample, what is the 95% confidence interval for the population mean: 25.6, 19.8, 22.3, 24.1, 18.7, 21, 20.5, 19.8, 22.7, 23.2 (n=10)? What is the 90% confidence interval? (1)

5.13) For the following sample, what is the 95% confidence interval for the population mean, if the population variance is known to be 9: 25.6, 19.8, 22.3, 24.1, 18.7, 21, 20.5, 19.8, 22.7, 23.2, 21.8, 22 (n=12)? What is the 99% confidence interval if the population variance is unknown? (1)

5.14) 33% of the students in a random sample of n = 69 own University of Denver merchandise. Determine the 90% confidence interval for the population proportion of DU merch owners among the student population. (0.5)
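
A confidence interval for a proportion, as in question 5.14, is commonly built from the normal approximation. The sketch below assumes that approach and treats the sample proportion as exactly 0.33.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n = 0.33, 69
confidence = 0.90

# Standard error of the sample proportion (normal approximation).
standard_error = sqrt(p_hat * (1 - p_hat) / n)

# Two-sided critical value: 5% in each tail for a 90% interval.
z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_critical * standard_error

lower, upper = p_hat - margin, p_hat + margin
print(round(lower, 4), round(upper, 4))
```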

5.15) For X~N(160,1296), give the probability that a sample average will be larger than 165, if the sample size is 64. Also, give the standard error of the sample mean. For X~N(160,625), give the probability that a sample average will be larger than 165, if the sample size is 64. Also, give the standard error of the sample mean. Briefly explain the difference in the results compared to the first distribution in this question. (2)

6.1) Out of 19 rats, 12 were fed a high protein diet, the other 7 were fed a low protein diet. Their weights after twelve weeks are
High protein: 134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, and 123
Low protein: 70, 118, 101, 85, 107, 132, 94.
a) Formulate a suitable null hypothesis and test it assuming equal variances for a high protein diet population and a low protein diet population. (Significance level: 5%) (1)
b) Formulate a suitable null hypothesis and test it assuming unequal variances for a high protein diet population and a low protein diet population. (significance level: 5%) (1)

6.2) The calorie content of beef and poultry hotdogs has been tested, and yielded the following measures:
Beef: 186, 181, 176, 149, 184, 190, 158, 139, 175, 148, 152, 111, 141, 153, 190, 157, 131, 149, 135, 132
Poultry: 129, 132, 102, 106, 94, 102, 87, 99, 170, 113, 135, 142, 86, 143, 152, 146, 144
Formulate a suitable null hypothesis to assess your belief that the average calorie content of poultry hotdogs is lower than that of beef hotdogs.
a) Test your hypothesis assuming equal variances for the two populations. (1)
b) Test your hypothesis assuming unequal variances for the two populations. (1)

6.3) What is the difference between a one sample and a two sample test? What do you further have to address if you are drawing on two samples for your test? (1)

6.4) Which are the two different kinds of mistakes that we can make in a hypothesis test? How do the probabilities for these mistakes relate, in general? (1)

6.5) Employees at a firm produce units at a rate of 125 per hour, with a standard deviation of 25 units per hour. A new employee is tested on 35 separate random occasions, and is found to have an average output of 116 units per hour. Does this indicate that the new employee’s output is significantly different (at the 5% level) from the average output in the firm? (1)

6.6) Based upon collected data, a local car dealer has estimated that the amount spent on extras in cars is normally distributed with an average of \$2,250 per customer. She would like to know whether that average has changed lately. Data on the purchase of extras by the last nine customers show that the following amounts were spent (in \$): 2350, 2486, 1935, 1524, 3221, 2489, 1790, 1866, and 2247.
a) Test whether the amounts spent on extras have, in fact, changed, or not. (1)
b) What is the statistical power of the test if the average has changed, and is, in fact, \$2,400 per customer, now? (1)

6.7) Calculate the critical values for a test, for a significance level of 1% and a sample size of 17, for (a) two tail, (b) upper one tail, (c) lower one tail test. (1)

6.8) For testing the effectiveness of a drug, 900 subjects receive an oral administration of the drug in question. Another 1000 subjects receive a placebo. What is a suitable null hypothesis in this case?
You can evaluate whether the drug has an effect on the test subjects by using a standardized scale. For the group having received the drug in question (n=900), the mean test score was 9.78, with a standard deviation of 4.05. For the control group (n=1000), the mean test score was 15.10, with a standard deviation of 4.28. At the 0.1% significance level, can you reject your null hypothesis? (1)

6.9) Pollsters try to assess whether the support for a candidate in a regional election differs in different areas. They randomly interview people in urban and rural environments in that region to find an answer to that question. Of 250 people interviewed in urban areas, 145 indicate that they will vote for said candidate. In rural areas, 90 out of 190 people signal their intent to vote for the candidate. At the 5% significance level, do we accept or reject a null hypothesis stating that support for the candidate is not significantly different in different parts (that is, rural and urban ones) of the region in question? (1)

6.10) A university finance department would like to assess whether travel expenses claimed by members of different departments are significantly different. Having identified two departments that appear to show rather different reimbursement claims, data for these departments over the last calendar year is assessed.
Individual claims, Department A: 156.67, 169.81, 130.74, 158.86, 146.81, 143.69, 155.38, 170.74, 147.28, 157.58, 179.89, 140.67, 154.78, and 154.86.
Individual claims, Department B: 108.21, 142.68, 135.92, 109.10, 110.93, 132.91, 127.16, and 124.94.
a) Assuming that population expenses are approximately normally distributed, and that population variances are approximately equal, test whether department A’s claims are significantly higher than department B’s claims, at the 5% significance level. (1)
b) Undertake the same test assuming that the population variances are unequal. (1)

6.11) Out of 22 rats, 13 were fed a high protein diet, the other 9 were fed a low protein diet. Their weights after twelve weeks are
High protein: 134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 101, and 113.
Low protein: 70, 118, 101, 85, 107, 132, 94, 112, and 105.
Formulate a suitable null hypothesis and test it assuming unequal variances for a high protein diet population and a low protein diet population. (significance level: 5%) (1)

6.12) Is there a significant difference between the test scores of the following two groups of students?
Sample F: n = 8, average = 97.25, SD sample F = 3.65
Sample M: n = 12, average 87.25, SD sample M = 9.6
Formulate a suitable null hypothesis and test it assuming unequal variances. (significance level: 5%) (1)

6.13) Show that the t-statistic used to test a null hypothesis about the difference between two population means takes the same value under the equal-variance and the unequal-variance assumption if the two samples have the same size (number of observations). (1)

6.14) Water samples were taken to assess the concentration of trace metals in a body of water; in this case, the concentration of zinc was of interest. Water samples were taken at ten different locations. Does the data suggest differences in the concentration at the bottom (top row) and surface (bottom row)? (Are these samples dependent or independent?) (1)

Location L1 L2 L3 L4 L5 L6 L7 L8 L9 L10
Bottom .43 .266 .567 .531 .707 .716 .651 .589 .469 .723
Surface .415 .238 .390 .410 .605 .609 .632 .523 .411 .612

6.15) To assess whether studying times among different groups of students vary significantly, members of two different groups were asked about their weekly studying times. Test whether the variances of the two respective populations can be assumed to be the same. (Alpha = 0.05.) (1)
Group 1 (hours per week): 26, 25, 43, 34, 18, 52, 17, 29.
Group 2 (hours per week): 23, 30, 18, 25, 28, 19, 31.

7.1) What does a scatter plot show? What can you say about the relationship of the variables shown in such plots? (0.5)

7.2) What does a covariance show? Calculate the covariance for the data in the following table. What does the sign of your result indicate? (1)

Employee Production volume in t-1 Percentage change in production between t-1 and t
1 47 4.2
2 71 8.1
3 64 6.8
4 35 4.3
5 43 5.0
6 60 7.5
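
The covariance asked for in question 7.2 averages the products of paired deviations from the two means. The sketch below reports both the sample version (dividing by n - 1) and the population version (dividing by n), since the question does not say which is intended.

```python
# Data from the table in question 7.2.
volume = [47, 71, 64, 35, 43, 60]             # production volume in t-1
pct_change = [4.2, 8.1, 6.8, 4.3, 5.0, 7.5]   # percentage change, t-1 to t

n = len(volume)
mean_x = sum(volume) / n
mean_y = sum(pct_change) / n

# Sum of products of paired deviations from the means.
cross_products = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(volume, pct_change))

sample_cov = cross_products / (n - 1)
population_cov = cross_products / n

print(round(sample_cov, 4), round(population_cov, 4))
```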

7.3) a) What does a correlation coefficient show? (0.5)
b) For the data in question 7.2, calculate Pearson’s correlation coefficient. (Include the intermediate results you need for the calculation in your answer (averages, variances, etc.).) (1)
c) What does your result indicate regarding the association of the two variables? (0.5)
d) Is the result significantly different from 0 (zero) at the 10% level? (1)

7.4) a) For the data below, calculate Pearson’s correlation coefficient. (Include the intermediate results you need for the calculation in your answer (averages, variances, etc.).) (1)
b) What does your result indicate regarding the association of the two variables? (0.5)
c) Is the result significantly different from 0 (zero) at the 5% level? (1)

Employee Production volume in t-1 Percentage change in production between t-1 and t
A 56 5.7
B 67 5.7
C 57 5.4
D 69 7.5
E 54 5.9

8.1) What does a time series plot show? What is the difference between univariate and multivariate plots? (0.5)

8.2) Show the plots for the following data. What difference can you observe between the two time series data sets? (1)

Period Series 1 Series 2
1 8 15
2 25 20
3 15 13
4 22 15
5 15 18
6 30 22
7 27 15
8 20 18
9 27 14
10 32 17

8.3) Fill in the missing data. (1)

Year Average price of oil in \$/bbl Index (base year 1985 = 100) Index (base year 1987 = 100)
1985 \$26.92
1986 \$14.44
1987 \$17.75
1988 \$14.87
1989 \$18.33
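
An index with a given base year, as needed for the table in question 8.3, divides each year's price by the base-year price and multiplies by 100. A minimal sketch, where `price_index` is an illustrative helper name:

```python
# Average oil prices in $/bbl from the table in question 8.3.
prices = {1985: 26.92, 1986: 14.44, 1987: 17.75, 1988: 14.87, 1989: 18.33}

def price_index(series, base_year):
    """Express every value relative to the base year, which is set to 100."""
    base = series[base_year]
    return {year: value / base * 100 for year, value in series.items()}

index_1985 = price_index(prices, 1985)
index_1987 = price_index(prices, 1987)

for year in prices:
    print(year, round(index_1985[year], 1), round(index_1987[year], 1))
```

The same function applies to any other base year by changing the `base_year` argument.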

8.4) Fill in the missing data. (1)

Year Average price of oil in \$/bbl Index (base year 2003 = 100) Index (base year 2005 = 100)
2002 \$22.81
2003 \$27.69
2004 \$37.66
2005 \$50.04
2006 \$58.30

8.5) Fill in the missing data. (1.5)

Year Average price of oil in \$/bbl Consumer price index (base year 1987 = 100) Oil price in 1987 \$
1985 \$26.92 94.7
1986 \$14.44 96.5
1987 \$17.75 100
1988 \$14.87 104.1
1989 \$18.33 109.1
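
Converting nominal prices into 1987 dollars, as in question 8.5, divides each price by that year's consumer price index and multiplies by 100. A minimal sketch with illustrative variable names:

```python
# Nominal oil prices ($/bbl) and consumer price index (1987 = 100), question 8.5.
nominal = {1985: 26.92, 1986: 14.44, 1987: 17.75, 1988: 14.87, 1989: 18.33}
cpi = {1985: 94.7, 1986: 96.5, 1987: 100, 1988: 104.1, 1989: 109.1}

# Deflate: real price = nominal price / CPI * 100.
real_1987_dollars = {year: nominal[year] / cpi[year] * 100 for year in nominal}

for year, value in real_1987_dollars.items():
    print(year, round(value, 2))
```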