Machine Learning (Unsupervised Learning, Tree-Based Methods, Support Vector
Machines, Classification, Linear and Non-linear Regression, and Resampling Methods)
The following is an example of the coursework, which is expected to be delivered within 12 hours. This
coursework contains four questions. Answer ALL FOUR. All questions will be given equal weight (25%).
Time allowed: expected writing time is 2 hours (you have 12 hours to answer).
1. (a) Suppose that $y_i \sim N(\mu, 1)$ for $i = 1, \dots, n$ and that the $y_i$ are independent.
i. Show that the sample mean estimator $\hat{\mu}_1 = \frac{1}{n} \sum_{i=1}^{n} y_i$ is obtained by minimising the least squares criterion,
$\hat{\mu}_1 = \arg\min_{\mu} \sum_{i=1}^{n} (y_i - \mu)^2$, and that $\hat{\mu}_1$ is an unbiased estimator of $\mu$. Also find the variance of $\hat{\mu}_1$. [7 marks]
ii. Consider adding a penalty term to the least squares criterion, and therefore using the estimator
$\hat{\mu}_2 = \arg\min_{\mu} \left\{ \sum_{i=1}^{n} (y_i - \mu)^2 + \lambda \mu^2 \right\}$ for the mean, where $\lambda$ is a non-negative tuning parameter. Derive $\hat{\mu}_2$,
find its bias and show that its variance is lower than that of $\hat{\mu}_1$.
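The comparison in part (a) can be checked numerically. The following is a minimal simulation sketch, not a substitute for the derivation. It assumes the closed form $\hat{\mu}_2 = \sum y_i / (n + \lambda)$, which follows from setting the derivative of the penalised criterion to zero. The function name and the values of `mu`, `n` and `lam` are illustrative choices, not part of the question.

```python
import numpy as np

def simulate_estimators(mu=2.0, n=20, lam=5.0, reps=100_000, seed=0):
    """Monte Carlo estimates of the bias and variance of both estimators."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated sample y_1, ..., y_n from N(mu, 1).
    y = rng.normal(loc=mu, scale=1.0, size=(reps, n))
    mu1 = y.mean(axis=1)               # sample mean (least squares)
    mu2 = y.sum(axis=1) / (n + lam)    # penalised estimator (assumed closed form)
    return {
        "bias1": mu1.mean() - mu, "var1": mu1.var(),
        "bias2": mu2.mean() - mu, "var2": mu2.var(),
    }

result = simulate_estimators()

# Theoretical values for the same illustrative settings (mu=2, n=20, lam=5).
theory = {
    "bias1": 0.0,
    "var1": 1.0 / 20,
    "bias2": -5.0 * 2.0 / (20 + 5.0),  # -lam * mu / (n + lam)
    "var2": 20 / (20 + 5.0) ** 2,      # n / (n + lam)^2
}

for key in ("bias1", "var1", "bias2", "var2"):
    print(f"{key}: simulated {result[key]:.4f}, theory {theory[key]:.4f}")
```

With these settings the penalised estimator trades a non-zero bias for a smaller variance, which is the behaviour part ii asks you to show analytically.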
(b) Consider the multiple linear regression model $y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + e_i$, $i = 1, \dots, n$,
where $\beta = (\beta_1, \dots, \beta_p)^T$ and the error term $e = (e_1, \dots, e_n)^T \sim N(0, \sigma^2 I_n)$.
i. When p is comparable to n, multicollinearity becomes an issue. Describe the effects of multicollinearity on
the estimated coefficients, the associated standard errors and the significance of the coefficients when using the
ordinary maximum likelihood method.
ii. The ridge regression estimate of $\beta$ can be obtained by minimising a particular expression with respect to $\beta$.
Write down this expression as well as an alternative formulation of it.
iii. Explain why ridge regression can potentially correct the problems of
multicollinearity. [2 marks]
iv. Provide an advantage and a disadvantage of ridge regression over standard linear regression.
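The effect described in part (b) can be illustrated on simulated data. This is a rough sketch under stated assumptions: the data-generating process, the seed and the helper `ridge_fit` are hypothetical, and the ridge estimate is computed from the standard closed form $(X^T X + \lambda I)^{-1} X^T y$ on centred data, so the intercept is not penalised.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Centre X and y, then solve (X'X + lam * I) beta = X'y for the slopes."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)    # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 1.0 + x1 + x2 + rng.normal(size=n)  # true slopes are (1, 1)

beta_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 reduces to least squares
beta_ridge = ridge_fit(X, y, lam=1.0)

print("least squares slopes:", beta_ols)
print("ridge slopes:        ", beta_ridge)
```

Because the two predictors are almost identical, the least squares slopes are poorly determined individually, while the ridge slopes are shrunk towards a more stable solution whose sum stays close to 2.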
2. Let $x = (x_1, \dots, x_{100})$, with $\sum_{i=1}^{100} x_i = 20$, be a random sample from the Exponential($\lambda$)
distribution with probability density function given by
$f(x_i \mid \lambda) = \frac{1}{\lambda} \exp(-x_i / \lambda)$, $x_i > 0$, $\lambda > 0$. Note that $E(x_i) = \lambda$.
(a) Assign the IGamma(0.1, 0.1) prior to $\lambda$ and find the corresponding posterior distribution.
(b) Find the Jeffreys prior for $\lambda$. What is the corresponding posterior distribution?
(c) Find a Bayes estimator for $\lambda$ based on the priors of parts (a) and (b).
(d) Let y represent a future observation from the same model. Find the predictive
distribution of y based either on the prior of part (a) or (b).
(e) Describe how you can calculate the mean of the predictive distribution in
software such as R.
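Part (e) refers to R, but the same Monte Carlo idea can be sketched in Python. The sketch below assumes that IGamma(a, b) denotes the inverse gamma density proportional to $\lambda^{-(a+1)} \exp(-b/\lambda)$, under which the prior of part (a) combines with the exponential likelihood to give an IGamma($a + n$, $b + \sum x_i$) posterior. Variable names, the seed and the number of draws are illustrative.

```python
import numpy as np

# Prior hyperparameters and data summaries taken from the question.
a, b = 0.1, 0.1        # IGamma(a, b) prior, shape-scale parametrisation (assumption)
n, sum_x = 100, 20.0

# Conjugate update under the assumed parametrisation.
post_shape, post_scale = a + n, b + sum_x

rng = np.random.default_rng(0)
draws = 200_000

# If G ~ Gamma(shape, rate), then 1/G ~ IGamma(shape, scale=rate).
# NumPy's gamma takes a scale argument, so scale = 1 / rate here.
lam = 1.0 / rng.gamma(shape=post_shape, scale=1.0 / post_scale, size=draws)

# One future observation per posterior draw: y | lam is exponential with mean lam.
y_future = rng.exponential(scale=lam)

mc_predictive_mean = y_future.mean()
# E[y | x] = E[lam | x], which is finite because the posterior shape exceeds 1.
exact_predictive_mean = post_scale / (post_shape - 1.0)

print(f"Monte Carlo: {mc_predictive_mean:.4f}, exact: {exact_predictive_mean:.4f}")
```

The same two-step scheme (draw $\lambda$ from the posterior, then draw y given $\lambda$) carries over to R with `rgamma` and `rexp`.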
3. (a) i. Suppose a non-linear model can be written as $Y = f(X) + e$,
where e has zero mean and variance $\sigma^2$, and is independent of X. Show
that the expected test error, conditional on X, can be decomposed into the
following three parts:
$E[(Y - \hat{f}(X))^2] = \sigma^2 + \mathrm{Bias}[\hat{f}(X)]^2 + \mathrm{Var}[\hat{f}(X)]$, where $\hat{f}(\cdot)$ is estimated from the training data.
ii. To estimate the test error rate, one can use the 10-fold Cross Validation
(CV) approach or the information criterion approach, e.g. AIC or BIC. What
are the main advantage and disadvantage of using the 10-fold CV approach
in comparison with AIC or BIC?
iii. State which one of AIC and BIC tends to select a smaller model and
explain the reason.
(b) i. The tree in Figure 1 provides a regression tree based on a dataset of patient visits for upper respiratory
infection. The aim is to identify factors associated with a physician's rate of prescribing, which is a continuous
variable. The variables appearing in the regression tree are private: the percent of privately insured patients a
physician has; black: the percent of black patients a physician has; and fam: whether or not the physician
specialises in family medicine. Provide an interpretation of this tree.
ii. Consider the regression tree of Figure 2, where the response variable is the
log salary of a baseball player, based on the number of years that he has
played in the major leagues (Years) and the number of hits that he made
in the previous year (Hits). Create a diagram that represents the partition
of the predictor space according to this tree.
4. (a) i. Consider the following data: 10, 20, 40, 80, 85, 121, 160, 168, 195.
Use the k-means algorithm with k = 3 to cluster the data set. Use the
Euclidean distance to measure the distance between the data points. Suppose that the points 160, 168, and
195 were selected as the initial cluster means. Work from these initial values to determine the final clustering for
the data. Provide results from each iteration.
ii. What are the main disadvantages of k-means clustering? Why might one
want to consider hierarchical clustering as an alternative?
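The iterations asked for in part (a) i can be traced with a short script. This is a minimal sketch of the standard assign-then-update k-means loop for one-dimensional data. The function name `kmeans_1d` and the stopping rule (stop when the means no longer change) are illustrative choices.

```python
import numpy as np

def kmeans_1d(data, init_means, max_iter=100):
    """Plain k-means on 1-D data; returns final labels, means and a per-iteration log."""
    x = np.asarray(data, dtype=float)
    means = np.asarray(init_means, dtype=float)
    history = []
    for _ in range(max_iter):
        # Assignment step: nearest mean by absolute (1-D Euclidean) distance.
        labels = np.argmin(np.abs(x[:, None] - means[None, :]), axis=1)
        # Update step: mean of each cluster; an empty cluster keeps its old mean.
        new_means = np.array([
            x[labels == k].mean() if np.any(labels == k) else means[k]
            for k in range(len(means))
        ])
        history.append((labels.copy(), new_means.copy()))
        if np.allclose(new_means, means):
            break
        means = new_means
    return labels, new_means, history

data = [10, 20, 40, 80, 85, 121, 160, 168, 195]
labels, means, history = kmeans_1d(data, init_means=[160, 168, 195])

for step, (lab, mu) in enumerate(history, start=1):
    clusters = [[d for d, l in zip(data, lab) if l == k] for k in range(3)]
    print(f"iteration {step}: clusters {clusters}, means {np.round(mu, 2)}")
```

The log lists the cluster memberships and updated means at every pass, which is the per-iteration output the question requests. The last pass repeats the previous assignment and only confirms convergence.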
(b) i. Data are available for students taking a BSc degree in Data Science,
in particular the variables $X_1$: average mark on project coursework, $X_2$:
average hours studied per course, and $Y$: getting a degree with distinction. The
estimated coefficients of a logistic regression model were $\beta_0 = -5$, $\beta_1 = 0.02$,
$\beta_2 = 0.1$. Estimate the probability that a student who averages
50% on project coursework and studies 30 hours on average for each course
gets a degree with distinction. How many hours would this student
need to study on average to have a 50% chance of getting a degree
with distinction?
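The arithmetic in part (b) i can be sketched as follows. The sketch assumes the intercept is $-5$ (reading the garbled character before the 5 as a minus sign) and that the coursework mark enters the model in percentage points, so 50% is entered as 50. The function names are illustrative.

```python
import math

def distinction_probability(mark, hours, b0=-5.0, b1=0.02, b2=0.1):
    """Logistic model: P(Y = 1) = 1 / (1 + exp(-(b0 + b1*mark + b2*hours)))."""
    z = b0 + b1 * mark + b2 * hours
    return 1.0 / (1.0 + math.exp(-z))

def hours_for_probability(mark, target=0.5, b0=-5.0, b1=0.02, b2=0.1):
    """Solve b0 + b1*mark + b2*hours = logit(target) for hours."""
    logit = math.log(target / (1.0 - target))
    return (logit - b0 - b1 * mark) / b2

p = distinction_probability(mark=50, hours=30)
h = hours_for_probability(mark=50, target=0.5)
print(f"probability: {p:.4f}, hours needed for a 50% chance: {h:.1f}")
```

A 50% chance corresponds to a linear predictor of zero, so the second function simply inverts the linear part of the model.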
ii. Suppose that we wish to predict whether a high-quality chip produced in
a factory will pass the quality control (‘Pass’ or ‘Fail’) based on x, the
measurement of its diameter. Diameter measurements are available for a
large number of chips. After examining them it turns out that the mean
value of x for chips that passed the quality control was 5mm, while the
mean for those that didn’t was 7mm. Moreover, the variance of x for
these two sets of chips was $\sigma^2 = 1$. Finally, 70% of the produced
chips passed the quality control. Assuming that x follows the normal
distribution, predict the probability that a chip with x = 5.8 will pass the
quality control.
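The calculation in part (b) ii is an application of Bayes' theorem with normal class densities. A minimal sketch follows, assuming both classes share the variance $\sigma^2 = 1$ and that the 70% pass rate is used as the prior probability of passing. The function names are illustrative.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posterior_pass(x, mean_pass=5.0, mean_fail=7.0, var=1.0, prior_pass=0.7):
    """P(Pass | x) by Bayes' theorem with a common variance for both classes."""
    num = prior_pass * normal_pdf(x, mean_pass, var)
    den = num + (1.0 - prior_pass) * normal_pdf(x, mean_fail, var)
    return num / den

p_pass = posterior_pass(5.8)
print(f"P(Pass | x = 5.8) = {p_pass:.4f}")
```

Because the variance is common to both classes, the normalising constants of the two densities cancel, and only the exponential terms and the priors affect the result.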