Section A: Discussion Questions

  1. Explain the confusion matrix used in classification methods and provide an example of how you
    interpret its numbers.
  2. Give two practical examples of applications of classification methods in supply chain and logistics.
    Provide detailed explanations. You need to explain why you think classification can be used in those
    cases; you do not need to provide data or solve them.
  3. Assume one of the explanatory variables (named X1) in your logistic regression is a categorical
    variable with the following levels: low, average and high, and another explanatory variable (named
    X2) is also categorical with the following levels: Sydney, Melbourne and Brisbane. Explain how you
    will use them in developing your logistic regression model. How many coefficients will you have in
    your final model?
    (1.6+2.4+2.4 = 6.4 marks)
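
For Question 1, the numbers in a 2x2 confusion matrix are usually interpreted through a few ratios. The
sketch below uses hypothetical counts (100 shipments classified as "late" or "on time", invented for
illustration only) and assumes "late" is the positive class.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """2x2 confusion matrix for a binary classifier (positive = class of interest)."""
    tp: int  # actual positive, predicted positive
    fn: int  # actual positive, predicted negative (missed case)
    fp: int  # actual negative, predicted positive (false alarm)
    tn: int  # actual negative, predicted negative

    def accuracy(self) -> float:
        # Share of all records classified correctly
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def precision(self) -> float:
        # Of the records predicted positive, the share that really are positive
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Of the actual positives, the share the model caught (sensitivity)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Of the actual negatives, the share the model correctly rejected
        return self.tn / (self.tn + self.fp)


# Hypothetical counts: 50 shipments were actually late, 50 were on time
cm = ConfusionMatrix(tp=40, fn=10, fp=5, tn=45)
print(f"accuracy={cm.accuracy():.2f} precision={cm.precision():.2f} "
      f"recall={cm.recall():.2f} specificity={cm.specificity():.2f}")
```

With these counts the model finds 40 of the 50 late shipments (recall 0.80), and 40 of its 45 "late"
predictions are correct (precision about 0.89). The 10 false negatives are late shipments it missed,
which may be the costlier error if late deliveries trigger penalties.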
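
For Question 3, a common approach is dummy (indicator) coding: a categorical variable with k levels
enters the model as k-1 binary columns, and the omitted level is the reference absorbed by the
intercept. The sketch assumes X1 and X2 are the only explanatory variables, with no interaction terms,
and uses "low" and "Sydney" as reference levels (any level could be chosen). Because X1 is ordinal, some
analysts code it as a single numeric score instead; the sketch dummy-codes both.

```python
# Levels from Question 3; the first level of each list is the assumed reference level
X1_LEVELS = ["low", "average", "high"]
X2_LEVELS = ["Sydney", "Melbourne", "Brisbane"]


def dummy_encode(value: str, levels: list) -> list:
    """Return k-1 indicator values; the reference level (levels[0]) is all zeros."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels[1:]]


def design_row(x1: str, x2: str) -> list:
    # Leading 1 is the intercept column
    return [1] + dummy_encode(x1, X1_LEVELS) + dummy_encode(x2, X2_LEVELS)


columns = (["intercept"]
           + [f"X1_{lvl}" for lvl in X1_LEVELS[1:]]
           + [f"X2_{lvl}" for lvl in X2_LEVELS[1:]])
print(columns)
print(design_row("high", "Brisbane"))
print(design_row("low", "Sydney"))  # reference levels: only the intercept is 1
print("coefficients to estimate:", len(columns))
```

Under these assumptions the final model has 1 + 2 + 2 = 5 coefficients; each additional numeric
predictor would add one more.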

Section B: Quantitative Questions

  4. The first worksheet of the Excel file provided for this assessment contains 500 records of clients
    who have purchased many special products from an e-business website. Each record includes data on
    the types of product purchased (between 1 and 5), purchase amount ($), age, gender and family size
    of the customer, whether the client has a membership and whether the customer has a discount card.
    a) Explain the steps to develop a KNN model that predicts which customers will spend more than
    $1000. (Write your answer as: Step 1 - …, Step 2 - …, and so on. You do not need to run any software
    or report results; for example, for Step 1 you could write "cleaning the data, which means …".)
    b) Develop a regression model to predict the purchase amount of a new 28-year-old female customer
    who lives in a family of size 3, is not a member and holds a discount card.
    (3.2+3.2=6.4 marks)
  5. A company provides maintenance service for washing machines in Victoria. The collected data
    are presented in the Excel file (second worksheet).
    a) Assume the manager has asked you to analyse the data and provide him with some insights and
    recommendations. The report should not exceed 2 pages.
    b) Build a model to predict the repair time for a future service booking that needs to be done by
    John and is an electrical repair. Would you suggest assigning this service to the morning shift or
    the afternoon shift?
    c) What other data would you recommend the manager add to this dataset in future for better
    analysis, and what kind of analysis do you think would be useful based on them?
    (3.2+2.4+1.6 = 7.2 marks)
  6. In worksheet 3, a dataset from a blood bank is presented. The data are recorded for apheresis blood
    donations made by a group of donors over a period of time. The donor ID is unique for each donor. A
    donor might have donated more than once in this period. At each donation, the blood total protein
    level of the donor has been recorded. Use the dataset to answer the following questions:
    a) There are some missing values for blood type. Consider how you can fill in the missing values.
    Explain your approach (step by step), then apply it and fill in as many of the missing values as
    possible. (Save the results in an Excel worksheet and name it Question 3 Part a.)
    b) Calculate the average of total protein for each blood type. Explain your approach (step by step).
    Report them in a worksheet and name it Question 3 Part b.
    c) Calculate the range of total protein for each blood type. Explain your approach (step by step).
    Report them in a worksheet and name it Question 3 Part c.
    d) Does total protein decline with age?
    e) Present the two best visualisation tools for this data that you think provide useful information.
    (2+1.2+2+1.2+1.6= 8 marks)
  7. The data presented in worksheet 4 are the results of a 4-year study conducted to assess how
    age, weight, and gender influence the risk of diabetes. Risk is interpreted as the probability
    (times 100) that the patient will have diabetes over the next 4-year period.
    a) What predictive model do you suggest to relate the risk of diabetes to a person’s age, weight and
    gender? Why?
    b) Develop an estimated multiple regression model that relates the risk of diabetes to the person’s
    age, weight, gender and lifestyle. Present the regression formula as a mathematical equation.
    Interpret the coefficients of the regression and comment on the strength of the regression.
    c) What is the risk percentage of diabetes over the next 4 years for a 59-year-old man weighing
    72 kg who lives in a small town?
    (3.2+2+2= 7.2 marks)
  8. Matthew has a new job as a business analyst. He plans to invest 10 percent of his annual
    after-tax salary into a retirement account at the end of every year for the next 30 years.
    Suppose that the annual return is 5% and that his current pre-tax salary is $85k, which grows by
    3% per year. Tax is applied at 15% on salary up to $50k, 20% on salary between $50k and $80k, and
    25% on the remaining salary above $80k (for example, if his salary is $105k, he pays 15% tax on
    the first $50k, 20% on the next $30k and 25% on the remaining $25k). Then:
    a) Create a spreadsheet that shows Matthew the balance of his retirement account for various levels
    of annual investment and return.
    b) If Matthew aims to have $1,100,000 at the end of the 30th year, what percentage of his salary
    should he invest annually?
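
For Question 4(a), the core of a KNN classifier (binary target, feature scaling, distance, majority
vote) can be sketched in plain Python. The six records below are invented for illustration and are not
the worksheet data; the 0/1 coding of gender, membership and discount card is an assumption, and k = 3
is arbitrary.

```python
import math
from collections import Counter

# Hypothetical toy records (NOT the assessment data):
# (product_types, age, is_female, family_size, is_member, has_card, spend)
records = [
    (1, 22, 1, 1, 0, 0, 350.0),
    (3, 35, 0, 3, 1, 1, 1450.0),
    (5, 41, 1, 4, 1, 1, 2100.0),
    (2, 29, 0, 2, 0, 1, 780.0),
    (4, 50, 1, 5, 1, 0, 1250.0),
    (1, 19, 0, 1, 0, 0, 220.0),
]

# Define the binary target: 1 if the customer spent more than $1000
X = [list(r[:-1]) for r in records]
y = [int(r[-1] > 1000) for r in records]

# Min-max scale every feature so that age does not dominate the distance
lo = [min(col) for col in zip(*X)]
hi = [max(col) for col in zip(*X)]


def scale(row):
    return [(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(row, lo, hi)]


X_scaled = [scale(row) for row in X]


def knn_predict(new_row, k=3):
    """Classify a new customer by majority vote of its k nearest neighbours."""
    q = scale(new_row)
    dists = sorted((math.dist(q, x), label) for x, label in zip(X_scaled, y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


print(knn_predict([4, 45, 1, 4, 1, 1]))  # hypothetical new customer
```

A full answer would also include cleaning the data, splitting it into training and validation sets,
and choosing k by comparing validation accuracy (or a confusion matrix) across several values.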
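
For Question 6(a), one approach follows from the description of the data: a donor's blood type does not
change between donations and donor IDs are unique, so a missing blood type can be copied from another
donation by the same donor. The rows below are hypothetical; in Excel the same idea could be applied by
sorting on donor ID or with a lookup formula.

```python
# Hypothetical donation rows (NOT the worksheet data): (donor_id, blood_type or None)
donations = [
    ("D01", "A+"), ("D02", None), ("D01", None),
    ("D03", "O-"), ("D02", "B+"), ("D04", None),
]


def fill_blood_type(rows):
    """Fill a missing blood type from another donation by the same donor."""
    # Collect every blood type recorded for each donor ID
    known = {}
    for donor_id, blood_type in rows:
        if blood_type is not None:
            known.setdefault(donor_id, set()).add(blood_type)
    filled = []
    for donor_id, blood_type in rows:
        types = known.get(donor_id, set())
        # Fill only when the donor has exactly one recorded type; donors with no
        # recorded type, or with conflicting types, stay missing for manual review
        if blood_type is None and len(types) == 1:
            blood_type = next(iter(types))
        filled.append((donor_id, blood_type))
    return filled


for row in fill_blood_type(donations):
    print(row)
```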
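
For Questions 6(b) and 6(c), both summaries are group-by calculations on blood type (an Excel pivot table
performs the same grouping). The sketch assumes missing blood types have already been filled where
possible and skips rows that are still blank; the rows are hypothetical. It averages over donations, so
a donor who donated often carries more weight; averaging per donor first is an alternative worth
stating in the answer.

```python
from statistics import mean

# Hypothetical rows (NOT the worksheet data): (blood_type or None, total_protein)
rows = [("A+", 7.1), ("O-", 6.8), ("A+", 6.5), ("B+", 7.4),
        ("O-", 7.0), ("A+", 7.3), (None, 6.9)]


def summarise(rows):
    """Return {blood_type: (average, range)} of total protein per blood type."""
    groups = {}
    for blood_type, protein in rows:
        if blood_type is None:  # blood type still unknown: cannot be grouped
            continue
        groups.setdefault(blood_type, []).append(protein)
    # Range = maximum minus minimum within each blood-type group
    return {bt: (mean(v), max(v) - min(v)) for bt, v in sorted(groups.items())}


for bt, (avg, rng) in summarise(rows).items():
    print(f"{bt}: average={avg:.2f} range={rng:.2f}")
```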
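
For Question 7(b), one common way to write an estimated multiple regression with categorical gender and
lifestyle variables is shown below. The indicator names (Male, L_j) and the number of lifestyle
categories m are assumptions, since the worksheet's coding is not shown here; the b and c values are
whatever the regression output estimates.

```latex
\widehat{\text{Risk}} = b_0 + b_1\,\text{Age} + b_2\,\text{Weight} + b_3\,\text{Male}
  + \sum_{j=1}^{m-1} c_j\,L_j,
\qquad
\text{Male} = \begin{cases} 1 & \text{man} \\ 0 & \text{woman} \end{cases}
```

Here b_1 is the estimated change in risk per additional year of age with the other variables held fixed,
b_3 is the estimated difference between men and women of the same age, weight and lifestyle, and each
c_j compares lifestyle category j with the reference category. For Question 7(c) the equation would be
evaluated at Age = 59, Weight = 72 and Male = 1, with L_j = 1 only for the category corresponding to a
small town, if that is how lifestyle is recorded (an assumption). R^2 and the overall F-test are the
usual measures of the strength of the regression.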
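
For Question 8, the spreadsheet logic can be sketched in Python: compute the tiered tax, deposit the
chosen share of after-tax salary at the end of each year, grow the balance at the annual return, and
raise the salary by 3%. Two assumptions cover points the question leaves open: the first deposit is
based on the current $85k salary, and the percentage in part (b) is a share of after-tax salary, as in
part (a). Because the final balance is proportional to the share invested, part (b) reduces to one
division.

```python
def income_tax(salary: float) -> float:
    """Tiered tax from Question 8: 15% up to 50k, 20% on 50k-80k, 25% above 80k."""
    brackets = [(50_000.0, 0.15), (80_000.0, 0.20), (float("inf"), 0.25)]
    tax, lower = 0.0, 0.0
    for upper, rate in brackets:
        if salary > lower:
            # Only the part of the salary inside this bracket is taxed at `rate`
            tax += (min(salary, upper) - lower) * rate
        lower = upper
    return tax


def retirement_balance(invest_share: float = 0.10, annual_return: float = 0.05,
                       salary: float = 85_000.0, salary_growth: float = 0.03,
                       years: int = 30) -> float:
    """Balance after `years` end-of-year deposits of invest_share * after-tax salary."""
    balance = 0.0
    for _ in range(years):
        balance *= 1 + annual_return                              # a year of returns
        balance += invest_share * (salary - income_tax(salary))  # end-of-year deposit
        salary *= 1 + salary_growth                               # assumed timing of raise
    return balance


# Part (a): a small grid of investment shares (rows) vs annual returns (columns)
returns = (0.04, 0.05, 0.06)
for share in (0.05, 0.10, 0.15):
    cells = "  ".join(f"{retirement_balance(share, r):>12,.0f}" for r in returns)
    print(f"{share:>4.0%} | {cells}")

# Part (b): the balance is linear in the share, so scale from a 100% share
required_share = 1_100_000 / retirement_balance(invest_share=1.0)
print(f"required share of after-tax salary: {required_share:.2%}")
```

Other cells of the part (a) table come from changing `invest_share` and `annual_return`. If the salary is
assumed to rise before the first deposit instead, move the salary update above the deposit line.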
