Decision Tree

Complete the following assignment and submit documentation for each step in a Word document.
1) Install the R statistical package on your computer. To do this go here: Pick the download that’s right for you (Windows, Mac or Linux), and get the base executable file. Then go ahead and run it to install the R engine.
2) Now, get R Studio. To do this go here: Pick the download that’s right for you (Windows, Mac or Linux), and get the executable. Go ahead and run it too.
3) Start R Studio. This will be done from your application menu or a desktop icon if you chose to add one when you installed R Studio.
4) Connect to the Loan Decisions.csv file in R: LoanDecisions <- read.csv(file.choose(), header=T)
5) Attach the LoanDecisions data frame in R by issuing: attach(LoanDecisions)
6) Load the rpart package by issuing: library(rpart) If you don’t have rpart, use the R Studio menu to get it by selecting Tools > Install Packages… and then find rpart and install it. Then issue library(rpart) again.
7) Build your model using the rpart function in R. Issue the following formula: LoanTree <- rpart(LoanOutcome ~ NumberOfMissedOrLatePayments + LinesOfCredit + CreditScore + MonthlyIncome + AgeInYears + MaritalStatus, method="class")
8) Examine the properties of the decision tree model you have just created by issuing: summary(LoanTree)
9) In your Word document explain which independent variables are the best predictors of LoanOutcome, and how you know. Explain the worst predictor of LoanOutcome as well. Use specific output from the summary in your explanation. Note: Marital Status 1 is ‘married’; 2 is ‘single, never married’; 3 is ‘divorced or widowed’.
10) Connect to the Loan Applicants.csv file in R: LoanApplicants <- read.csv(file.choose(), header=T)
11) Apply the model you’ve built to generate predictions using the decision tree. Issue the following: MyPredictions <- predict(LoanTree, LoanApplicants)
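Steps 4 through 9 can be sketched end to end as follows. This is a minimal illustration, not the assignment's answer: the real Loan Decisions.csv is not reproduced here, so the data frame below is randomly generated, the rule that assigns LoanOutcome is made up, and the "Lend" category label is an assumption. The sketch passes data = LoanDecisions to rpart instead of calling attach(), and reads the variable.importance component of the fitted tree, which holds the same figures that summary(LoanTree) prints (rescaled to sum to 100) under "Variable importance".

```r
library(rpart)

# Hypothetical stand-in for Loan Decisions.csv: randomly generated values,
# for illustration only. Column names follow the formula in step 7.
set.seed(42)
n <- 300
LoanDecisions <- data.frame(
  NumberOfMissedOrLatePayments = rpois(n, 2),
  LinesOfCredit = sample(0:8, n, replace = TRUE),
  CreditScore = round(runif(n, 400, 850)),
  MonthlyIncome = round(runif(n, 1500, 9000)),
  AgeInYears = sample(21:70, n, replace = TRUE),
  MaritalStatus = sample(1:3, n, replace = TRUE)
)

# Made-up rule so that the outcome depends on a few predictors.
# "Lend" is an invented label; the real file may use different categories.
LoanDecisions$LoanOutcome <- factor(ifelse(
  LoanDecisions$CreditScore < 550 |
    LoanDecisions$NumberOfMissedOrLatePayments > 4,
  "Do Not Lend",
  ifelse(LoanDecisions$CreditScore < 680,
         "Manager's Discretion-Risk Terms", "Lend")
))

# Step 7: build the classification tree (data= replaces attach()).
LoanTree <- rpart(
  LoanOutcome ~ NumberOfMissedOrLatePayments + LinesOfCredit + CreditScore +
    MonthlyIncome + AgeInYears + MaritalStatus,
  data = LoanDecisions, method = "class"
)

# Step 9: named numeric vector, sorted from most to least important.
# Variables never used as a primary or surrogate split do not appear at all.
importance <- LoanTree$variable.importance
print(round(100 * importance / sum(importance), 1))
```

With the real data, the variables at the top of this vector are the strongest predictors, and a variable near the bottom (or missing from it entirely) is the weakest.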

12) Issue this command: MyPredictions What you see is the predicted probability (the model’s confidence) for each possible category. The category with the highest value is the prediction for that line. So for example, if there’s a 1.000 under Do Not Lend, then there’s 100% confidence in predicting that the person on the line should not get a loan. But if there’s a .785 under Do Not Lend, and .215 under Manager’s Discretion-Risk Terms, then you probably won’t lend to them, but there’s a 21.5% chance that a manager could decide to make the loan, perhaps with a high interest rate or a more aggressive repayment schedule.
13) Combine predictions with applicant data into a single data frame by issuing: LoanPredictions <- data.frame(MyPredictions, LoanApplicants)
14) Export predictions to a CSV file by issuing: write.csv(LoanPredictions, "C:/Users/YourUserName/Desktop/TreePredictions.csv")
The path above needs to be a valid path on your computer, so YourUserName must be replaced. In R, write Windows paths with forward slashes or doubled backslashes. You don’t have to put the file on your desktop, but put it somewhere you can find it. In your Word document, explain the loan decisions you have predicted for the applicants in the Loan Applicants.csv file. Submit your Word document and your TreePredictions.csv file.
15) Review the predicted results that are in your LoanPredictions file. Summarize your predictions, explaining how many loan applicants you project to fall into each category. Evaluate your projections in the context of the independent variables. Will some people’s loan applications likely be denied? If so, why? Which variables play the greatest role in people getting their loans, and do any variables seem to have a stronger positive impact on the terms of the loans? Write one to three paragraphs about how a real lending institution could use the data analysis you’ve done in this exercise to make better, risk-managed lending decisions.
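Steps 11 through 15 could look roughly like the sketch below. It is self-contained, so it fits a small tree on randomly generated data first. The make_applicants helper, the outcome rule, the "Lend" label, the PredictedClass column and the temporary output path are all assumptions made for illustration; none of them come from the assignment files.

```r
library(rpart)

# Hypothetical helper: generates random applicants for illustration only.
make_applicants <- function(n) {
  data.frame(
    NumberOfMissedOrLatePayments = rpois(n, 2),
    LinesOfCredit = sample(0:8, n, replace = TRUE),
    CreditScore = round(runif(n, 400, 850)),
    MonthlyIncome = round(runif(n, 1500, 9000)),
    AgeInYears = sample(21:70, n, replace = TRUE),
    MaritalStatus = sample(1:3, n, replace = TRUE)
  )
}

set.seed(7)
training <- make_applicants(300)
# Made-up outcome rule; "Lend" is an invented category label.
training$LoanOutcome <- factor(ifelse(
  training$CreditScore < 550, "Do Not Lend",
  ifelse(training$CreditScore < 680,
         "Manager's Discretion-Risk Terms", "Lend")
))
# "~ ." uses every other column of the training data as a predictor.
LoanTree <- rpart(LoanOutcome ~ ., data = training, method = "class")

# Stand-in for Loan Applicants.csv (no LoanOutcome column).
LoanApplicants <- make_applicants(25)

# Steps 11-12: for a classification tree, predict() returns a matrix with
# one row per applicant and one probability column per category.
MyPredictions <- predict(LoanTree, LoanApplicants)

# Pick the category with the highest probability on each row.
best <- max.col(MyPredictions, ties.method = "first")
PredictedClass <- colnames(MyPredictions)[best]

# Step 13: combine. data.frame() rewrites column names such as
# "Do Not Lend" into syntactic ones such as "Do.Not.Lend".
LoanPredictions <- data.frame(MyPredictions, PredictedClass, LoanApplicants)

# Step 14: export. A temporary folder is used here; substitute your own path.
out_file <- file.path(tempdir(), "TreePredictions.csv")
write.csv(LoanPredictions, out_file, row.names = FALSE)

# Step 15: number of applicants projected into each category.
print(table(LoanPredictions$PredictedClass))
```

Calling predict(LoanTree, LoanApplicants, type = "class") is another way to get the most likely category directly, and the resulting counts from table() give the per-category totals that step 15 asks you to summarize.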




Sample Solution