Machine Learning with Python

Perform the tasks below.
1. Download the Titanic data set (passenger survival data for the Titanic) from the link below:
https://www.kaggle.com/hesh97/titanicdataset-traincsv/download

  1. Perform random forest, decision tree, and gradient boosting classification on the above data set.
  2. To measure accuracy, generate a random 80/20 split (using dataset.split(0.8)), train the model
    on the 80% fraction, and then evaluate the accuracy on the 20% fraction.
  3. What is the accuracy of your decision tree classifier on the Titanic data set with unlimited
    depth? Repeat this 100 times and average the results (hint: do the repetition in code).
  4. What is the best depth limit to use for this data? To answer this, do the same calculations
    as above (average 100 experiments), but do it for increasing depth limits, specifically 0, 1, 2,
    …, 10. Show all of your results.
  5. Do we see overfitting with this data set? Repeat the experiment from question 3 with
    increasing depth (0, 1, …, 10) and calculate the accuracy this time on both the testing data
    (like before) and the training data.
  6. What is the accuracy of the random forest and gradient boosting classifiers on the Titanic data set?
  7. Create a graph with the matplotlib library showing these results, and then provide a 1-2 sentence answer describing the graph.
  8. Preprocess the data using StandardScaler or MinMaxScaler for random forest and gradient boosting.
  9. Evaluate the models and plot the confusion matrices. Compare the performance with and without preprocessing, tabulate your results, and justify which approach is better.
  10. Prepare a neat report discussing all the above tasks.
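
The experiments in tasks 2 to 9 could be organized roughly as in the sketch below, which uses scikit-learn. It is a starting point under stated assumptions, not a reference solution. The column names (Pclass, Sex, Age, SibSp, Parch, Fare, Survived) are assumed to match the downloaded CSV, the median imputation of Age is one simple choice among several, and every function name here is hypothetical. scikit-learn's train_test_split stands in for the dataset.split(0.8) call mentioned in task 2. Because DecisionTreeClassifier does not accept max_depth=0, the sweep starts at depth 1 and uses None for unlimited depth.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def prepare_features(df):
    """Return (X, y) from a Titanic-style frame. Column names are assumed."""
    X = df[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]].copy()
    X["Sex"] = (X["Sex"] == "male").astype(int)    # encode sex as 0/1
    X["Age"] = X["Age"].fillna(X["Age"].median())  # simple median imputation
    return X, df["Survived"]


def mean_accuracy(make_model, X, y, n_repeats=100, test_size=0.2):
    """Average (train, test) accuracy over repeated random 80/20 splits."""
    train_scores, test_scores = [], []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        model = make_model().fit(X_tr, y_tr)
        train_scores.append(accuracy_score(y_tr, model.predict(X_tr)))
        test_scores.append(accuracy_score(y_te, model.predict(X_te)))
    return float(np.mean(train_scores)), float(np.mean(test_scores))


def depth_sweep(X, y, depths=range(1, 11), n_repeats=100):
    """Map each depth limit (None = unlimited) to (train_acc, test_acc)."""
    results = {}
    for depth in [None, *depths]:
        results[depth] = mean_accuracy(
            lambda: DecisionTreeClassifier(max_depth=depth), X, y, n_repeats)
    return results


def compare_ensembles(X, y, scale=False, n_repeats=10):
    """Mean (train, test) accuracy for both ensembles, optionally scaled."""
    factories = {
        "random_forest": lambda: RandomForestClassifier(n_estimators=100),
        "gradient_boost": lambda: GradientBoostingClassifier(),
    }
    results = {}
    for name, factory in factories.items():
        if scale:
            # The scaler is fitted on the training fraction only.
            make = lambda f=factory: make_pipeline(StandardScaler(), f())
        else:
            make = factory
        results[name] = mean_accuracy(make, X, y, n_repeats)
    return results


def single_split_confusion(model, X, y, seed=0):
    """Confusion matrix on the 20% test fraction of one random split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    return confusion_matrix(y_te, model.fit(X_tr, y_tr).predict(X_te))


def plot_depth_sweep(results, path="depth_sweep.png"):
    """Save a train/test accuracy plot against depth (unlimited excluded)."""
    import matplotlib.pyplot as plt
    depths = [d for d in results if d is not None]
    fig, ax = plt.subplots()
    ax.plot(depths, [results[d][0] for d in depths], label="train")
    ax.plot(depths, [results[d][1] for d in depths], label="test")
    ax.set_xlabel("max_depth")
    ax.set_ylabel("mean accuracy")
    ax.legend()
    fig.savefig(path)
```

Assuming the downloaded file is saved as train.csv (a hypothetical filename), a run might look like X, y = prepare_features(pd.read_csv("train.csv")), followed by depth_sweep(X, y) and compare_ensembles(X, y, scale=True). A training accuracy that keeps rising with depth while the test accuracy levels off or falls would be the overfitting signature that task 5 asks about. Swapping StandardScaler for MinMaxScaler covers the other option named in task 8.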
