Develop logistic regression, decision tree and neural network models that will identify
whether stores will perform well or poorly. You can use Orange, Python, R, or any data mining
package of your choice. The data for the assignment is in a file storedata.csv, which you can
download from the same place you found this document. The data dictionary is given at the
end of this document. You must follow the correct methodology to use the data to build and
test your models.
Sample Solution
For this assignment, I will use Python to develop logistic regression, decision tree and neural network models that identify whether stores will perform well or poorly. The data is in the storedata.csv file, and its columns are described in the data dictionary at the end of the assignment document. To begin, we import the necessary libraries: pandas, numpy, matplotlib and scikit-learn (sklearn).
The next step is to load the CSV file into a pandas DataFrame with read_csv and assign it to a variable called store_df. We can look at its contents with print(store_df) or store_df.head(), and use store_df.describe() to get summary statistics for the numeric columns. We then separate the independent variables (X) from the dependent variable (y): here the "Performed Well" column is y and the remaining columns are X.
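The loading and separation step can be sketched as below. This is a minimal sketch under stated assumptions: the target name "Performed Well" is taken from the text above and should be checked against the data dictionary, while the feature columns "Staff" and "Floor Space", the helper name split_features_target, and the tiny in-memory frame that stands in for storedata.csv are all hypothetical, so the example runs without the file.

```python
import pandas as pd


def split_features_target(store_df: pd.DataFrame, target: str = "Performed Well"):
    """Separate the input features (X) from the target column (y)."""
    X = store_df.drop(columns=[target])  # every column except the target
    y = store_df[target]                 # the label we want to predict
    return X, y


# In the assignment the frame comes from the CSV file:
#   store_df = pd.read_csv("storedata.csv")
# A tiny hypothetical frame stands in for it here so the sketch runs on its own.
store_df = pd.DataFrame({
    "Staff": [10, 12, 7, 15],               # hypothetical feature
    "Floor Space": [800, 950, 600, 1200],   # hypothetical feature
    "Performed Well": [1, 1, 0, 1],         # target column named in the text
})

print(store_df.head())      # first rows: were the columns parsed as expected?
print(store_df.describe())  # count, mean, std, min, quartiles, max per numeric column

X, y = split_features_target(store_df)
```

Non-numeric columns in the real file would still need encoding (for example with pd.get_dummies) before they can be passed to the sklearn models.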
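After separating the target from the inputs, we split the data into a training set and a test set (for example with train_test_split), so that each model is evaluated on stores it has not seen. If the features need to be normalized or standardized, we fit StandardScaler() on the training set only and apply the same transformation to the test set; fitting the scaler on all of the data would leak information from the test set into training. We can then create a logistic regression classifier with LogisticRegression(), a decision tree classifier with DecisionTreeClassifier() and a neural network classifier with MLPClassifier().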
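One way to keep the scaler from seeing the test set is to split first and wrap the scaler and model in a scikit-learn Pipeline, as in the sketch below. The synthetic data from make_classification is an assumption standing in for the real X and y, and the 70/30 split, random_state values, max_depth=5 and hidden_layer_sizes=(16,) are illustrative choices, not tuned settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the store features and labels (assumption: the real
# X and y come from storedata.csv after separating out the target column).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hold out a test set BEFORE any scaling; stratify=y keeps the same share of
# well- and poorly-performing stores in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# In each pipeline, StandardScaler learns its means and variances only when
# .fit() is called on the training data, then reuses them on the test data.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    # Trees split on feature thresholds, so they do not need scaling.
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Neural Network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=42),
    ),
}
```

Using a pipeline rather than calling fit_transform on the whole dataset means the same object can later be passed to cross-validation without leaking test-fold statistics.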
Once the models are created, we fit each one separately on the training set, predict the test set, and measure accuracy with the accuracy_score() function: for example lr_prediction = lr_model.predict(X_test), dt_prediction = dt_model.predict(X_test) and mlp_prediction = mlp_model.predict(X_test), followed by lr_score = accuracy_score(y_test, lr_prediction), and likewise for the other two models.
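The fit, predict and score loop can be sketched as follows. As before, make_classification stands in for the real store data (an assumption), and the model settings are illustrative; the variable names follow the ones used in the text above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the store data (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

lr_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
dt_model = DecisionTreeClassifier(max_depth=5, random_state=42)
mlp_model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=42),
)

scores = {}
for name, model in [("lr", lr_model), ("dt", dt_model), ("mlp", mlp_model)]:
    model.fit(X_train, y_train)          # learn from the training set only
    prediction = model.predict(X_test)   # predict the held-out stores
    scores[name] = accuracy_score(y_test, prediction)  # fraction correct

lr_score, dt_score, mlp_score = scores["lr"], scores["dt"], scores["mlp"]
print(scores)
```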
We can then compare the models by their accuracy scores, with a statement like:
if lr_score > dt_score and lr_score > mlp_score:
    print("Logistic Regression has the highest accuracy")
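With three or more models, picking the highest score with max() avoids writing out every pairwise comparison. The sketch below assumes the scores are collected in a dictionary keyed by model name; the helper name best_model is hypothetical, and the example accuracies are made-up values for illustration only, not results from storedata.csv.

```python
def best_model(scores: dict) -> str:
    """Return the name of the model with the highest accuracy.

    max() keeps the first maximal key it meets, so ties go to the model
    listed first in the dictionary.
    """
    return max(scores, key=scores.get)


# Hypothetical accuracies for illustration only, not real results.
example_scores = {
    "Logistic Regression": 0.81,
    "Decision Tree": 0.77,
    "Neural Network": 0.79,
}
print(f"{best_model(example_scores)} has the highest accuracy")
```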
Finally, we can use matplotlib to visualize each model's results, for example with a confusion matrix plot, which shows how many stores in each class were classified correctly and how many were misclassified. Once these steps are complete, the test-set scores tell us how accurately each of the logistic regression, decision tree and neural network models identifies whether a store will perform well or poorly.
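A confusion matrix plot for one model might look like the sketch below, using scikit-learn's ConfusionMatrixDisplay (available since scikit-learn 0.22). The synthetic data, the class labels "Poor" and "Well" for classes 0 and 1, and the output filename are assumptions for illustration; the Agg backend is selected only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the store data (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

dt_model = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_train, y_train)
dt_prediction = dt_model.predict(X_test)

# Rows are the true classes, columns the predicted classes.
cm = confusion_matrix(y_test, dt_prediction)

# Hypothetical display names: class 0 = "Poor", class 1 = "Well".
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=["Poor", "Well"])
disp.plot(cmap="Blues")
plt.title("Decision Tree: test-set confusion matrix")
plt.savefig("dt_confusion_matrix.png")  # hypothetical output filename
```

The same few lines can be repeated for the logistic regression and neural network predictions to compare where each model makes its mistakes.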