Develop logistic regression, decision tree and neural network models that will identify
whether stores will perform well or poorly. You can use Orange, Python, R, or any data mining
package of your choice. The data for the assignment is in a file storedata.csv, which you can
download from the same place you found this document. The data dictionary is given at the
end of this document. You must follow the correct methodology to use the data to build and
test your models.
Sample Solution
To begin, it is important to understand the data set provided. The ‘storedata.csv’ file contains information on various store locations such as region, location type and sales per square foot of retail space, among other things. This data will allow us to better understand which stores are performing well or poorly based on their respective attributes in order to create predictive models that can be used for future decisions regarding store openings or relocations.
Popular data mining packages such as Orange, Python or R can all be used to build the logistic regression model on this data set. The first step is to clean and pre-process the data so that the target variable (store performance) is represented accurately. This includes handling any missing values and outliers, then converting categorical and binary variables into numerical form where applicable.
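As a rough illustration of the cleaning step, the sketch below uses pandas. The column names (region, location_type, sales_per_sqft, performance) and the toy rows are assumptions made up for the example, not the actual contents of storedata.csv; the real names should be taken from the data dictionary. Median imputation and clipping at the 1st and 99th percentiles are just one reasonable choice among several.

```python
import numpy as np
import pandas as pd


def preprocess(df, target="performance"):
    """Impute missing values, clip numeric outliers, one-hot encode categoricals."""
    features = df.drop(columns=[target])
    num_cols = features.select_dtypes(include="number").columns
    cat_cols = features.columns.difference(num_cols)

    for col in num_cols:
        # Fill numeric gaps with the median, then clip extreme values.
        features[col] = features[col].fillna(features[col].median())
        lo, hi = features[col].quantile([0.01, 0.99])
        features[col] = features[col].clip(lo, hi)

    for col in cat_cols:
        # Fill categorical gaps with the most frequent category.
        features[col] = features[col].fillna(features[col].mode().iloc[0])

    # Turn each category into its own 0/1 column.
    X = pd.get_dummies(features, columns=list(cat_cols), dtype=float)
    return X, df[target]


# Hypothetical rows standing in for storedata.csv.
toy = pd.DataFrame({
    "region": ["North", "South", None, "North", "East", "South"],
    "location_type": ["Mall", "Street", "Mall", "Mall", "Street", "Mall"],
    "sales_per_sqft": [310.0, 120.0, np.nan, 295.0, 5000.0, 140.0],
    "performance": [1, 0, 1, 1, 1, 0],
})
X, y = preprocess(toy)
```

In a stricter workflow the medians and clipping bounds would be computed on the training subset only and then applied to the testing subset, so that no information leaks from the test data.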
Once the data is cleaned and formatted, we split it into training and testing subsets, using one part to train the model and reserving the remainder to validate its accuracy. We can then build the logistic regression model using relevant inputs such as sales per square foot alongside the other input variables identified during pre-processing (such as location type).
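The split and the model fit might look like the following scikit-learn sketch. The data here is synthetic and the two feature names are hypothetical stand-ins for the pre-processed columns of storedata.csv; the 70/30 split ratio and the scaling step are assumptions, not requirements of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned store data (hypothetical columns).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "sales_per_sqft": rng.normal(300, 80, n),
    "location_type_Mall": rng.integers(0, 2, n).astype(float),
})
# Made-up rule for the label: 1 = performs well, 0 = performs poorly.
signal = X["sales_per_sqft"] + 40 * X["location_type_Mall"] + rng.normal(0, 30, n)
y = (signal > 320).astype(int)

# Hold out 30% of the stores for testing; stratify to keep class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Scale the inputs, then fit the logistic regression on the training part only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
```

Putting the scaler inside the pipeline means it is fitted on the training subset alone, which keeps the testing subset untouched until evaluation.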
After tuning the parameters of the logistic regression model to maximize its predictive performance, we test its predictions against the held-out testing subset to see how closely they match the actual outcomes. The result indicates whether any adjustments are needed before moving on to the decision tree portion of the project.
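One possible way to tune and then evaluate the model is sketched below with scikit-learn. The data comes from make_classification purely as a placeholder for the store data, and the grid of C values (the regularization strength) is an arbitrary assumption. Tuning uses cross-validation on the training subset, and the testing subset is used only once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data; in the assignment this would be the pre-processed store data.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4,
    n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("lr", LogisticRegression(max_iter=1000)),
])

# Try several regularization strengths using 5-fold cross-validation
# on the training subset only.
search = GridSearchCV(pipe, {"lr__C": [0.01, 0.1, 1, 10]}, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the best model once on the held-out testing subset.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: actual class, columns: predicted class

print("best C:", search.best_params_["lr__C"])
print("test accuracy:", round(test_accuracy, 3))
print(cm)
```

The confusion matrix is worth reading alongside the accuracy, since it shows whether the errors fall mainly on well-performing or poorly performing stores.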
In conclusion, with proper analysis and modeling techniques, and with the appropriate software packages, the storedata.csv file can be used to predict the future success or failure of stores.