Machine Learning

Optimize the portfolio of (experimental) varieties to be grown at the target farm. Information about the target farm is available in the evaluation dataset. The optimal portfolio can contain at most 5 varieties of soybean. It is not required, but you are welcome to use the methods you learn in the prescriptive analytics class to construct the optimal portfolio. If you are not familiar with optimization, come up with a meaningful heuristic to construct the portfolio.
You are encouraged to divide the project work into three components: Descriptive Analytics, Predictive Analytics, and Prescriptive Analytics.
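One possible heuristic for the portfolio step, as a minimal sketch only: score each variety by its mean predicted yield across weather scenarios minus a penalty on how much that prediction varies, then keep the five best. The function name `select_portfolio`, the `risk_aversion` parameter, and the toy numbers are all assumptions made for illustration; they do not come from the project data.

```python
from statistics import mean, pstdev


def select_portfolio(predicted_yields, max_varieties=5, risk_aversion=0.5):
    """Pick up to `max_varieties` varieties with the best risk-adjusted score.

    predicted_yields maps a variety name to a list of predicted yields,
    one per weather scenario. The score is the mean prediction minus
    `risk_aversion` times the population standard deviation.
    """
    scores = {
        variety: mean(yields) - risk_aversion * pstdev(yields)
        for variety, yields in predicted_yields.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:max_varieties]


# Made-up predictions (one value per weather scenario), for illustration only.
toy_predictions = {
    "V1": [60.0, 62.0, 58.0],
    "V2": [70.0, 40.0, 55.0],
    "V3": [50.0, 50.0, 50.0],
    "V4": [65.0, 63.0, 64.0],
    "V5": [45.0, 47.0, 46.0],
    "V6": [30.0, 35.0, 40.0],
    "V7": [58.0, 57.0, 59.0],
}

portfolio = select_portfolio(toy_predictions)
print(portfolio)
```

Setting `risk_aversion` to 0 reduces the rule to ranking by mean predicted yield; larger values favor varieties whose predictions are stable across scenarios.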
I. Descriptive Analytics
Perform exploratory data analysis to uncover patterns in the given data and to familiarize yourself with it. For example,

  1. Plot the latitudes and longitudes on a map to visualize the locations of farms. Identify where the target/evaluation farm is located. Note that most of the farms are located in the US Midwest.
  2. Generate a frequency distribution for varieties. Decide whether you have enough data for each variety to build a dedicated prediction model for every variety.
  3. Check whether there is any relationship between locations and varieties. Explore whether certain varieties are grown more often in some regions than in others.
  4. Look for patterns in weather variables. Explore relationships between locations and weather-related variables.
  5. Plot the distribution of the yield variables. Based on the plot, what do you think is a realistic goal for the optimal portfolio at the target farm?
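As a rough sketch of item 2, the frequency distribution can be computed with a simple counter and then used to separate varieties with enough records for a dedicated model from those that would be pooled. The threshold `min_records` and the toy variety labels are assumptions chosen for illustration; a sensible cutoff depends on the actual dataset.

```python
from collections import Counter


def split_by_frequency(varieties, min_records=50):
    """Count records per variety and split varieties at a record threshold.

    Returns the counts, the varieties with at least `min_records` records
    (candidates for a dedicated model), and the remaining varieties
    (candidates for a combined model).
    """
    counts = Counter(varieties)
    dedicated = sorted(v for v, n in counts.items() if n >= min_records)
    combined = sorted(v for v, n in counts.items() if n < min_records)
    return counts, dedicated, combined


# Made-up variety column, for illustration only.
toy_varieties = ["V1"] * 4 + ["V2"] * 2 + ["V3"] * 1

counts, dedicated, combined = split_by_frequency(toy_varieties, min_records=3)
for variety, n in counts.most_common():
    print(f"{variety}: {n}")
print("dedicated models:", dedicated)
print("combined model:", combined)
```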

II. Predictive Analytics
Decide on a target variable to help you with the project goal. Variety_Yield and Yield_Difference are good candidates for the target variable. Based on the frequency distribution generated in the descriptive analytics, decide which varieties will have their own prediction models. Also decide which varieties will be combined in the same model. Include an identifier for varieties in the combined model so that predictions can be made for individual varieties. Generate models using the following algorithms (if your target variable is continuous):

  1. Linear Regression
  2. LASSO
  3. Regression Tree
  4. Bagging
  5. Random Forest
  6. Boosted Trees
  7. Neural Network
Generate models using the following algorithms (if your target variable is categorical):

  1. Logistic Regression
  2. Classification Tree
  3. Bagging
  4. Random Forest
  5. Boosted Trees
  6. Neural Network
  7. Support Vector Machine

Using these models, predict the yield or yield difference for every potential variety at the target/evaluation farm. Depending upon the choice of your target variable, these predictions need not be yield or yield difference. Make predictions under multiple weather-related scenarios to account for uncertainty. Ensure that the chosen weather scenarios are suitable for the location of the target/evaluation farm.
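The scenario step described above could be organized along the following lines. This is only a sketch: `predict_all` and `toy_model` are hypothetical names, `toy_model` is a stand-in for whichever fitted model you end up with, and the weather features and their values are arbitrary placeholders rather than figures for the target farm.

```python
from itertools import product


def predict_all(varieties, scenarios, predict):
    """Return {variety: {scenario_name: prediction}} for every combination.

    `predict` is any callable taking (variety, weather) and returning a
    number, for example a thin wrapper around a fitted model.
    """
    results = {variety: {} for variety in varieties}
    for variety, (name, weather) in product(varieties, scenarios.items()):
        results[variety][name] = predict(variety, weather)
    return results


def toy_model(variety, weather):
    """Hypothetical stand-in for a fitted model; coefficients are arbitrary."""
    base = {"V1": 55.0, "V2": 60.0}[variety]
    return base + 0.5 * weather["precip"] - 0.25 * weather["temp"]


# Placeholder weather scenarios; real ones should reflect the target farm.
scenarios = {
    "dry": {"temp": 28.0, "precip": 10.0},
    "normal": {"temp": 24.0, "precip": 20.0},
    "wet": {"temp": 20.0, "precip": 30.0},
}

predictions = predict_all(["V1", "V2"], scenarios, toy_model)
for variety, by_scenario in predictions.items():
    print(variety, by_scenario)
```

Passing the variety to `predict` mirrors the identifier used in a combined model, so the same loop works whether a variety has a dedicated model or shares one.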

Sample Solution