Use R to mine real data for a problem of interest. The data could come from a problem at your current job if you have one, from something of interest to the School of Management or College, from the web, etc. (the electronic reading list for this course suggests places where you can find relevant data). You will design the data mining task, mine the data, and describe your results. You will also research existing solutions to the problem, if any have been proposed or documented. Your own data and results need not be on a par with actual industry results; the goal is for you to get as realistic a hands-on experience as possible, given the constraints of what you have learned.
In writing up/presenting your research, think of yourself as an analyst employed or retained by a company (large or small) or by a funding source (e.g., a venture capital (VC) firm or incubator) that wants to understand the state of the art in using data mining for the task in question. Review what has been done to date on your problem. Consider as an example predictive analytics for on-line advertising: a VC firm considering funding on-line ad networks or ad-tech startups would need to understand the state of the art in using data mining to target on-line advertising before it could assess a proposal for applying data mining. Don’t worry too much about coming up with a novel idea. It is more important to develop the idea well (within the scope of what we’ve discussed in class).
You should use the CRISP-DM data mining process to structure your research and report. Keep in mind that proceeding linearly through the steps may be ineffective; if you iterate between steps, reflect this in your analysis. You should interact with me from the preparation of your initial ideas through to the preparation of your report, as a consultant would interact with a firm or funding source in preparing a research report. Use your imagination or prior experience, or ask for help, to fill in any gaps between the material available and what you would be able to find out if you could actually interact with the client firm.
This assignment will have a phased submission of work, as follows:
Submission 1: On Wednesday 28th February 2018, you will submit a proposal for your project via Moodle. This should give as much detail as possible on your ideas, so that I can give you brief feedback. Include in your proposal your ideas about: What is the exact business problem? What is the use scenario? What precisely is the data mining problem? Is it supervised or unsupervised? What data will you be using and where will you obtain it? What is a data instance? What might be the target variable? What features would be useful? How exactly would it add business value? And so on. Please include a link to the data set you will be using.
Submission 2: On Wednesday 21st March 2018 you will submit your final report, which should be about 1500 words, plus any appendices you would like to include. Use external sources where appropriate, and provide clear citations and a bibliography. You should also submit your data file and a working R script which I can run against it.
You will get the most out of the project if you interact with me during the development of your ideas. Please feel free to come and talk to me about your ideas as often as you’d like. My office hours are on Mondays and Tuesdays, 12.00-13.00; email me for an appointment to make sure that I am free. Alternatively, email me with specific questions or problems you are having.
Your report should include the information detailed below, in approximately the order given. Your report need not have corresponding sections or bullet points, but I should be able to find the information without searching too hard. Be as precise/specific as you can.
Business Understanding (take this seriously)
• Identify, define, and motivate the business problem that you are addressing.
• How (precisely) will a data mining solution address the business problem?
(NB: I’d like to see a good definition/motivation of the business problem and a precise statement of how a data mining solution will address the problem. It’s not so important that the hands-on results match perfectly. It’s more important that you have the experience of working through a realistic problem definition.)

Data Understanding
• Identify and describe the data (and data sources) that will support data mining to address the business problem. Include those aspects of the data that we talk about in class and/or in the quizzes.

Data Preparation
• Specify how these data are integrated to produce the format required for data mining.
(NB: data preparation can be time consuming. Get started early. Talk to me if you need advice.)

Modelling
• Specify the type of model(s) built and/or patterns mined.
• Discuss choices for data mining algorithm: what are alternatives, and what are the pros and cons?
• Discuss why and how this model should “solve” the business problem (i.e., improve along some dimension of interest to the firm).

Evaluation
• Discuss how the result of the data mining is/should be evaluated. How should a business case be developed to project expected improvement? ROI? If this is impossible/very difficult, explain why and identify any viable alternatives.

Deployment
• Discuss how the result of the data mining will be deployed.
• Discuss any issues the firm should be aware of regarding deployment.
• Are there important ethical considerations?
• Identify the risks associated with your proposed plan and how you would mitigate them.

MARKING CRITERIA
The submitted and assessed part of this coursework is a report together with R code and data files, rather than an academic essay. Thus, the marking criteria are different from those usually required for an academic essay. Your assignment will be assessed on the criteria shown in the rubric below:
• The percentage given in the leftmost cell of each row shows you the percentage of the final mark available for that criterion
• The Max % shown in the topmost cell of each column shows you the final mark you would achieve if you were awarded marks in this column for all criteria
• The number in each box shows the score you will be given for that individual criterion at the level you have achieved
• Your feedback will include a mark for each criterion enabling you to see exactly where you gained/lost marks

MARKING RUBRIC
| Criterion | First Class (82-100%) | First Class (72-78%) | Upper Second (62-68%) | Lower Second (52-58%) | Third Class (42-48%) | Fail (35%) | Fail (25%) | Fail (15%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Business Understanding (10%) | Outstanding definition of the business problem with precise and detailed statement of how data mining solution will address it | Excellent definition of the business problem with precise and detailed statement of how data mining solution will address it | Very good definition of the business problem with precise or detailed statement of how data mining solution will address it | Good definition of the business problem with statement of how data mining solution will address it | Some attempt at definition of the business problem with imprecise statement of how data mining solution will address it | Poor definition of the business problem with little consideration of how data mining solution will address it | Little definition of the business problem with little consideration of how data mining solution will address it | No definition of the business problem |
| Data Understanding (10%) | Outstanding identification and description of data and data sources | Excellent identification and description of data and data sources | Very good identification and description of data and data sources | Good identification and description of data and data sources | Some attempt at identification and description of data and data sources | Poor identification and description of data and data sources | Little identification and description of data and data sources | No identification or description of data and data sources |
| Data Preparation (10%) | Outstanding specification of how data are prepared for data mining | Excellent specification of how data are prepared for data mining | Very good specification of how data are prepared for data mining | Good specification of how data are prepared for data mining | Some attempt at specification of how data are prepared for data mining | Poor specification of how data are prepared for data mining | Little specification of how data are prepared for data mining | No specification of how data are prepared for data mining |
| Modelling (20%) | Outstanding choice of one or more modelling/prediction techniques | Excellent choice of one or more modelling/prediction techniques | Very good choice of one or more modelling/prediction techniques | Good choice of one or more modelling/prediction techniques | Good choice of single modelling technique | Poor choice of single modelling technique | Little choice of single modelling technique | No modelling techniques described |
| | Outstanding discussion of alternatives and applicability of model(s) | Excellent discussion of alternatives and applicability of model(s) | Very good discussion of alternatives and applicability of model(s) | Good discussion of alternatives and applicability of model(s) | Some attempt at discussion of alternatives and applicability of model(s) | Poor discussion of alternatives and applicability of model(s) | Little discussion of alternatives and applicability of model(s) | No consideration of alternatives and applicability of model(s) |
| Evaluation (5%) | Outstanding discussion of aspects of evaluation | Excellent discussion of aspects of evaluation | Very good discussion of aspects of evaluation | Good discussion of aspects of evaluation | Some attempt at discussion of aspects of evaluation | Poor discussion of aspects of evaluation | Little discussion of aspects of evaluation | No discussion of aspects of evaluation |
| Deployment (5%) | Outstanding discussion of deployment issues | Excellent discussion of deployment issues | Very good discussion of deployment issues | Good discussion of deployment issues | Some attempt at discussion of deployment issues | Poor discussion of deployment issues | Little discussion of deployment issues | No discussion of deployment issues |
| R Code (25%) | Submitted code and data contain functionality which goes beyond what has been taught | Submitted code and data function efficiently and without changes | Submitted code and data function without changes | Submitted code and data function without significant changes | Submitted code and/or data require some changes before they will function | Submitted code and/or data require substantial changes before they will function | Little code submitted and/or code and data require substantial changes before they will function | No code submitted |
| | Code contains clearly worded comments throughout | Code contains clearly worded comments throughout | Code contains comments throughout | Code contains comments throughout | Code contains some comments | Code contains few comments | Code contains very few comments | Code contains no comments |
| References (5%) | Outstanding use of citations within the text with all references accurately cited | Excellent use of citations within text with nearly all references accurately cited | Very good use of citations within text with most references properly cited | Good use of citations within text with most references properly cited | Reasonable use of citations within text but not necessarily cited properly | Very few citations within text and not accurately cited | Very few if any citations within text and not accurately cited | No evidence of understanding of referencing systems |
| Use of English and Writing Style (5%) | Outstanding standard with no errors, clear and exceptionally easy to read | Excellent standard, with negligible errors, very clear and easy to read | Very good standard with only minor errors, clear writing style and generally easy to read | Good standard with only minor errors, clear writing style and generally easy to read | Some errors (punctuation, misuse of words, spelling, sentence construction) make the work difficult to understand | Frequent errors (punctuation, misuse of words, spelling, sentence construction) make the work very difficult to understand | Very frequent errors (punctuation, misuse of words, spelling, sentence construction) make the work exceedingly difficult to understand | Very frequent errors (punctuation, misuse of words, spelling, sentence construction) make the work largely incomprehensible |
| Overall Presentation (5%) | Outstanding organisation and presentation | Excellent organisation and presentation | Very good organisation and presentation | Good organisation and presentation | Organisation and presentation generally satisfactory | Organisation and presentation poor | Organisation and presentation very poor | Unacceptable organisation and presentation |
