Problem 1. Explain what each of the following R expressions does. You can run them in R and check the
results.
(a) c(1, 17, -6, 3)
(b) seq(1, 5, by=0.5)
(c) seq(0, 10, length=5)
(d) rep(0, 5)
(e) rep(1:3, 4)
(f) rep(4:6, 1:3)
(g) sample(1:3)
(h) sample(1:5, size=3, replace=FALSE)
(i) sample(c(2,5,3), size=4, replace=TRUE)
(j) sample(1:2, size=10, prob=c(1,3), replace=TRUE)
(k) c(1, 2, 3) + c(4, 5, 6)
(l) max(1:10)
(m) min(1:10)
(n) range(1:10)
(o) matrix(1:12, nrow=3, ncol=4)
(q) Let a <- c(1,2,3), b <- c(10, 20, 30), c <- c(100, 200, 300), d <- c(1000, 2000, 3000). What does
the function rbind(a, b, c, d) do? What does cbind(a, b, c, d) do?
HOMEWORK 2 DUE DATE: FRIDAY, SEPTEMBER 25 AT 11:59 PM
(r) Let C be the following matrix:
     a   b    c     d
     1  10  100  1000
     2  20  200  2000
     3  30  300  3000
What is sum(C)? What is apply(C, 1, sum)? What is apply(C, 2, sum)?
(s) Let movies <- c("SPYDERMAN", "BATMAN", "VERTIGO", "CHINATOWN"). What does
lapply(movies, tolower) do? Note that tolower() converts the strings in a character vector or matrix to
lower case.
(t) Let x <- factor(c("alpha", "beta", "gamma", "alpha", "beta")). What does the function levels(x) return?
(u) c <- 35:50
(v) c(1, 2, 3) + c(4, 5, 6)
(w) c(1, 2, 3, 4) + c(10, 20)
(x) sqrt(c(100, 225, 400))
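Several of the expressions above rely on argument names and on R's recycling rule. The following is a minimal sketch of those ideas, using toy values that are deliberately different from the exercise inputs; the names x and y are chosen here for illustration only.

```r
# Toy values, chosen to differ from the exercise inputs.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(10, 20)

# Recycling: the shorter vector y is reused until it matches the length of x.
x + y
# 11 22 13 24 15 26

# rep() with 'times' repeats the whole vector; 'each' repeats element by element.
rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"
rep(c("a", "b"), each = 2)    # "a" "a" "b" "b"

# seq() with 'by' fixes the step size; 'length.out' fixes the number of points.
seq(2, 4, by = 1)             # 2 3 4
seq(2, 4, length.out = 5)     # 2.0 2.5 3.0 3.5 4.0
```

Calls to sample() return random results, so running set.seed() with any fixed number beforehand makes them reproducible.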
Problem 2. Create the following vectors in R.
a = (5, 10, 15, 20, …, 160)
b = (87, 86, 85, …, 56)
Use vector arithmetic to multiply these vectors and call the result d. Select subsets of d to identify the
following.
(a) What are the 19th, 20th, and 21st elements of d?
(b) What are all of the elements of d which are less than 2000?
(c) How many elements of d are greater than 6000?
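Selecting subsets of a vector, as asked above, usually combines positional and logical indexing. Here is a small sketch on a made-up vector v (the values are hypothetical and unrelated to a, b, or d).

```r
v <- c(12, 7, 30, 18, 5)      # toy vector (hypothetical values)

v[2:3]            # elements at positions 2 and 3: 7 30
v[v < 10]         # elements satisfying a condition: 7 5
sum(v > 10)       # TRUE counts as 1, so this counts elements greater than 10: 3
which(v > 10)     # positions of those elements: 1 3 4
```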
Problem 3. This exercise relates to the College data set, which can be found in the file College.csv. It
contains a number of variables for 777 different universities and colleges in the US. The variables are
• Private : Public/private indicator
• Apps : Number of applications received
• Accept : Number of applicants accepted
• Enroll : Number of new students enrolled
• Top10perc : New students from top 10% of high school class
• Top25perc : New students from top 25% of high school class
• F.Undergrad : Number of full-time undergraduates
BUSINESS DATA MINING (IDS 472)
• P.Undergrad : Number of part-time undergraduates
• Outstate : Out-of-state tuition
• Room.Board : Room and board costs
• Books : Estimated book costs
• Personal : Estimated personal spending
• PhD : Percent of faculty with Ph.D.’s
• Terminal : Percent of faculty with terminal degree
• S.F.Ratio : Student/faculty ratio
• perc.alumni : Percent of alumni who donate
• Expend : Instructional expenditure per student
• Grad.Rate : Graduation rate
(a) Read the data into R. Call the loaded data “college”. Explain how you do this.
(b) How many variables are in this data set? What are their measurement types? How do you get this
information?
(c) Use the function colnames() to change the "Top10perc" and "Top25perc" variable names to
"Top10" and "Top25".
(d) Look at the data. You should notice that the first column is just the name of each university.
We don’t really want R to treat this as data. However, it may be handy to have these names
for later. Try the following command:
rownames(college) <- college[, 1]
You should see that there is now a row.names column with the name of each university recorded.
This means that R has given each row a name corresponding to the appropriate university. R
will not try to perform calculations on the row names. However, we still need to eliminate the
first column in the data where the names are stored. Write code to eliminate the first column.
(e) Add a column to indicate the acceptance rate for each university (acceptance rate = number of
accepted applications / number of applications received).
(f) Provide summary statistics for the numerical variables in the data set.
(g) Use the pairs() function to produce a scatterplot matrix of the first ten columns or variables of
the data. Recall that you can reference the first ten columns of a matrix A using A[,1:10]. Can
you observe any useful information in the plots?
(h) Use the boxplot() function to produce side-by-side boxplots of Outstate versus Private. Do you
observe any useful information in this plot?
(i) Create a new qualitative variable, called Elite, by binning the Top10perc variable. We are going
to divide universities into two groups based on whether or not the proportion of students coming
from the top 10% of their high school classes exceeds 50%. Follow the code below.
Elite = rep("No", nrow(college))
Elite[college$Top10perc > 50] = "Yes"
Elite = as.factor(Elite)
college = data.frame(college,Elite)
i. Explain each line of the above code.
ii. Use the summary() function to see how many elite universities there are. Now use the
plot() function to produce side-by-side boxplots of Outstate versus Elite.
(j) Use the hist() function to produce some histograms with differing numbers of bins for a few of
the quantitative variables. You may find the command par(mfrow=c(2,2)) useful: it will divide
the print window into four regions so that four plots can be made simultaneously. Modifying
the arguments to this function will divide the screen in other ways.
(k) What is the average room and board cost of private schools?
(l) Create a new binary variable that is 1 if the student/faculty ratio is greater than 0.5 and 0
otherwise.
(m) Compare the distributions of out-of-state tuition for private and public colleges.
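The data-frame operations this problem calls for (row names, dropping a column, derived columns, threshold indicators, group means, renaming) can be sketched on a tiny made-up data frame. Everything below, including the name toy, its columns, and its numbers, is hypothetical and is not the College data.

```r
# Hypothetical toy data; names and numbers are made up for illustration.
toy <- data.frame(
  name   = c("A College", "B University", "C Institute"),
  group  = c("Yes", "No", "Yes"),
  apps   = c(1000, 2000, 500),
  accept = c(800, 1000, 450),
  stringsAsFactors = FALSE
)

# Use the first column as row names, then drop it from the data.
rownames(toy) <- toy[, 1]
toy <- toy[, -1]

# Derived column computed from two existing columns.
toy$rate <- toy$accept / toy$apps

# Binary indicator: 1 if the value exceeds a threshold, 0 otherwise.
toy$high <- ifelse(toy$rate > 0.75, 1, 0)

# Mean of one column within each level of a categorical column.
group_means <- tapply(toy$apps, toy$group, mean)

# Rename a column by matching its current name.
colnames(toy)[colnames(toy) == "apps"] <- "applications"
```

Matching on the current column name when renaming avoids depending on column positions, which shift once a column is dropped.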
Problem 4. This exercise involves the “Auto” data set.
(a) Remove the missing values from this data set.
(b) What is the range of each quantitative predictor? You can answer this using the range() function.
(c) What are the mean and standard deviation of each quantitative predictor?
(d) Remove the 10th through 85th observations. What is the range, mean, and standard deviation
of each predictor in the subset of the data that remains?
(e) Using the full data set, investigate the predictors graphically, using scatterplots or other tools of
your choice. Create some plots highlighting the relationships among the predictors. Comment
on your findings.
(f) Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables. Do your
plots suggest that any of the other variables might be useful in predicting mpg? Justify your
answer.
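The mechanics behind parts (a) through (d) are removing incomplete rows, applying a summary function to every column, and dropping rows by position. A minimal sketch on made-up data follows; the data frame toy and its columns speed and weight are hypothetical, not the Auto data.

```r
# Hypothetical toy data with one missing value.
toy <- data.frame(
  speed  = c(10, 12, NA, 15, 20, 18),
  weight = c(3.1, 2.8, 3.5, 2.2, 1.9, 2.4)
)

# Drop every row that contains at least one NA.
clean <- na.omit(toy)

# Apply a summary function to each column in turn.
col_ranges <- sapply(clean, range)   # 2 x 2 matrix: min and max per column
col_means  <- sapply(clean, mean)
col_sds    <- sapply(clean, sd)

# Negative indices remove rows: here the 2nd and 3rd rows of 'clean'.
subset_rows <- clean[-(2:3), ]
nrow(subset_rows)   # 3
```

This sketch assumes every column is numeric; with mixed column types, the numeric columns would need to be selected first.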
Problem 5. FiveThirtyEight, a data journalism site devoted to politics, sports, science, economics,
and culture, recently published a series of articles on gun deaths in America. Gun violence in the
United States is a significant political issue, and while reducing gun deaths is a noble goal, we must first
understand the causes and patterns in gun violence in order to craft appropriate policies. As part of the
project, FiveThirtyEight collected data from the Centers for Disease Control and Prevention, as well as
other governmental agencies and non-profits, on all gun deaths in the United States from 2012-2014. You
can find this dataset, called "gun deaths.csv", on Blackboard.
(a) Generate a data frame that summarizes the number of gun deaths per month.
(b) Generate a bar chart with labels on the x-axis. That is, each month should be labeled "Jan",
"Feb", "Mar", and so on.
(c) Generate a bar chart that shows the number of gun deaths associated with each type of intent
(cause of death). The bars should be sorted from highest to lowest values.
(d) Generate a boxplot visualizing the age of gun death victims, by sex. Print the average age of
female gun death victims.
Answer the following questions. Generate appropriate figures/tables to support your conclusions.
(e) How many white males with at least a high school education were killed by guns in 2012?
(f) Which season of the year has the most gun deaths? Assume that
– Winter = January – March
– Spring = April – June
– Summer = July – September
– Fall = October – December
– Hint: You need to convert a continuous variable into a categorical variable.
(g) Are whites who are killed by guns more likely to die because of suicide or homicide? How does
this compare to blacks and Hispanics?
(h) Are police-involved gun deaths significantly different from other gun deaths? Assess the relationship between police involvement and other variables.
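The hint in part (f) refers to binning a numeric variable into categories, and parts (a) through (c) rely on counting and ordering categories. The sketch below shows these techniques on a small made-up data frame; the name toy, its columns month and intent, and all values are hypothetical, and the column names in the real file may differ.

```r
# Hypothetical toy data: month numbers 1-12 and a category label.
toy <- data.frame(
  month  = c(1, 2, 2, 5, 7, 7, 7, 11),
  intent = c("A", "B", "A", "A", "C", "A", "B", "A")
)

# Counts per month as a data frame with columns 'month' and 'Freq'.
per_month <- as.data.frame(table(month = toy$month))

# Label month numbers with R's built-in abbreviations ("Jan", "Feb", ...),
# keeping calendar order through the factor levels.
toy$month_label <- factor(month.abb[toy$month], levels = month.abb)

# Bin a numeric variable into categories; intervals are (0,3], (3,6], (6,9], (9,12].
toy$season <- cut(toy$month,
                  breaks = c(0, 3, 6, 9, 12),
                  labels = c("Winter", "Spring", "Summer", "Fall"))
season_counts <- table(toy$season)

# Sort category counts from highest to lowest before plotting.
counts <- sort(table(toy$intent), decreasing = TRUE)
barplot(counts)
```

Because month.abb is indexed by month number, this labeling assumes months are stored as integers from 1 to 12.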