Drawbacks involved in observational science.

9: Many sciences rely on observation instead of (or in addition to) designed experiments. Compare the data quality issues involved in observational science with those of experimental science and data mining.

For the answer:

A) List 2 drawbacks involved in observational science. Explain each drawback with 1 sentence.

B) List 2 drawbacks involved in experimental science. Explain each drawback with 1 sentence.

C) Explain in fewer than 4 sentences why you would prefer one approach over the other (observational or experimental).

10: Discuss the difference between the precision of a measurement and the terms single and double precision, as they are used in computer science, typically to represent floating-point numbers that require 32 and 64 bits, respectively.

Some material on single and double precision: https://www.mathworks.com/help/matlab/matlab_prog/floating-point-numbers.html
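As a further starting point, the 32-bit and 64-bit storage formats can be inspected directly. The following is a minimal sketch, assuming the IEEE 754 layouts that Python's `struct` module uses for its `f` and `d` format codes; the helper names `to_single` and `bit_pattern` and the test value 0.1 are illustrative choices, not part of the exercise.

```python
import struct


def to_single(value: float) -> float:
    """Round a Python float (stored as a 64-bit double) to the nearest 32-bit single."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def bit_pattern(value: float, fmt: str) -> str:
    """Return the raw bits of value packed as 'f' (32-bit) or 'd' (64-bit), big-endian."""
    raw = struct.pack(">" + fmt, value)
    return "".join(f"{byte:08b}" for byte in raw)


x = 0.1  # hypothetical test value; it has no exact binary representation
single_bits = bit_pattern(x, "f")
double_bits = bit_pattern(x, "d")

print(len(single_bits), single_bits)  # 32 bits in total
print(len(double_bits), double_bits)  # 64 bits in total

# Printing both stored values to 20 decimal places shows where they diverge.
print(f"{to_single(x):.20f}")
print(f"{x:.20f}")
```

Comparing the two printed values shows that the stored number differs between the formats. That difference is a property of the representation, which is separate from how precisely the original quantity was measured.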

For the answer:

A) Define single precision in terms of bits in memory.

1) What does each bit represent?

2) What is the maximum decimal value that a single-precision number can represent?

B) How much more precise is a double precision format?

C) Write one sentence on why you would use double over single.

  1. This exercise compares and contrasts some similarity and distance measures.

For the answer:

A) For binary data, the L1 distance corresponds to the Hamming distance; that is, the number of bits that are different between two binary vectors. The Jaccard similarity is a measure of the similarity between two binary vectors.


x = 0101010001
y = 0100011000

1) Compute the Hamming distance between x and y.

2) Compute the Jaccard similarity between x and y.
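The two measures in part A can be sketched in a few lines of code. This is a minimal illustration, assuming binary vectors are given as strings of '0' and '1' and using the standard Jaccard definition (positions where both vectors are 1, divided by positions where at least one is 1). The vectors `a` and `b` are hypothetical and deliberately differ from x and y above, and returning 0.0 when neither vector contains a 1 is an assumed convention.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length binary strings differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))


def jaccard_similarity(a: str, b: str) -> float:
    """Ratio of 1-1 matches to positions where at least one vector has a 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(bit_a == "1" and bit_b == "1" for bit_a, bit_b in zip(a, b))
    either = sum(bit_a == "1" or bit_b == "1" for bit_a, bit_b in zip(a, b))
    # Assumed convention: similarity is 0.0 when neither vector has any 1s.
    return both / either if either else 0.0


# Hypothetical example vectors, not the x and y from the exercise.
a = "110010"
b = "100110"

print(hamming_distance(a, b))    # 2 (positions 2 and 4 differ)
print(jaccard_similarity(a, b))  # 0.5 (2 shared 1s out of 4 positions with any 1)
```

Note that the Hamming distance counts every mismatch, while the Jaccard similarity ignores positions where both vectors are 0.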

B) Explain in one sentence which approach, Jaccard or Hamming distance, is more similar to the Simple Matching Coefficient.

C) Explain in one sentence which approach is more similar to the cosine measure.

D) Suppose you are comparing how similar two organisms of different species are in terms of the number of genes they share.

1) Explain which measure, Hamming or Jaccard, would be more appropriate for comparing the genetic makeup of two organisms. (Assume that each animal is represented as a binary vector, where each attribute is 1 if a particular gene is present in the organism and 0 otherwise.)

2) If you wanted to compare the genetic makeup of two organisms of the same species, e.g., two human beings, explain in one sentence whether you would use the Hamming distance, the Jaccard coefficient, or a different measure.

Sample Solution