Evaluating Predictive Performance

A large number of insurance records are to be examined to develop a model for predicting fraudulent claims. Of the claims in the historical database, 1% were judged to be fraudulent. A sample is taken to develop a model, and oversampling is used to provide a balanced sample in light of the very low response rate. When applied to this sample (n = 800), the model correctly classifies 310 frauds and 270 non-frauds. It misses 90 frauds and incorrectly classifies 130 non-fraudulent records as frauds.

a. Produce the confusion matrix for the sample as it stands.

b. Find the adjusted misclassification rate (adjusting for the oversampling).

c. What percentage of new records would you expect to be classified as fraudulent?

Sample Solution
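
The arithmetic for parts a through c can be checked with a short Python sketch. It assumes the usual reweighting approach: the per-class error rates measured on the balanced sample are taken to hold for new records, and each class is weighted by its share in the historical database (1% fraud, 99% non-fraud). The variable names are illustrative and not part of the exercise.

```python
# Counts from the balanced (oversampled) sample, n = 800.
tp, fn = 310, 90    # actual frauds: correctly flagged, missed
fp, tn = 130, 270   # actual non-frauds: wrongly flagged, correctly cleared

fraud_rate = 0.01   # share of frauds in the historical database

# (a) Confusion matrix for the sample as it stands (rows = actual class).
confusion = {
    "actual fraud":     {"predicted fraud": tp, "predicted non-fraud": fn},
    "actual non-fraud": {"predicted fraud": fp, "predicted non-fraud": tn},
}

# Per-class rates estimated from the sample.
hit_rate = tp / (tp + fn)          # frauds classified as fraud
miss_rate = fn / (tp + fn)         # frauds classified as non-fraud
false_alarm_rate = fp / (fp + tn)  # non-frauds classified as fraud

# Unadjusted misclassification rate on the balanced sample.
sample_error = (fn + fp) / (tp + fn + fp + tn)

# (b) Adjusted misclassification rate: weight each class's error rate
# by its share in the original population rather than in the sample.
adjusted_error = fraud_rate * miss_rate + (1 - fraud_rate) * false_alarm_rate

# (c) Expected share of new records classified as fraudulent.
predicted_fraud_share = fraud_rate * hit_rate + (1 - fraud_rate) * false_alarm_rate

for actual, row in confusion.items():
    print(actual, row)
print(f"sample misclassification rate:   {sample_error:.3f}")
print(f"adjusted misclassification rate: {adjusted_error:.3f}")
print(f"share classified as fraudulent:  {predicted_fraud_share:.4f}")
```

Under these assumptions the sample misclassification rate is 220/800 = 27.5%, the adjusted rate is 0.01(0.225) + 0.99(0.325) = 32.4%, and about 33% of new records (0.3295) would be classified as fraudulent. The adjusted figures are dominated by the 32.5% false-alarm rate among non-frauds, because non-frauds make up 99% of the population.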