How to understand the ROC curve

During a layover on a Christmas vacation to Alaska, I asked a friend who majors in statistics what an ROC curve is, a concept that had confused me for a long time. Although my friend gave me a detailed explanation, the only conclusion I remember is that the farther the curve lies above the y = x line, the better it is. But why? Back home from vacation, I decided to figure it out.

The x and y axes of an ROC curve are the false positive rate (FPR) and the true positive rate (TPR), respectively. What are they? Let me use a simple example from Wikipedia to explain these concepts:

 

"Imagine a study evaluating a test that screens people for a disease. Each person taking the test either has or does not have the disease. The test outcome can be positive (classifying the person as having the disease) or negative (classifying the person as not having the disease). The test results for each subject may or may not match the subject's actual status. In that setting:

  • True positive: Sick people correctly identified as sick
  • False positive: Healthy people incorrectly identified as sick
  • True negative: Healthy people correctly identified as healthy
  • False negative: Sick people incorrectly identified as healthy

After getting the numbers of true positives, false positives, true negatives, and false negatives, the sensitivity and specificity for the test can be calculated. If it turns out that the sensitivity is high then any person who has the disease is likely to be classified as positive by the test. On the other hand, if the specificity is high, any person who does not have the disease is likely to be classified as negative by the test."

They also give two straightforward pictures to illustrate the concepts of sensitivity and specificity:


As for the calculation of sensitivity, specificity, and FPR, we have:

sensitivity (TPR) = TP / (TP + FN)

specificity = TN / (TN + FP)

FPR = FP / (TN + FP) = 1 - specificity
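To check these formulas for myself, I wrote a minimal Python sketch. The counts are made up for illustration (they are not from the Wikipedia example), and the function name `rates` is my own.

```python
def rates(tp, fp, tn, fn):
    """Return (sensitivity, specificity, fpr) from the four counts."""
    sensitivity = tp / (tp + fn)  # TPR: share of sick people identified as sick
    specificity = tn / (tn + fp)  # share of healthy people identified as healthy
    fpr = fp / (tn + fp)          # share of healthy people identified as sick
    return sensitivity, specificity, fpr


# Hypothetical study: 100 sick people and 900 healthy people.
sens, spec, fpr = rates(tp=80, fp=90, tn=810, fn=20)
print("sensitivity (TPR):", sens)
print("specificity:", spec)
print("FPR:", fpr)
```

With these made-up counts, the test catches 80 of the 100 sick people and wrongly flags 90 of the 900 healthy ones, so FPR and 1 - specificity agree up to floating-point rounding.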


At any given FPR, we want the TPR to be as high as possible. A test that guesses at random has TPR = FPR and falls on the y = x line, so the farther the ROC curve lies above that line, the better the test.
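To see where the curve itself comes from, here is a rough Python sketch. It assumes the test produces a score for each person and that we call someone "positive" when the score is at or above a threshold. Sweeping the threshold from strict to loose gives one (FPR, TPR) point per threshold. The scores, labels, and function names are hypothetical, chosen only for illustration.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nobody is called sick
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve, computed with the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores (higher = more likely sick) and true status (1 = sick).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
print(curve)
print("area under the curve:", auc(curve))
```

For this toy data every point sits on or above the y = x line, and the area under the curve is 0.8125, compared with 0.5 for the diagonal itself.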


 

 

Impossible triangle in life

In MRI, there is an "impossible triangle" formed by SNR, spatial resolution, and scan time. Improving any two inevitably comes at the expense of the third. High SNR and high resolution demand long scan times; short scans with high resolution suffer from low SNR; and achieving both high SNR and short scan times requires sacrificing spatial resolution.

I recently realized that life seems to follow a similar "impossible triangle" defined by money, time, and energy. In youth, we possess time and energy but lack money. In middle age, we gain money and energy but lose time. In old age, we may finally have time and money, yet no longer have the energy to fully enjoy them.

 


 

Good to know

Greg Mankiw's blog is a website I visit often. I just realized that he’s the author of the two well-known economics textbooks Principles of Macroeconomics and Principles of Microeconomics. I had only known these books and their author by their Chinese names — which is a bit embarrassing.