
Evaluation of Diagnostic Tests & ROC Curve Analysis


Academic year: 2021


(1)


PhD Özgür Tosun

(2)

TODAY’S EXAMPLE

Why does a physician need biostatistics?

(3)

Understanding the “Statistics”

A 50-year-old woman, no symptoms, participates in routine mammography screening.

She tests positive, is alarmed, and wants to know from you whether she has breast cancer for certain or what the chances are.

Apart from the screening results, you know nothing else about this woman.

How many women who test positive actually have breast cancer?

(4)

Additional Info

The probability that a woman has breast cancer is 1% ("prevalence")

If a woman has breast cancer, the probability that she tests positive is 90% ("sensitivity")

If a woman does not have breast cancer, the probability that she nevertheless tests positive is 9% ("false positive rate")

(5)

Your answer???

a) nine in 10 (90%)

b) eight in 10 (80%)

c) one in 10 (10%)

d) one in 100 (1%)

(6)

ATTENTION !!

The fact that 90% of women with breast cancer get a positive result from a mammogram (sensitivity) doesn't mean that 90% of women with positive results have breast cancer.

(7)

                    REALITY
TEST          Cancer    Healthy    Total
Positive         9         89        98
Negative         1        901       902
Total           10        990      1000

(8)

Prevalence = 10/1000 = 1%

Sensitivity = 9/10 = 90%

False Positive Rate = 89/990 ≈ 9%

(9)

Answer

Total positive test results among 1,000 women = 98

Only 9 of them actually have cancer

How many women who test positive actually have breast cancer?

9/98 ≈ 0.09, roughly one in 10 (10%)

The high false positive rate, combined with the disease's prevalence of 1%, means that roughly nine out of 10 women with a worrying mammogram don't actually have breast cancer.
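The same answer follows directly from Bayes' theorem. Below is a minimal Python sketch that uses only the three figures given in the example (prevalence 1%, sensitivity 90%, false positive rate 9%). The function name `positive_predictive_value` is our own, chosen for illustration:

```python
def positive_predictive_value(prevalence: float, sensitivity: float,
                              false_positive_rate: float) -> float:
    """P(disease | positive test) by Bayes' theorem."""
    p_true_pos = prevalence * sensitivity                 # P(D+ and T+)
    p_false_pos = (1 - prevalence) * false_positive_rate  # P(D- and T+)
    return p_true_pos / (p_true_pos + p_false_pos)


ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.90,
                                false_positive_rate=0.09)
print(f"P(cancer | positive mammogram) = {ppv:.3f}")  # 0.092
```

With exact probabilities the result is about 0.092. The table's 9/98 differs only slightly because its 89.1 expected false positives were rounded to 89.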

(10)

What Do Doctors Answer?

In one trial, almost half of a group of 160 gynecologists responded that the woman's chance of having cancer was nine in 10 (90%).

Only 21% said that the figure was one in 10 (10%), which is the correct answer.

That's a worse result than if the doctors had been answering at random (25%).

(11)

What Happens When the Doctor Does Not Explain the Right Probabilities to the Patient?

It is worrying how few specialists understand the risk faced by a woman with a positive mammogram result

We can only imagine how much anxiety those innumerate doctors cause in women

This may even lead to unnecessary cancer treatment for healthy women

Research suggests that months after a mammogram false alarm, up to a quarter of women are still affected by the process on a daily basis.

(12)
(13)

EVALUATION OF DIAGNOSTIC TESTS

(14)

The “Gold Standard”

What is a gold standard?

Biopsy results, pathological evaluation, radiological procedures, prolonged follow-up, autopsies

Almost always more costly, more invasive, and less feasible

Some diseases lack objective standards (e.g., angina pectoris, where the gold standard is careful history taking)

(15)

Diagnostic Characteristics

It is not hypothesis testing; instead we ask:

How well does the test identify patients with a disease?

How well does the test identify patients without a disease?

(16)

Evaluation of the Diagnostic Test

Give a group of people (with and without the disease) both tests (the candidate test and the “gold standard” test), then cross-classify the results and report the diagnostic characteristics of the test.
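Here is a minimal sketch of the cross-classification step. It assumes each subject's two results are recorded as booleans (`True` = positive/diseased). The helper `cross_classify` and the six subjects are hypothetical, for illustration only. The cell letters follow this deck's a (TP), b (FP), c (FN), d (TN) convention:

```python
from collections import Counter


def cross_classify(candidate: list[bool], gold: list[bool]) -> dict[str, int]:
    """Cross-classify paired results (True = positive / diseased) into a, b, c, d."""
    if len(candidate) != len(gold):
        raise ValueError("every subject needs both a candidate and a gold-standard result")
    pairs = Counter(zip(candidate, gold))  # missing combinations count as 0
    return {
        "a": pairs[(True, True)],    # TP: candidate +, gold standard +
        "b": pairs[(True, False)],   # FP: candidate +, gold standard -
        "c": pairs[(False, True)],   # FN: candidate -, gold standard +
        "d": pairs[(False, False)],  # TN: candidate -, gold standard -
    }


# Six hypothetical subjects, for illustration only
candidate = [True, True, False, False, True, False]
gold = [True, False, True, False, True, False]
print(cross_classify(candidate, gold))  # {'a': 2, 'b': 1, 'c': 1, 'd': 2}
```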

(17)

                        Truth or Gold Standard
                          +            -
Candidate Test   +     a (TP)       b (FP)
                 -     c (FN)       d (TN)

A perfect test would have b and c equal to 0

(18)

Diagnostic Characteristics

Sensitivity: The probability that a diseased individual will be identified as “diseased” by the test

= P(T+ | D+) = a/(a+c)

Specificity: The probability that an individual without the disease will be identified as “healthy” by the test

= P(T- | D-) = d/(b+d)

(19)

Diagnostic Characteristics

False positive rate = Given a subject without the disease, the probability that he will have a positive test result

= P(T+ | D-) = b/(b+d) = 1 – Specificity

False negative rate = Given a subject with the disease, the probability that he will have a negative test result

= P(T- | D+) = c/(a+c) = 1 – Sensitivity
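These four definitions translate directly into code. The sketch below assumes the cell counts a, b, c, d of the 2×2 table and applies them to the mammography counts from the opening example (a = 9, b = 89, c = 1, d = 901). The function name is illustrative:

```python
def diagnostic_characteristics(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """Sensitivity, specificity, FPR and FNR from the 2x2 cell counts."""
    return {
        "sensitivity": a / (a + c),          # P(T+ | D+)
        "specificity": d / (b + d),          # P(T- | D-)
        "false_positive_rate": b / (b + d),  # P(T+ | D-) = 1 - specificity
        "false_negative_rate": c / (a + c),  # P(T- | D+) = 1 - sensitivity
    }


# Mammography table: a = 9 (TP), b = 89 (FP), c = 1 (FN), d = 901 (TN)
for name, value in diagnostic_characteristics(9, 89, 1, 901).items():
    print(f"{name}: {value:.3f}")
```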

(20)

Predictive Values of Diagnostic Tests

More informative from the patient or physician perspective

Special applications of Bayes' theorem

(21)

Predictive Values of Diagnostic Tests

Positive Predictive Value: The probability that an individual with a positive test result has the disease

= P(D+ | T+) = a/(a+b)

(22)

Predictive Values of Diagnostic Tests

Negative Predictive Value: The probability that an individual with a negative test result does not have the disease

= P(D- | T-) = d/(c+d)
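Predictive values come from Bayes' theorem, so they depend on prevalence. Sensitivity and specificity do not. The sketch below assumes sensitivity 0.90 and specificity 0.91 (1 minus the 9% false positive rate of the mammography example). The 10% and 50% prevalences are hypothetical values chosen only to show the trend:

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) computed with Bayes' theorem."""
    p_tp = sensitivity * prevalence              # P(T+, D+)
    p_fp = (1 - specificity) * (1 - prevalence)  # P(T+, D-)
    p_tn = specificity * (1 - prevalence)        # P(T-, D-)
    p_fn = (1 - sensitivity) * prevalence        # P(T-, D+)
    ppv = p_tp / (p_tp + p_fp)  # P(D+ | T+), the population analogue of a/(a+b)
    npv = p_tn / (p_tn + p_fn)  # P(D- | T-), the population analogue of d/(c+d)
    return ppv, npv


# Same test characteristics, different (partly hypothetical) prevalences
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.91, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At 1% prevalence the PPV is about 0.09, as in the mammography example. As prevalence rises, the PPV climbs steeply and the NPV falls.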

(23)

A LAST SIMPLE EXAMPLE TO SUM IT UP

(24)

Suppose we have a test statistic for predicting the presence or absence of disease.

[Table layout: Test Criterion (Pos, Neg) in rows × True Disease Status (Pos, Neg) in columns]


(26)


TP = True Positive


(28)


FP = False Positive


(30)


FN = False Negative


(32)


TN = True Negative

(33)

                        True Disease Status
                          Pos      Neg
Test Criterion   Pos      TP       FP
                 Neg      FN       TN
Total                     P        N        P + N

(34)


Accuracy = Probability that the test yields a correct result = (TP + TN) / (P + N)

(35)


Sensitivity = Probability that a true case will test positive = TP / P

Also referred to as True Positive Rate (TPR) or True Positive Fraction (TPF).

(36)


Specificity = Probability that a true negative will test negative = TN / N

Also referred to as True Negative Rate (TNR) or True Negative Fraction (TNF).

(37)


False Negative Rate = Probability that a true positive will test negative = FN / P = 1 - Sensitivity

Also referred to as False Negative Fraction (FNF).

(38)


False Positive Rate = Probability that a true negative will test positive = FP / N = 1 - Specificity

Also referred to as False Positive Fraction (FPF).

(39)


Positive Predictive Value (PPV) = Probability that a subject with a positive test truly has the disease = TP / (TP + FP)

(40)


Negative Predictive Value (NPV) = Probability that a subject with a negative test is truly disease free = TN / (TN + FN)

(41)

                        True Disease Status
                          Pos      Neg      Total
Test Criterion   Pos       27      173       200
                 Neg       73      727       800
Total                     100      900      1000

Se = 27/100 = .27
Sp = 727/900 = .81
FPR = 1 - Sp = .19
FNR = 1 - Se = .73
Acc = (27 + 727)/1000 = .75
PPV = 27/200 = .14
NPV = 727/800 = .91
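The hand calculations can be checked with a short sketch. `summarize` is a hypothetical helper that takes the four cell counts and applies the definitions of the preceding slides. Its printed values should match the ones above after rounding to two decimals:

```python
def summarize(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """All diagnostic measures of this deck from the four cell counts."""
    p, n = tp + fn, fp + tn  # truly diseased (P) and truly healthy (N)
    return {
        "Se": tp / p,
        "Sp": tn / n,
        "FPR": fp / n,
        "FNR": fn / p,
        "Acc": (tp + tn) / (p + n),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


for name, value in summarize(tp=27, fp=173, fn=73, tn=727).items():
    print(f"{name} = {value:.2f}")
```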

(42)

ROC CURVE

(43)

Introduction to ROC curves

ROC = Receiver Operating Characteristic

The ROC curve was first developed by electrical and radar engineers during World War II, for the analysis of radar signals to detect enemy objects on battlefields.

It was soon introduced to psychology to account for the perceptual detection of stimuli.

Following the attack on Pearl Harbor in 1941, the United States Army began new research to improve the correct detection of Japanese aircraft from their radar signals.

(44)

ROC

Receiver Operating Characteristics

• ROC analysis was developed for the signal receivers in radars

• The basic aim was to distinguish enemy signals from normal signals

• It is a graphical analysis method

(45)

Development of Receiver Operating Characteristics (ROC) Curves

(46)

If you decrease the threshold (cut-off), sensitivity increases: you will be able to catch every enemy plane signal. However, more noise will also be flagged as signal (false alarms), so you will not be able to make progress.

(47)

The ROC curve in this example covers alternative threshold (cut-off) values. Note that sensitivity and specificity change simultaneously as the threshold changes. Remember, some signals come from enemy planes while others are normal.
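The sketch below shows sensitivity and specificity moving in opposite directions. It uses a handful of hypothetical test values (not data from the slides). It assumes higher values indicate disease and that a result is called positive when it is at or above the threshold:

```python
def sens_spec_at(threshold: float, diseased: list[float],
                 healthy: list[float]) -> tuple[float, float]:
    """Sensitivity and specificity when 'positive' means value >= threshold."""
    sensitivity = sum(x >= threshold for x in diseased) / len(diseased)
    specificity = sum(x < threshold for x in healthy) / len(healthy)
    return sensitivity, specificity


# Hypothetical test values (illustrative only); diseased subjects tend to score higher
diseased = [4.1, 5.0, 5.6, 6.2, 7.3, 8.0]
healthy = [2.0, 2.8, 3.5, 4.4, 5.2, 6.0]

# Raising the threshold trades sensitivity for specificity
for threshold in (3.0, 4.5, 6.0, 7.5):
    se, sp = sens_spec_at(threshold, diseased, healthy)
    print(f"threshold={threshold}: sensitivity={se:.2f}, specificity={sp:.2f}")
```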

(48)

ROC Analysis

“ROC analysis since then has been used in medicine, radiology, biometrics, and other areas for many decades.”

In medicine, ROC analysis has been extensively used in the evaluation of diagnostic tests.

ROC curves are also used extensively in epidemiology and medical research

Evidence-based medicine.

In radiology, ROC analysis is a common technique to evaluate new radiology techniques.

Can be used to compare tests & procedures

(49)

ROC Curves

Use and interpretation

The ROC methodology easily generalizes to test statistics that are continuous (such as lung function or a blood gas).

The ROC curve allows us to see, in a simple visual display, how sensitivity and specificity vary as our threshold varies.

The shape of the curve also gives us some visual clues about the overall strength of association between the underlying test statistic and disease status.

(50)
(51)

Example

[Figure: distributions of test results for people with the disease and people without the disease]

(52)

[Figure: test result distributions with a threshold; patients on one side are called “negative”, patients on the other side “positive”]

(53)

Some definitions ...

[Figure: true positives, i.e., patients with the disease whose results fall on the “positive” side of the threshold]

(54)

[Figure: false positives, i.e., patients without the disease whose results fall on the “positive” side of the threshold]

(55)

[Figure: true negatives, i.e., patients without the disease whose results fall on the “negative” side of the threshold]

(56)

[Figure: false negatives, i.e., patients with the disease whose results fall on the “negative” side of the threshold]

(57)

[Figure: test result distributions for people without and with the disease, classified as “-” and “+”]

Moving the Threshold: Right

(58)

[Figure: test result distributions for people without and with the disease, classified as “-” and “+”]

Moving the Threshold: Left

(59)

[Figure: frequency distributions of the test parameter (mg/dl) for diseased and healthy subjects, as classified by the gold standard and by the alternative test]

(60)

[Figure: frequency distributions of the test parameter (mg/dl) for healthy and diseased subjects under the alternative test]

(61)

[Figure: frequency distributions of the test parameter (mg/dl) for diseased and healthy subjects (gold standard), split by the alternative test's threshold into positive and negative outcomes, marking the TP, FN, TN, and FP regions]

(62)


(63)

Sensitivity and Specificity

Sensitivity

Ability of a test to correctly identify the truly diseased patients.

Sensitivity = TP / ( TP + FN )

Specificity

Ability of a test to correctly identify the truly healthy people.

Specificity = TN / ( TN + FP )

(64)

“Receiver Operating Characteristic” Curve

[Figure: frequency distributions of the measured value for diseased and healthy subjects (TP, FN, TN, FP regions), and the resulting ROC curve of sensitivity (0.0 to 1.0) against specificity (1.0 to 0.0)]

It is the graphical representation of all sensitivity and specificity combinations for every possible threshold (cut-off) value. The aim is to differentiate diseased from healthy subjects.

Sensitivity and specificity at successive thresholds (25 diseased, 25 healthy subjects):

Sensitivity      Specificity
25/25 = 1.00     0/25 = 0.00
25/25 = 1.00     1/25 = 0.04
25/25 = 1.00     3/25 = 0.12
25/25 = 1.00     5/25 = 0.20
24/25 = 0.96     8/25 = 0.32
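Repeating the calculation for every possible cut-off yields the points of the ROC curve. The sketch below uses hypothetical values and assumes a result is positive when its value is at or above the cut-off. It returns (1 - specificity, sensitivity) pairs, the axes conventionally used for ROC plots:

```python
def roc_points(diseased: list[float], healthy: list[float]) -> list[tuple[float, float]]:
    """Return (1 - specificity, sensitivity) for every distinct cut-off.

    A result is called positive when value >= cut-off. The starting point (0, 0)
    corresponds to a cut-off above every observed value (everyone negative).
    """
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]
    for cut in cutoffs:
        tpr = sum(x >= cut for x in diseased) / len(diseased)  # sensitivity
        fpr = sum(x >= cut for x in healthy) / len(healthy)    # 1 - specificity
        points.append((fpr, tpr))
    return points  # the smallest cut-off calls everyone positive, giving (1, 1)


diseased = [4.1, 5.0, 5.6, 6.2, 7.3, 8.0]  # hypothetical values
healthy = [2.0, 2.8, 3.5, 4.4, 5.2, 6.0]

for fpr, tpr in roc_points(diseased, healthy):
    print(f"1 - specificity = {fpr:.2f}, sensitivity = {tpr:.2f}")
```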

(65)

“Receiver Operating Characteristic” Curve

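[Figure: frequency distributions of the measured value and the corresponding ROC curve, sensitivity (0 to 1) against specificity (1 to 0)]

The Area Under the Curve (AUC) summarizes the diagnostic performance of a test.

For a test that performs at least as well as chance, AUC lies between 0.5 and 1.0.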
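Once the ROC points are known, the AUC is commonly approximated with the trapezoidal rule. The sketch below assumes the points are (false positive rate, true positive rate) pairs sorted by false positive rate. The intermediate curve is hypothetical:

```python
def auc_trapezoid(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under an ROC curve; points are (FPR, TPR) sorted by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between neighbouring points
    return area


perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # no overlap between the groups
chance = [(0.0, 0.0), (1.0, 1.0)]               # diagonal: tossing a coin
example = [(0.0, 0.0), (0.2, 0.6), (0.5, 0.85), (1.0, 1.0)]  # hypothetical curve

print(auc_trapezoid(perfect))  # 1.0
print(auc_trapezoid(chance))   # 0.5
print(round(auc_trapezoid(example), 2))
```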

(66)

We can use ROC curves to compare the diagnostic performance of two or more alternative tests.

“Receiver Operating Characteristic” Curve

[Figure: measured-value distributions for Test 1 and Test 2, with both ROC curves (sensitivity against specificity) drawn on the same axes]

(67)

[Figure: ROC curve, true positive rate (sensitivity, 0% to 100%) against false positive rate (1 - specificity, 0% to 100%)]

(68)

ROC curve comparison

[Figure: ROC curves (true positive rate against false positive rate, 0% to 100%) of a good test and a poor test]

(69)

ROC curve extremes

[Figure: best test, where the distributions don't overlap at all; worst test, where the distributions overlap completely (tossing a coin). Each panel plots true positive rate against false positive rate, 0% to 100%.]

(70)

Area under ROC curve (AUC)

Overall measure of test performance

Comparisons between two tests are based on differences between their (estimated) AUCs

For continuous data, the AUC is equivalent to the Mann-Whitney U statistic divided by the product of the two group sizes (the Mann-Whitney U test is a nonparametric test of the difference in location between two populations)
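Here is a sketch of the Mann-Whitney relationship, assuming higher values indicate disease. The rank sum of the diseased group gives U, and U / (n1 · n0) is the AUC. Tied values receive average ranks. The data are hypothetical, and the helper is our own rather than a library function:

```python
def auc_from_mann_whitney(diseased: list[float], healthy: list[float]) -> float:
    """AUC = U / (n1 * n0), where U is the Mann-Whitney statistic of the diseased group."""
    values = sorted(diseased + healthy)
    # Assign ranks 1..N; tied values share the average of their ranks
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n1, n0 = len(diseased), len(healthy)
    rank_sum = sum(rank_of[x] for x in diseased)
    u = rank_sum - n1 * (n1 + 1) / 2  # Mann-Whitney U for the diseased group
    return u / (n1 * n0)


diseased = [4.1, 5.0, 5.6, 6.2, 7.3, 8.0]  # hypothetical values
healthy = [2.0, 2.8, 3.5, 4.4, 5.2, 6.0]
print(round(auc_from_mann_whitney(diseased, healthy), 3))
```

For these values, the diseased rank sum is 51, so U = 51 - 21 = 30 and AUC = 30/36 ≈ 0.83.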

(71)

AUC for ROC curves

[Figure: four example ROC curves (true positive rate against false positive rate) with AUC = 50%, 90%, 65%, and 100%]

(72)

Interpretation of AUC

AUC can be interpreted as the probability that the test result from a randomly chosen diseased individual is more indicative of disease than that from a randomly chosen healthy individual

However, it has no direct clinically relevant meaning
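The probability interpretation can be computed literally by comparing every diseased/healthy pair. The sketch below assumes higher values are more indicative of disease and counts ties as one half, a common convention. The values are hypothetical:

```python
from itertools import product


def auc_pairwise(diseased: list[float], healthy: list[float]) -> float:
    """Estimate P(diseased result > healthy result), counting ties as 1/2."""
    score = 0.0
    for d, h in product(diseased, healthy):  # every diseased/healthy pair
        if d > h:
            score += 1.0
        elif d == h:
            score += 0.5
    return score / (len(diseased) * len(healthy))


diseased = [4.1, 5.0, 5.6, 6.2, 7.3, 8.0]  # hypothetical values
healthy = [2.0, 2.8, 3.5, 4.4, 5.2, 6.0]
print(round(auc_pairwise(diseased, healthy), 3))
```

For these values, 30 of the 36 diseased/healthy pairs are correctly ordered, so the estimated AUC is 30/36 ≈ 0.83.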
