Does the decision in a validation process of a surrogate endpoint change with level of significance of treatment effect? A proposal on validation of surrogate endpoints
Y. Sertdemir ⁎, R. Burgut
Cukurova University School of Medicine, Department of Biostatistics, 01130 Balcali-Adana, Turkey
Article info
Article history: Received 2 April 2008; accepted 25 August 2008
Abstract
Background: In recent years the use of surrogate endpoints (S) has attracted growing interest. In clinical trials it is important to obtain treatment outcomes as early as possible, which creates a need for surrogate endpoints that can be measured earlier than the true endpoint (T). However, a surrogate endpoint must be validated before it can be used. For a candidate surrogate endpoint, for example time to recurrence, the validation result may change dramatically between clinical trials. The aim of this study is to show, with an application to real data, how the validation criterion (R²_trial) proposed by Buyse et al. is influenced by the magnitude of the treatment effect.
Methods: The criterion R²_trial proposed by Buyse et al. (2000) is applied to four data sets from colon cancer clinical trials (C-01, C-02, C-03 and C-04). Each clinical trial is analyzed separately for the treatment effect on survival (true endpoint) and on recurrence-free survival (surrogate endpoint), and the same analysis is repeated for each center within each trial. The results are used for a standard validation analysis. The centers were then grouped by the Wald statistic into 3 groups of equal size.
Results: The validation criterion R²_trial was 0.641 (95% CI 0.432–0.782), 0.223 (95% CI 0.008–0.503), 0.761 (95% CI 0.550–0.872) and 0.560 (95% CI 0.404–0.687) for C-01, C-02, C-03 and C-04, respectively. R²_trial changed with the Wald statistics observed for the centers used in the validation process: the higher the Wald statistics of a group, the higher the observed R²_trial.
Conclusion: Recurrence-free survival is not a good surrogate for overall survival in clinical trials with non-significant treatment effects, and is a moderate surrogate in trials with significant treatment effects. This shows that the level of significance of the treatment effect should be taken into account in the validation process of surrogate endpoints.
© 2008 Elsevier Inc. All rights reserved.
Keywords: Surrogate endpoint; Validation criteria; Colon cancer; Meta-analytic; Clinical trials
1. Introduction
In the year 2006, 542 of 100,000 men and 404 of 100,000 women developed cancer. In the same year 234 of 100,000 men and 160 of 100,000 women died from cancer [1]. In a population of 300,000,000 people this would mean about 1,500,000 new cancer cases and 600,000 deaths for the year 2007. Every year, new treatments are developed to lower the number of deaths from cancer and other fatal diseases. These new treatments need to be tested in clinical trials, but clinical trials often take up to 10 years when the primary (true) endpoint is survival. For this reason there is a need for alternative (surrogate) endpoints that can give the same information on the treatment effect earlier than the primary endpoint. The use of surrogate endpoints can shorten the follow-up time and reduce the number of patients needed for a clinical trial [2,3]. A surrogate endpoint has been defined as an alternative endpoint (such as a biological marker, physical sign, or precursor event) that can be used as a substitute for a clinically meaningful endpoint that measures directly how a patient feels, functions, or survives [4].
⁎ Corresponding author.
E-mail addresses: yasarser@cu.edu.tr (Y. Sertdemir), refik@cu.edu.tr (R. Burgut).
Contemporary Clinical Trials. 1551-7144/$ – see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.cct.2008.08.006
However, before a surrogate endpoint can be used it needs to be validated. For a candidate surrogate endpoint, for example recurrence-free survival (RFS), the validation result may change dramatically among clinical trials. The aim of this study is to show, using real data, how the validation criterion (R²_trial) proposed by Buyse et al. [5] is influenced by the magnitude of the treatment effect.
2. Surrogate endpoint validation
In clinical research, the endpoint of greatest relevance to inferences concerning therapeutic efficacy is frequently not practical or even feasible to measure. Sometimes the determination of the true endpoint (T) is difficult, requiring an expensive, invasive or uncomfortable procedure. Sometimes it is unobservable for an impractically long interval. Occasionally the true endpoint is not directly measurable at all. In these cases we must rely on alternative or surrogate endpoints (S) [6].
In the past, the use of a surrogate was based on the correlation between S and T. The existence of such a correlation between the endpoints is not sufficient for using S as a surrogate. As Fleming and DeMets [7] stated, “A correlate does not a surrogate make”. It is required that the effect of treatment on the surrogate predicts the effect of treatment on the true endpoint. To be useful, a surrogate endpoint should be strongly associated with the true outcome, lie in the causal pathway for the definitive outcome, manifest early in the course of follow-up, and be relatively easy to measure. However, the defining characteristic is that the surrogate outcome should be affected by treatment in the same way (direction and relative magnitude) as the definitive outcome, and this last characteristic is the one that is difficult to verify [8].
The validation of surrogate endpoints is a difficult task. Prentice [9] defined a surrogate endpoint as “a response variable for which a test of the null hypothesis of no relationship to the treatment groups under comparison is also a valid test of the corresponding null hypothesis based on the true endpoint”, and proposed criteria to validate a surrogate endpoint. Since the Prentice criteria proved difficult to verify, Freedman et al. [10] suggested focusing on the proportion of the treatment effect explained by the surrogate (PE). Freedman et al. [10] also noted that the confidence limits of PE are generally too wide to be informative unless the treatment effect on the true endpoint is highly significant. Others showed that PE can be larger than 1 or negative, which can hardly be justified for a proportion [5,13]. This indicates that the criterion can be useful when the treatment effect is significant, but that for non-significant treatment effects it can be a misleading quantity to use in the validation of surrogates [11].
Buyse and Molenberghs [12] proposed two other quantities to replace PE. The first quantity, the “Relative Effect”, is the effect of the treatment on the true endpoint relative to that on the surrogate endpoint; it depends on the scales chosen to measure S and T. The second quantity, the “Adjusted Association” γ_Z, is the subject-specific association between the surrogate and true endpoints after adjusting for treatment. In the meta-analytic setting, the slope of the linear regression between the trial-level effects of treatment on both endpoints is useful for prediction purposes, and the coefficient of determination (R²) of this linear regression measures the strength of the association between the effects. Buyse et al. [5] termed this measure R²_trial and suggested calling a surrogate trial-level valid if R²_trial is sufficiently close to 1. By analogy, they redefined the individual-level association between both endpoints as a coefficient of determination, termed R²_individual; a surrogate is called individual-level valid if R²_individual is sufficiently close to 1.
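The trial-level criterion described above can be illustrated with a minimal Python sketch. It assumes that estimated treatment effects on the surrogate and on the true endpoint are already available for each trial or center, and it ignores the estimation error in those effects. The function name `r2_trial` and the numbers used below are hypothetical and are not taken from the NSABP data.

```python
def r2_trial(alpha_hat, beta_hat):
    """Regress trial-level effects on T (beta_hat) on effects on S (alpha_hat).

    Returns (slope, intercept, r_squared) of an ordinary least-squares fit
    with intercept. Illustrative helper, not the authors' code.
    """
    n = len(alpha_hat)
    if n != len(beta_hat) or n < 3:
        raise ValueError("need at least 3 paired effect estimates")
    mean_a = sum(alpha_hat) / n
    mean_b = sum(beta_hat) / n
    s_aa = sum((a - mean_a) ** 2 for a in alpha_hat)
    s_bb = sum((b - mean_b) ** 2 for b in beta_hat)
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(alpha_hat, beta_hat))
    if s_aa == 0 or s_bb == 0:
        raise ValueError("effects must vary across trials")
    slope = s_ab / s_aa
    intercept = mean_b - slope * mean_a
    return slope, intercept, s_ab ** 2 / (s_aa * s_bb)


# Hypothetical log hazard ratios for five centers (made-up numbers).
alpha_hat = [-0.40, -0.10, 0.05, -0.25, -0.60]
beta_hat = [-0.30, -0.05, 0.10, -0.35, -0.45]
slope, intercept, r2 = r2_trial(alpha_hat, beta_hat)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R2_trial={r2:.3f}")
```

The slope is what would be used to predict the effect on T from the effect on S; the returned R² is the quantity that is compared with 1 when judging trial-level validity.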
3. Colon cancer clinical trials
Colon cancer is a highly treatable and often curable disease when localized to the bowel. It is the second most frequently diagnosed malignancy in the United States as well as the second most common cause of cancer death. The primary treatment is surgery, which results in cure in approximately 50% of patients. Recurrence after surgery is a major problem and is often the ultimate cause of death. Surrogate endpoints such as recurrence-free survival (RFS) or disease-free survival (DFS) have been investigated as surrogates for overall survival in various studies [14,15]. In one investigation, time to recurrence appeared to be a very weak surrogate [14], and in another it was a moderate to good surrogate. The result of a validation study may differ by clinical trial, by the number of centers per trial, or by the type of analysis. Four data sets from the National Surgical Adjuvant Breast and Bowel Project (NSABP), with protocol numbers C-01, C-02, C-03 and C-04, are used as real data sets.
4. Model descriptions and setting
This section describes the meta-analytic approach and the models used for surrogate endpoint validation.
The meta-analytic approach for two normally distributed endpoints was proposed by Buyse et al. [5]. Here the true endpoint (T) and the surrogate endpoint (S) are continuous, normally distributed random variables and completely observed for all patients. In the notation, i = 1, 2, …, N indexes trials and j = 1, 2, …, n_i indexes subjects within trial i. For each patient the triplet (S_ij, T_ij, Z_ij) is assumed to be observed.
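The first stage is based upon a trial-specific model:
\[ S_{ij} \mid Z_{ij} = \mu_{Si} + \alpha_i Z_{ij} + \varepsilon_{Sij} \tag{1} \]
\[ T_{ij} \mid Z_{ij} = \mu_{Ti} + \beta_i Z_{ij} + \varepsilon_{Tij} \tag{2} \]
where μ_Si and μ_Ti are trial-specific intercepts and α_i and β_i are the trial-specific effects of treatment (Z) on S and T. ε_Sij and ε_Tij are correlated error terms, which are assumed to be mean-zero normally distributed with covariance matrix
\[ \Sigma = \begin{pmatrix} \sigma_{SS} & \sigma_{ST} \\ & \sigma_{TT} \end{pmatrix} \tag{3} \]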
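At the second stage, it is assumed that
\[ \begin{pmatrix} \mu_{Si} \\ \mu_{Ti} \\ \alpha_i \\ \beta_i \end{pmatrix} = \begin{pmatrix} \mu_S \\ \mu_T \\ \alpha \\ \beta \end{pmatrix} + \begin{pmatrix} m_{Si} \\ m_{Ti} \\ a_i \\ b_i \end{pmatrix} \tag{4} \]
The second term on the right-hand side of Eq. (4) follows a zero-mean normal distribution with dispersion matrix
\[ D = \begin{pmatrix} d_{SS} & d_{ST} & d_{Sa} & d_{Sb} \\ & d_{TT} & d_{Ta} & d_{Tb} \\ & & d_{aa} & d_{ab} \\ & & & d_{bb} \end{pmatrix} \tag{5} \]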
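The random-effects representation is based upon combining both steps:
\[ S_{ij} \mid Z_{ij} = \mu_S + m_{Si} + \alpha Z_{ij} + a_i Z_{ij} + \varepsilon_{Sij} \tag{6} \]
\[ T_{ij} \mid Z_{ij} = \mu_T + m_{Ti} + \beta Z_{ij} + b_i Z_{ij} + \varepsilon_{Tij} \tag{7} \]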
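The next step considered by Buyse et al. [5] focused on prediction. Assume that we have data only on S in a new trial i = 0 and that we are interested in the treatment effect of Z on T, β + b_0, given the effect on S. The conditional mean and variance of β + b_0 given (m_S0, a_0) are
\[ E(\beta + b_0 \mid m_{S0}, a_0) = \beta + \begin{pmatrix} d_{Sb} \\ d_{ab} \end{pmatrix}^{T} \begin{pmatrix} d_{SS} & d_{Sa} \\ d_{Sa} & d_{aa} \end{pmatrix}^{-1} \begin{pmatrix} \mu_{S0} - \mu_S \\ \alpha_0 - \alpha \end{pmatrix} \tag{8} \]
\[ \mathrm{Var}(\beta + b_0 \mid m_{S0}, a_0) = d_{bb} - \begin{pmatrix} d_{Sb} \\ d_{ab} \end{pmatrix}^{T} \begin{pmatrix} d_{SS} & d_{Sa} \\ d_{Sa} & d_{aa} \end{pmatrix}^{-1} \begin{pmatrix} d_{Sb} \\ d_{ab} \end{pmatrix} \tag{9} \]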
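As a rough illustration of the prediction Eqs. (8) and (9), the following Python sketch evaluates the conditional mean and variance for given model parameters. The function name, the dictionary layout for the elements of D, and all numeric values are assumptions made for this example only; in practice the parameters would come from a fitted two-stage or mixed-effects model.

```python
def predict_true_effect(beta, mu_s, alpha, d, mu_s0, alpha0):
    """Conditional mean and variance of beta + b0 given (m_S0, a_0), Eqs. (8)-(9).

    d is a dict holding the needed elements of the dispersion matrix D:
    'SS', 'Sa', 'aa', 'Sb', 'ab', 'bb'. Hypothetical helper for illustration.
    """
    det = d["SS"] * d["aa"] - d["Sa"] ** 2
    if d["SS"] <= 0 or det <= 0:
        raise ValueError("the (S, a) block of D must be positive-definite")
    # w = M^{-1} c with M = [[dSS, dSa], [dSa, daa]] and c = (dSb, dab)
    w_s = (d["aa"] * d["Sb"] - d["Sa"] * d["ab"]) / det
    w_a = (d["SS"] * d["ab"] - d["Sa"] * d["Sb"]) / det
    mean = beta + w_s * (mu_s0 - mu_s) + w_a * (alpha0 - alpha)
    var = d["bb"] - (d["Sb"] * w_s + d["ab"] * w_a)
    return mean, var


# Made-up parameter values, chosen only to keep the arithmetic easy to follow.
d_example = {"SS": 1.0, "Sa": 0.0, "aa": 1.0, "Sb": 0.0, "ab": 0.8, "bb": 1.0}
mean, var = predict_true_effect(
    beta=-0.2, mu_s=0.0, alpha=-0.3, d=d_example, mu_s0=0.5, alpha0=-0.5
)
print(f"predicted effect on T: {mean:.3f}, prediction variance: {var:.3f}")
```

Because the 2 × 2 block is symmetric, the sketch solves for the weight vector once and reuses it for both the mean and the variance.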
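In relation to the prediction Eqs. (8) and (9), the quantity to assess the quality of a surrogate at the trial level is the coefficient of determination
\[ R^2_{\mathrm{trial}(f)} = R^2_{b_i \mid m_{Si}, a_i} = \frac{1}{d_{bb}} \begin{pmatrix} d_{Sb} \\ d_{ab} \end{pmatrix}^{T} \begin{pmatrix} d_{SS} & d_{Sa} \\ d_{Sa} & d_{aa} \end{pmatrix}^{-1} \begin{pmatrix} d_{Sb} \\ d_{ab} \end{pmatrix} \tag{10} \]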
The index “trial(f)” indicates that this coefficient pertains to the distribution of β_i conditional on the full set of trial-specific parameters for S in model (4), i.e., on μ_Si and α_i.
In a simpler setting where b_0 is predicted independently from m_S0, the coefficient R²_trial in Eq. (10) reduces to
\[ R^2_{\mathrm{trial}(r)} = R^2_{b_i \mid a_i} = \frac{d_{ab}^2}{d_{aa} d_{bb}} \tag{11} \]
that is, R²_trial(r) is the square of the correlation between a_i and b_i.
This coefficient measures how precisely one can predict the effect of treatment on the true endpoint in a new trial, based on the previous data and the observed treatment effect on the surrogate endpoint in the new trial.
It is essential to explore the quality of the prediction of the treatment effect on the true endpoint in trial i by a) information obtained in the validation process based on trials i = 1…N and b) the estimate of the effect of Z on S in a new trial i = 0.
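The two versions of the trial-level coefficient, Eqs. (10) and (11), can be compared with a short Python sketch. It assumes the relevant elements of D are known; the function names and the numeric values are hypothetical and serve only to show how the full and reduced coefficients relate.

```python
def r2_trial_full(d):
    """R2_trial(f) of Eq. (10): conditions on both m_Si and a_i."""
    det = d["SS"] * d["aa"] - d["Sa"] ** 2
    if d["SS"] <= 0 or det <= 0 or d["bb"] <= 0:
        raise ValueError("D must be positive-definite")
    # Quadratic form c^T M^{-1} c with c = (dSb, dab)
    w_s = (d["aa"] * d["Sb"] - d["Sa"] * d["ab"]) / det
    w_a = (d["SS"] * d["ab"] - d["Sa"] * d["Sb"]) / det
    return (d["Sb"] * w_s + d["ab"] * w_a) / d["bb"]


def r2_trial_reduced(d):
    """R2_trial(r) of Eq. (11): squared correlation between a_i and b_i."""
    if d["aa"] <= 0 or d["bb"] <= 0:
        raise ValueError("variances must be positive")
    return d["ab"] ** 2 / (d["aa"] * d["bb"])


# Made-up dispersion elements (not estimates from any of the trials).
d_example = {"SS": 1.0, "Sa": 0.0, "aa": 1.0, "Sb": 0.3, "ab": 0.8, "bb": 1.0}
full = r2_trial_full(d_example)
reduced = r2_trial_reduced(d_example)
print(f"R2_trial(f)={full:.2f}, R2_trial(r)={reduced:.2f}")
```

In this example the full coefficient is larger than the reduced one because the random intercept for S carries additional information about b_i; when d_Sb and d_Sa are both zero the two coincide.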
It is worth noting that the D matrix is required to be positive-definite for R²_trial to be a meaningful measure. A surrogate is said to be ‘perfect at the trial level’ when R²_trial is equal to 1 [5]. After adjustment for the effects of treatment Z, the association between S and T is captured by Σ. The surrogate is said to be ‘perfect at the individual level’ if the coefficient of determination
\[ R^2_{\mathrm{indiv}} = R^2_{\varepsilon_{Tij} \mid \varepsilon_{Sij}} = \frac{\sigma_{ST}^2}{\sigma_{SS}\,\sigma_{TT}} \tag{12} \]
is equal to 1. R²_indiv is the squared correlation between S and T after accounting for the trial and treatment effects [5]. This criterion will not be discussed further in this work.
An additional model is needed to calculate the PE criterion:
\[ T_{ij} \mid Z_{ij}, S_{ij} = \tilde{\mu}_T + \beta_S Z_{ij} + \gamma S_{ij} + \tilde{\varepsilon}_{Tij} \tag{13} \]
PE is then calculated as [10]
\[ PE = 1 - \frac{\beta_S}{\beta} \tag{14} \]
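A small Python sketch of Eq. (14) shows why PE is hard to interpret as a proportion. The function name and the coefficient values are hypothetical; they are chosen to show that PE leaves the interval [0, 1] as soon as the adjusted effect β_S changes sign or exceeds the unadjusted effect β in magnitude.

```python
def proportion_explained(beta, beta_s):
    """PE = 1 - beta_S / beta, Eq. (14).

    beta:   treatment effect on T without adjustment for S
    beta_s: treatment effect on T after adjustment for S (model 13)
    """
    if beta == 0:
        raise ZeroDivisionError("PE is undefined when beta is zero")
    return 1 - beta_s / beta


# Made-up coefficients illustrating the three possible regimes.
cases = {
    "within [0, 1]": (-0.40, -0.10),
    "above 1": (-0.40, 0.10),
    "below 0": (-0.05, -0.20),
}
for label, (beta, beta_s) in cases.items():
    print(f"{label}: PE = {proportion_explained(beta, beta_s):.2f}")
```

The third case uses a small β, which mimics a weak treatment effect on the true endpoint: a small denominator makes the ratio unstable, in line with the wide PE intervals reported for trials with non-significant effects.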
5. Analysis of case studies
5.1. Data sets
The four data sets on colon cancer clinical trials (C-01, C-02, C-03 and C-04) [16–19] from NSABP were analyzed as follows. Each trial was first analyzed separately. In data set C-01 the four treatment arms were recoded into 2 groups: operation only (OP) and OP + MOF were recoded into the first group, and OP + BCG (Pasteur) and OP + BCG (Connaught) into the second group. In data set C-04 the treatments FU + LV and FU + LV + LEV were recoded into the same treatment group. This recoding reduced the number of treatment arms to 2 groups, resulting in more cases for each center. For each trial, centers with at least 3 observations and a minimum of 2 events in each treatment arm were considered for analysis; all other centers were grouped into one center. For the true endpoint (overall survival) and the surrogate endpoint (recurrence-free survival), a Cox regression analysis was applied within each center, where β̂_i is the coefficient of the treatment effect on the true endpoint and α̂_i is the coefficient of the treatment effect on the surrogate endpoint. The values β̂_i, se(β̂_i), Wald(β̂_i), α̂_i, se(α̂_i) and Wald(α̂_i) were recorded. The Wald statistics shown in Tables 1 and 2, (β̂_i/se(β̂_i))² from the Cox regression analysis for the treatment effect on overall survival, are the statistics used to test whether the treatment effects differ significantly from zero. The mean Wald statistic for β̂_i and its 95% CI for data sets C-01, C-02, C-03 and C-04 were 0.81 (0.413–1.22), 0.68 (0.078–1.29), 0.85 (0.508–1.19) and 0.48 (0.210–0.744), respectively. The estimated R²_trial and PE values for the four data sets, with 95% bootstrap CIs based on 1000 and 500 replications respectively, are given in Table 1.

Table 1
Percent censoring, p-value for treatment effect, number of centers used for analysis, R²_trial criterion, PE criterion, and bootstrap confidence intervals by clinical trial

Trial  % censoring  p-value⁎⁎⁎  # Centers  Mean Wald  95% CI Wald     R²_trial (95% CI)⁎     PE (95% CI)⁎⁎
C-01   37           0.210       28         0.81       (0.413, 1.22)   0.641 (0.432, 0.782)   0.361 (−4.65, 5.31)
C-02   53           0.110       12         0.68       (0.078, 1.29)   0.223 (0.008, 0.503)   1.271 (−1.28, 4.95)
C-03   62           <0.001      37         0.85       (0.508, 1.19)   0.761 (0.550, 0.872)   0.864 (0.18, 1.64)
C-04   68           0.081       36         0.48       (0.210, 0.744)  0.560 (0.404, 0.687)   1.67 (−1.11, 6.15)

⁎ Based on 1000 bootstrap replications; ⁎⁎ based on 500 bootstrap replications; ⁎⁎⁎ for treatment effect on true endpoint (survival).

Table 2
Centers from clinical trials by grouped Wald statistic

Wald statistic     C-01  C-02  C-03  C-04  Total
W ≤ 0.072          10    4     9     15    38
0.072 < W ≤ 0.52   8     3     14    13    38
W > 0.52           10    5     14    8     37

The observed β̂_i, se(β̂_i), Wald(β̂_i), α̂_i, se(α̂_i) and Wald(α̂_i) values from the 4 clinical trials were merged and grouped by the Wald statistic of β̂_i into 3 groups using equally spaced percentiles. The distribution of centers from each clinical trial by group is given in Table 2.
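The Wald statistic and the grouping step described above can be sketched in Python as follows. The exact percentile rule used for the grouping is not stated in the text, so the rank-based split below is an assumption, and the function names and example values are hypothetical.

```python
def wald_statistic(coef, se):
    """Wald statistic (coef / se)^2 for a single regression coefficient."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return (coef / se) ** 2


def tertile_groups(values):
    """Assign each value to group 0, 1 or 2 by rank (three near-equal groups).

    Ties are broken by original order; this is one possible reading of
    'equally spaced percentiles', not necessarily the authors' rule.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [0] * n
    for rank, idx in enumerate(order):
        groups[idx] = min(3 * rank // n, 2)
    return groups


# Made-up coefficients and standard errors for six centers.
beta_hat = [-0.10, -0.45, 0.05, -0.30, -0.02, -0.60]
se_beta = [0.40, 0.35, 0.50, 0.30, 0.45, 0.25]
wald = [wald_statistic(b, s) for b, s in zip(beta_hat, se_beta)]
print(tertile_groups(wald))
```

Applied to 113 values, this rank-based split yields groups of 38, 38 and 37, which matches the row totals of Table 2.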
One thousand bootstrap replications were used to obtain the 95% CI for R²_trial in each group. The estimated R²_trial was 0.09 in the first group (W ≤ 0.072), 0.44 in the second group (0.072 < W ≤ 0.52) and 0.77 in the last group (W > 0.52). In the same way we estimated R² from the regression of SE(β̂_i) on SE(α̂_i). The bootstrap results are given in Fig. 1.
Fig. 1. Bootstrapped R²_trial from the regression of β̂_i on α̂_i and R² from the regression of SE(β̂_i) on SE(α̂_i) for grouped Wald values (1000 replications for each).
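A percentile bootstrap for R²_trial of the kind described above might look like the following Python sketch. It assumes that centers are resampled with replacement as pairs (α̂_i, β̂_i) and that the interval is read off the empirical percentiles; the text does not specify the bootstrap variant, so both choices are assumptions. The data are simulated for illustration only.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation; None if either variable is constant."""
    n = len(x)
    if n < 2:
        return None
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        return None
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return s_xy ** 2 / (s_xx * s_yy)


def bootstrap_r2_ci(alpha_hat, beta_hat, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for R2, resampling centers with replacement."""
    n = len(alpha_hat)
    if n < 3 or r_squared(alpha_hat, beta_hat) is None:
        raise ValueError("need at least 3 centers with varying effects")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        r2 = r_squared([alpha_hat[i] for i in idx], [beta_hat[i] for i in idx])
        if r2 is not None:  # skip degenerate resamples with no variation
            stats.append(r2)
    stats.sort()
    lower = stats[int(n_boot * (1 - level) / 2)]
    upper = stats[min(n_boot - 1, int(n_boot * (1 + level) / 2))]
    return lower, upper


# Simulated center-level effects (not the NSABP estimates).
sim = random.Random(42)
alpha_hat = [sim.gauss(0.0, 0.3) for _ in range(30)]
beta_hat = [0.8 * a + sim.gauss(0.0, 0.15) for a in alpha_hat]
print(r_squared(alpha_hat, beta_hat), bootstrap_r2_ci(alpha_hat, beta_hat))
```

The fixed seed makes the interval reproducible; with few centers per group the interval can be very wide, as reported for C-02 in Table 1.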
6. Results for case study
As shown in Table 1, the R²_trial values are higher for trials with higher Wald values, except for the C-02 data set, where only 12 centers were eligible for analysis. The 95% CI for the R²_trial value is very wide for this clinical trial.
In Fig. 1 it is observed that R² from the regression of SE(β̂_i) on SE(α̂_i) is relatively stable across the grouped Wald values, whereas R²_trial from the regression of β̂_i on α̂_i increases as the Wald statistic increases and reaches the value of R² from the regression of SE(β̂_i) on SE(α̂_i) in the group with the highest Wald values (W > 0.52).
In Fig. 2, the range of the PE criterion differs between clinical trials. It has its smallest range, close to the interval 0–1, for the C-03 clinical trial, which has the smallest p-value (highest Wald statistic) for the treatment effect on overall survival.
7. Discussion
The bootstrap results for the PE criterion showed that it can take values outside the range [0, 1], which is difficult to interpret for a proportion. It gave an acceptable result only for C-03, the clinical trial with the most significant treatment effect. This shows that the p-value of the treatment effect plays an important role in the validation of surrogate endpoints.
Buyse et al. [20] evaluated progression-free survival (PFS) as a surrogate for survival in advanced colorectal cancer using the R²_trial criterion. They used a proportional hazards regression model with treatment as the only factor for each trial in their meta-analysis, and the effects of treatment on both endpoints were then used in a regression analysis.
Using the same approach we estimated the validation criteria for RFS as a surrogate for OS. Our aim, however, is not to prove the validity of RFS but to show how these criteria are affected by the treatment effect. In the validation of surrogate endpoints in different data sets with different treatments, we observed different R²_trial estimates with wide confidence intervals. In the case study we observed higher R²_trial values for groups with higher Wald statistics and a relatively stable Corr(SE(β̂_i), SE(α̂_i)). At this point it is difficult to decide whether the bias in R²_trial decreases or increases with more significant treatment effects. Freedman et al. [10] stated that validation using the Proportion Explained with adequate statistical power would require β/σ > 4. This supports the observation in our study that clinical trials or centers with more significant treatment effects should have more weight in the validation process.
We also conclude that R²_trial should be evaluated together with R²_indiv at different levels of significance and should be compared to Corr(SE(β̂_i), SE(α̂_i)), which might give information on the direction of the bias in R²_trial. If the observed R² values from the regression of SE(β̂_i) on SE(α̂_i) are higher than the R²_trial values, and the observed R²_trial values increase with more significant treatment effects, this might indicate that R²_trial is underestimated. In such a situation we recommend a weighted regression analysis in which the weights are the test statistics (for example the Wald statistic or its square root) for the treatment effect observed in each center or trial. The validation criterion R²_trial should be evaluated together with R²_individual because the value of R²_individual also has an effect on the R²_trial criterion. Nevertheless, validation results should be handled with care in prediction, since this is only an approximation procedure for the true endpoint.
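The weighted regression recommended above could be sketched in Python as follows. The weighted coefficient of determination is computed here as the squared weighted correlation, which equals the R² of a weighted least-squares fit with intercept. Using the Wald statistic directly as the weight is one of the two options mentioned in the text; the function name and all numbers are hypothetical.

```python
def weighted_r2(alpha_hat, beta_hat, weights):
    """Weighted R2 from a weighted least-squares fit of beta_hat on alpha_hat."""
    if not (len(alpha_hat) == len(beta_hat) == len(weights)):
        raise ValueError("inputs must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = sum(weights)
    mean_a = sum(w * a for w, a in zip(weights, alpha_hat)) / total
    mean_b = sum(w * b for w, b in zip(weights, beta_hat)) / total
    s_aa = sum(w * (a - mean_a) ** 2 for w, a in zip(weights, alpha_hat))
    s_bb = sum(w * (b - mean_b) ** 2 for w, b in zip(weights, beta_hat))
    s_ab = sum(
        w * (a - mean_a) * (b - mean_b)
        for w, a, b in zip(weights, alpha_hat, beta_hat)
    )
    if s_aa == 0 or s_bb == 0:
        raise ValueError("weighted variances must be positive")
    return s_ab ** 2 / (s_aa * s_bb)


# Made-up effects: the last center is discordant and has a tiny Wald statistic.
alpha_hat = [0.0, 1.0, 2.0, 3.0]
beta_hat = [0.0, 1.0, 2.0, 0.0]
wald = [1.0, 1.0, 1.0, 0.01]
print(weighted_r2(alpha_hat, beta_hat, [1.0] * 4))  # unweighted
print(weighted_r2(alpha_hat, beta_hat, wald))       # Wald-weighted
```

In this toy example the discordant center dominates the unweighted fit but contributes little once it is down-weighted by its small Wald statistic, so the weighted R² is much higher.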
Acknowledgments
We are grateful to the late Harry Samuel Wieand, Ph.D., University of Pittsburgh, for allowing us to use data from protocols C-01, C-02, C-03 and C-04 as an example.
References
[1] Cancer Statistics Working Group. United States Cancer Statistics: 2003 incidence and mortality. Atlanta: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute; 2006.
[2] Wittes J, Lakatos E, Probstfield J. Surrogate endpoints in clinical trials: cardiovascular diseases. Stat Med 1989;8:415–25.
[3] Herson J. The use of surrogate endpoints in clinical trials (an introduction to a series of four papers). Stat Med 1989;8:403–4.
[4] Temple RJ. A regulatory authority's opinion about surrogate endpoints. In: Nimmo WS, Tucker GT, editors. Clinical Measurement in Drug Evaluation. New York: J. Wiley and Sons; 1995.
[5] Buyse M, Molenberghs G, Burzykowski T, Geys H, Renard D. The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics 2000;1:1–19.
[6] Ellenberg S, Hamilton JM. Surrogate endpoints in clinical trials: cancer. Stat Med 1989;8:405–13.
[7] Fleming TR, DeMets DL. Surrogate end points in clinical trials: are we being misled? Ann Intern Med 1996;125:605–13.
[8] Piantadosi S. Some statistical issues in the design of cancer clinical trials with surrogate end points. Abstracts From the Program of the Second Annual Meeting of the American Society for Experimental Neurotherapeutics, Washington, DC, March 23–25; 2000.
[9] Prentice RL. Surrogate markers in clinical trials: definition and operational criteria. Stat Med 1989;8:431–40.
[10] Freedman LS, Graubard BI, Schatzkin A. Statistical validation of intermediate endpoints for chronic diseases. Stat Med 1992;11:167–78.
[11] Molenberghs G, Buyse M, Geys H, Renard D, Burzykowski T. Statistical challenges in the evaluation of surrogate endpoints in randomized trials. Control Clin Trials 2002;23:607–25.
[12] Buyse M, Molenberghs G. Criteria for the validation of surrogate endpoints in randomized experiments. Biometrics 1998;54:1014–29.
[13] Buyse M, Molenberghs G, Burzykowski T, Renard D, Geys H. Statistical validation of surrogate endpoints: problems and proposals. Drug Inf J 2000;34:447–57.
[14] Burzykowski T, Molenberghs G, Buyse M, Geys H, Renard D. Validation of surrogate end points in multiple randomized clinical trials with failure time end points. J R Stat Soc Ser C Appl Stat 2001;50:405–22.
[15] Sargent D, Wieand S, Haller DG, et al. Disease-free survival (DFS) vs. overall survival (OS) as a primary endpoint for adjuvant colon cancer studies: individual patient data from 20,898 patients on 18 randomized trials. J Clin Oncol 2005;23(34):8664–70.
[16] Wolmark N, Fisher B, Rockette H, Redmond C, et al. Postoperative adjuvant chemotherapy or BCG for colon cancer: results from NSABP protocol C-01. J Natl Cancer Inst 1988;80:30–6.
[17] Wolmark N, Rockette H, Wickerham DL, et al. Adjuvant therapy of Dukes' A, B, and C adenocarcinoma of the colon with portal-vein fluorouracil hepatic infusion: preliminary results of National Surgical Adjuvant Breast and Bowel Project protocol C-02. J Clin Oncol 1990;8:1466–75.
[18] Wolmark N, Rockette HE, Fisher B, et al. The benefit of leucovorin-modulated 5-FU as postoperative adjuvant therapy for primary colon cancer: results from NSABP protocol C-03. J Clin Oncol 1993;11(10):1879–87.
[19] Wolmark N, Rockette H, Mamounas E, et al. A clinical trial to assess the relative efficacy of 5-FU + leucovorin, 5-FU + levamisole, and 5-FU + leucovorin + levamisole in patients with Dukes' B and C carcinoma of the colon: results from NSABP C-04. J Clin Oncol 1999;17(11):3553–9.
[20] Buyse M, Burzykowski T, Carroll K, et al. Progression-free survival is a surrogate for survival in advanced colorectal cancer. J Clin Oncol 2007;25(33).