Either the 4Kpanel (logit scale) or clinical serum PSA (log-transformed)
was used in models. Prediction models were built using data in the
training set, and then clinical performance was assessed using the
testing set. We followed the principles set forth by the US Food and Drug
Administration critical path initiative, using an established biomarker
with analytic validity for the intent of clinical validation in the intended
use population [7]. Furthermore, we followed the reporting recommendations
for tumor marker prognostic studies (REMARK) [8] and the Tumor Marker
Utility Grading System [9] in reporting the clinical utility of the
biomarker panel.
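The two predictor scales mentioned above can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses were run in R); the function names are hypothetical, and the logarithm base for PSA is an assumption, since the text states only that PSA was log-transformed.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the log-odds scale."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def log_psa(psa_ng_ml, base=2.0):
    """Log-transform serum PSA (ng/ml).

    Base 2 is an assumption made here so that one unit corresponds to a
    doubling of PSA; the source does not state the base.
    """
    if psa_ng_ml <= 0:
        raise ValueError("PSA must be positive")
    return math.log(psa_ng_ml, base)
```

On the logit scale a probability of 0.5 maps to zero, and `inv_logit` recovers the original probability from the transformed value.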
2.3.1. Model building
Data from initial and subsequent biopsy groups were combined for
model development. Interaction terms between biopsy group (initial vs
subsequent surveillance biopsy) and other variables were evaluated to
investigate whether effects may differ for an initial biopsy and a
subsequent biopsy. Logistic regression was used to fit the models, with
robust variance to account for the correlation among multiple biopsies
on the same patient. Forward stepwise model selection procedures were
implemented. Variable selection criteria included p < 0.15, an area under
the receiver operating characteristic (ROC) curve (AUC) threshold of 0.005,
or the quasi-likelihood under the independence model criterion (QIC) with a
threshold of zero [5]. Final models were compared to identify variables that were
robust to selection procedures. We first identified a full model including
clinical predictors and 4Kpanel, and then a base model with serum PSA
substituted for the 4Kpanel. In some clinics, prostate volume may not be
reliably available, so models without prostate volume were fitted
sequentially.
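As an illustration of forward stepwise selection with a minimum-gain stopping rule, the following Python sketch adds one variable at a time and stops when the best candidate improves the score by less than a threshold. It is a sketch under stated assumptions, not the authors' procedure: `score` is a hypothetical callable standing in for "fit a logistic model and return its AUC", the variable names are placeholders, and the toy gains are invented for demonstration only.

```python
def forward_select(candidates, score, min_gain=0.005):
    """Greedy forward selection.

    candidates: iterable of variable names.
    score: hypothetical callable taking a tuple of names and returning a
           performance measure where larger is better (for example AUC).
    min_gain: smallest improvement that justifies adding a variable.
    """
    selected = []
    remaining = list(candidates)
    best = score(tuple(selected))
    while remaining:
        # Score every one-variable extension of the current model.
        trials = [(score(tuple(selected + [v])), v) for v in remaining]
        top_score, top_var = max(trials)
        if top_score - best < min_gain:
            break  # no candidate improves the model enough
        selected.append(top_var)
        remaining.remove(top_var)
        best = top_score
    return selected, best


def toy_score(variables):
    # Made-up additive gains, for illustration only (not study results).
    gains = {"x1": 0.10, "x2": 0.03, "x3": 0.001}
    return 0.5 + sum(gains[v] for v in variables)


selected, best = forward_select(["x1", "x2", "x3"], toy_score)
```

With these toy gains the procedure keeps `x1` and `x2` and rejects `x3`, whose contribution falls below the 0.005 threshold. A p value or QIC criterion could be substituted by changing what `score` returns.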
2.3.2. Model validation
Calibration plots were used to gauge the goodness of fit of each model.
We used ROC analyses and AUC to assess the discriminatory capacity of a
model for separating patients with and without reclassification. Decision
curve analysis (DCA) was used to report the clinical net benefit of each
model compared to biopsy-all and biopsy-none strategies [6]. The
potential clinical impact was illustrated by plotting the number of
cancers missed versus the number of biopsies avoided per 1000 individuals.
To illustrate the clinical consequence of each model, we report the
number of biopsies that could be avoided and the number of Gleason ≥7
cancers that might be missed if a risk-based threshold is applied as a
criterion for biopsy. All evaluations were conducted on the initial biopsy
and subsequent biopsy groups separately and combined. Confidence
intervals (CIs) and significance tests were calculated using the bootstrap
resampling procedure to account for within-subject correlations. All
analyses were conducted using R version 3.1.1 (www.r-project.org).
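The validation quantities described in this subsection can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' R code. The net benefit formula is the standard decision curve definition (true positives minus threshold-weighted false positives, per patient). Resampling whole patients rather than individual biopsies is an assumption about how the bootstrap handled within-subject correlation. The function names and the toy records are hypothetical and are not study data.

```python
import random

def auc(y_true, y_score):
    """AUC as the probability that a case outranks a non-case (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def net_benefit(y_true, y_score, threshold):
    """Net benefit of biopsying everyone whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def per_1000(y_true, y_score, threshold):
    """Biopsies avoided and cancers missed per 1000 at a given risk threshold."""
    n = len(y_true)
    avoided = sum(1 for s in y_score if s < threshold)
    missed = sum(1 for y, s in zip(y_true, y_score) if s < threshold and y == 1)
    return 1000.0 * avoided / n, 1000.0 * missed / n

def cluster_bootstrap_ci(records, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI that resamples patients (not biopsies) with replacement.

    records: list of (patient_id, outcome, predicted_risk) tuples.
    """
    rng = random.Random(seed)
    by_patient = {}
    for pid, y, s in records:
        by_patient.setdefault(pid, []).append((y, s))
    ids = list(by_patient)
    stats = []
    for _ in range(n_boot):
        sample = [row for pid in rng.choices(ids, k=len(ids))
                  for row in by_patient[pid]]
        try:
            stats.append(statistic([y for y, _ in sample],
                                   [s for _, s in sample]))
        except ValueError:
            continue  # replicate lacked cases or non-cases
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper

# Toy records for illustration only: (patient_id, reclassified, predicted risk).
records = [(1, 1, 0.8), (1, 0, 0.4), (2, 1, 0.6), (3, 0, 0.1), (3, 0, 0.05),
           (4, 1, 0.15), (5, 0, 0.3), (5, 0, 0.2), (6, 0, 0.1), (7, 0, 0.05)]
y = [r[1] for r in records]
risk = [r[2] for r in records]
```

The biopsy-all strategy corresponds to calling `net_benefit` with every predicted risk set to 1, and the biopsy-none strategy has a net benefit of zero by definition, so a model is useful at a given threshold only if its net benefit exceeds both.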
3. Results
Of the 718 men in this study, there were 478 participants in
the initial biopsy group for whom kallikreins were assayed:
319 in the training set (60 [18.8%] with Gleason ≥7) and
159 in the test set (34 [21.4%] with Gleason ≥7; Table 1). In
bivariate analyses, prostate volume, ratio of positive to total
cores, and the 4Kpanel were significantly associated with
grade reclassification. There were 444 participants (of
whom 204 were also in the initial biopsy group) with
633 subsequent surveillance biopsies: 422 in the training
set (70 [17%] with Gleason ≥7; Table 2) and 211 in the test
set (31 [15%] with Gleason ≥7; Supplementary Table 1).
Biopsies in this group ranged from the second to the eighth after
diagnosis, and most patients had Gleason score 6 or no
cancer at their surveillance biopsies, varying slightly across
biopsy number.
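The cohort counts in this paragraph can be cross-checked with simple arithmetic. The Python sketch below uses only the numbers reported in the text; the dictionary keys and variable names are hypothetical labels chosen for illustration.

```python
# Counts as reported in the text: (reclassification events, total) per cohort.
cohorts = {
    "initial, training": (60, 319),
    "initial, test": (34, 159),
    "subsequent, training": (70, 422),
    "subsequent, test": (31, 211),
}

def percent(events, total):
    """Event rate as a percentage."""
    return 100.0 * events / total

rates = {name: percent(e, n) for name, (e, n) in cohorts.items()}

initial_total = 319 + 159      # participants in the initial biopsy group
subsequent_total = 422 + 211   # subsequent surveillance biopsies
unique_men = 478 + 444 - 204   # union of both groups, removing the overlap
```

The computed rates round to the reported 18.8%, 21.4%, 17%, and 15%, and the totals reproduce the 478 participants, 633 biopsies, and 718 men stated in the text.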
In the full clinical model (Table 3) including the
4Kpanel, significant predictors for reclassification were
BMI (odds ratio [OR] 1.09, 95% CI 1.04–1.14), >20% of cores
positive in the prior biopsy (OR 2.10, 95% CI 1.33–3.32), a
history of two or more biopsies negative for cancer (OR
0.19, 95% CI 0.04–0.85), prostate volume (per fold
increase, OR 0.47, 95% CI 0.31–0.70), and 4Kpanel (OR
1.5, 95% CI 1.31–1.81). In the clinical model with serum
PSA replacing the 4Kpanel, PSA was significantly associated
with reclassification (per fold increase, OR 2.11, 95%
CI 1.53–2.91) and age was not. In models that did not
include prostate volume, the effects were similar for
covariates left in the model (Supplementary Table 2).
Model calibration in the test set showed predicted
probabilities of reclassification closely matching the
empirical rates (Supplementary Fig. 1).
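To show how an odds ratio and its interval relate to a logistic regression coefficient, the Python sketch below converts between the two. It assumes a Wald-type interval that is symmetric on the log-odds scale, which the text does not state explicitly; the function names are hypothetical, and the worked values use the reported OR of 2.10 (95% CI 1.33–3.32) for >20% positive cores.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald CI from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def implied_beta_se(odds_ratio, lower, upper, z=1.96):
    """Back out the coefficient and SE implied by a reported OR and Wald CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

# Reported values for >20% of cores positive in the prior biopsy.
beta, se = implied_beta_se(2.10, 1.33, 3.32)
or_hat, ci_low, ci_high = odds_ratio_ci(beta, se)
```

Under this assumption the implied coefficient is about 0.74 with a standard error of about 0.23, and the reconstructed interval agrees with the reported limits to two decimal places.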
Table 1 – Characteristics for 478 participants with kallikreins assayed before the initial surveillance biopsy after diagnosis for combined Gleason score <7 versus ≥7 for the training and test cohorts

| Characteristics | Training set: Gleason <7 | Training set: Gleason ≥7 | Training set: p value | Test set: Gleason <7 | Test set: Gleason ≥7 | Test set: p value |
| --- | --- | --- | --- | --- | --- | --- |
| Sample size (n) | 259 | 60 | | 125 | 34 | |
| Age at diagnosis (yr) | 63 (58–67) | 64 (60–68) | 0.109 | 64 (58–68) | 64 (57–67) | 0.876 |
| Body mass index (kg/m²) | 27 (25–30) | 28 (25–33) | 0.116 | 27 (25–29) | 28 (26–31) | 0.305 |
| Race | | | | | | |
| Non–African American | 248 (96) | 56 (93) | | 121 (97) | 29 (85) | |
| African American | 11 (4) | 4 (7) | 0.646 | 4 (3) | 5 (15) | 0.522 |
| Time from diagnosis (mo) | 12.0 (8.4–14.1) | 12.7 (8.6–14.8) | 0.237 | 12.2 (8.8–14.0) | 12.6 (10.3–17.6) | 0.189 |
| Digital rectal examination | | | | | | |
| Normal | 238 (92) | 55 (92) | | 118 (94) | 30 (88) | |
| Abnormal | 21 (8) | 5 (8) | 0.971 | 7 (6) | 4 (12) | 0.031 |
| Prostate volume (cm³) | 41.0 (30.0–56.5) | 35.5 (25.0–50.0) | 0.041 | 40.0 (30.0–51.0) | 30.0 (24.0–42.8) | 0.006 |
| Positive:total core ratio | 0.08 (0.08–0.17) | 0.17 (0.08–0.20) | <0.001 | 0.08 (0.08–0.17) | 0.17 (0.17–0.25) | <0.001 |
| Clinical serum PSA (ng/ml) | 4.60 (2.91–6.40) | 4.81 (4.35–6.42) | 0.108 | 4.56 (3.11–6.24) | 5.65 (4.58–7.88) | 0.024 |
| 4Kpanel (logit) | 0.21 (0.08–0.29) | 0.32 (0.16–0.44) | <0.001 | 0.20 (0.07–0.28) | 0.36 (0.18–0.53) | <0.001 |

PSA = prostate-specific antigen.
Data are presented as median (interquartile range) for continuous variables and as n (%) for categorical variables.
EUROPEAN UROLOGY 72 (2017) 448–454




