Results
At least two of the three readers found progression, i.e. at least two new lesions, in 120 patients and no progression in 135 patients. In 11 patients with very severe metastatic disease, the readers could not decide whether progression had occurred, and these cases were excluded. All three readers agreed in 87% (222/255) of the remaining cases. In the other 13% (33/255), one reader found progression while the other two did not, or vice versa; that is, several cases were difficult to classify as progression or no progression, even for experienced readers.
In 112 of the 120 cases classified as progression by the experts, the automated method detected at least two new lesions, i.e. a sensitivity of 93%. In each of the remaining eight cases, the automated method detected one new lesion. Among the 135 cases without progression according to the experts, the automated method showed a specificity of 87% (117/135). The corresponding negative and positive predictive values were 94% (117/125) and 86% (112/130), respectively.
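The sensitivity, specificity and predictive values above all follow from a single 2x2 table of counts. The following is a minimal sketch of that arithmetic, not the analysis code used in the study. The count of 18 false positives is inferred as 135 - 117 rather than stated in the text, and the names `ConfusionCounts` and `diagnostic_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of automated results against the expert reference."""
    tp: int  # progression per experts, >= 2 new lesions detected
    fn: int  # progression per experts, < 2 new lesions detected
    tn: int  # no progression per experts, < 2 new lesions detected
    fp: int  # no progression per experts, >= 2 new lesions detected


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the four standard diagnostic accuracy measures as fractions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Counts taken from the text; fp = 135 - 117 is an inferred value.
counts = ConfusionCounts(tp=112, fn=8, tn=117, fp=18)
metrics = diagnostic_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Rounded to whole percentages, these fractions reproduce the reported 93%, 87%, 86% and 94%.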
The median change in BSI from the first to the second scan was an increase of 13% (range 59% decrease to 438% increase) in the treatment group. In a univariate Cox analysis, both the ‘percentage change in BSI’ (hazard ratio 1.005; 95% CI, 1.001 to 1.008; p = 0.008) and the ‘number of new lesions’ (hazard ratio 1.06; 95% CI, 1.02 to 1.09; p = 0.0004) were associated with survival. A total of 17 of the 31 patients in the treatment group showed an increase in BSI during treatment. Only three (18%) of these patients were alive 2 years after the second scan. Of the 14 patients with a decrease in BSI during treatment, eight (57%) were alive after 2 years. The Kaplan-Meier curves for patients with increase and decrease in BSI were significantly different (p = 0.02) (Figure 3A). At least two new lesions were found in 23 patients, and eight (35%) of these patients were alive after 2 years. Three of the eight (38%) patients without two new lesions were alive after 2 years. The Kaplan-Meier curves for patients with and without at least two new lesions were not significantly different (Figure 3B).
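Hazard ratios this close to 1 are easier to interpret when scaled to a clinically meaningful change. The sketch below assumes that the hazard ratio for BSI is expressed per one percentage point of change and that for new lesions per one additional lesion; the text does not state these units explicitly. It also relies on the log-linear form of the Cox model, under which the hazard ratio for a change of k units is the per-unit hazard ratio raised to the power k. The function name `scaled_hazard_ratio` is hypothetical.

```python
import math


def scaled_hazard_ratio(hr_per_unit: float, units: float) -> float:
    """Hazard ratio for a change of `units`, given the per-unit hazard ratio.

    Assumes a Cox model that is log-linear in the covariate, so that
    log(HR) scales linearly with the size of the change.
    """
    if hr_per_unit <= 0:
        raise ValueError("hazard ratio must be positive")
    return math.exp(units * math.log(hr_per_unit))


# Per-unit hazard ratios taken from the univariate Cox analysis in the text.
HR_BSI_PER_PERCENT = 1.005   # assumed: per 1 percentage point change in BSI
HR_PER_NEW_LESION = 1.06     # assumed: per 1 additional new lesion

hr_bsi_doubling = scaled_hazard_ratio(HR_BSI_PER_PERCENT, 100)  # BSI +100%
hr_five_lesions = scaled_hazard_ratio(HR_PER_NEW_LESION, 5)     # 5 new lesions

print(f"HR for a 100% increase in BSI: {hr_bsi_doubling:.2f}")
print(f"HR for five new lesions: {hr_five_lesions:.2f}")
```

Under these assumptions, a doubling of BSI corresponds to a hazard ratio of roughly 1.6 and five new lesions to roughly 1.3. These are point estimates only; the same scaling would need to be applied to the confidence limits to obtain intervals.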
Discussion
The results of this study show that an automated method can be used to detect new lesions and changes in BSI in serial bone scans, and that these values contain prognostic information in a group of patients on treatment with docetaxel. A sensitivity of 93% and a specificity of 87% for the automated method are high considering the inter-observer variability among human observers. Our gold standard was based on the classifications of three experienced observers, and all three agreed in 87% of the cases. Other studies have also shown substantial inter-observer variability. Sadik et al. studied bone scan classifications from 37 readers and found, on average, agreement between paired readers of 64% [14]. The lower value in that study can, at least partly, be explained by the fact that the classification was performed using a four-grade scale and that the readers came from 18 different hospitals. Ore et al., on the other hand, reported inter-observer agreement of 91% based on two observers in a smaller patient sample [15]. The difficulty of interpreting bone scan changes, even for experienced readers, is the incentive to develop an objective method that minimises disagreement among observers. Intra-observer variability was also found in our recent study, in which one reader analysed the same bone scans twice on different occasions and calculated BSI visually [12].
In the treatment group, only 18% of the patients with an increase in BSI were alive after 2 years, while 57% of those with a decreasing BSI were alive after the same period. These results are in agreement with those of Dennis et al., who demonstrated the prognostic value of BSI as a response indicator in prostate cancer patients [11]. In the present study, the PCWG2 criterion of two or more new lesions was not prognostic. The number of new lesions was, however, significantly associated with survival, indicating that a criterion other than two new lesions might be valuable. A combination of the number of new lesions and the percentage change in BSI might even prove to contain the most prognostic information.
We used the subjective classifications by three experienced bone scan readers as the gold standard in the evaluation group. This is not an optimal gold standard, but no independent examinations of these patients were available that confirmed or excluded the presence of new lesions. We therefore added a treatment group to assess the prognostic value of the automated measurements of new lesions and changes in the BSI.
The results of this study are based on a retrospectively selected group of clinical cases, and as a consequence, the imaging times before and after treatment were not standardised. This lack of standardisation can be a confounding factor that weakens the association between BSI and survival. In future prospective studies and retrospective studies based on cases from clinical trials, the time between the baseline scan and the start of treatment, as well as between the baseline scan and follow-up scans, should be more standardised to strengthen the analysis.
A limitation of this study was that the treatment group was small and the analysis retrospective. The results are encouraging, but the clinical value of the automated quantitative analysis of serial bone scans needs to be confirmed in future studies. In a larger study, the imaging biomarkers (number of new lesions and percentage change in BSI) could also be related to other biomarkers such as PSA. The PSA level and its change during treatment are widely used, but it is well known that they are not reliable in castration-resistant prostate cancer. Imaging biomarkers could, therefore, provide additional information in the management of prostate cancer patients.
Limitations of the bone scintigraphy method must also be taken into consideration in a quantitative analysis. Flare response, especially early in treatment, may lead to misjudgement when measuring BSI. Lesions in a bone scan are non-specific, and they may be due to degenerative disease, fractures, etc. The automated software has proved to be as capable of interpreting bone scans and differentiating metastatic lesions from degenerative abnormalities as an experienced physician [16], but patient history is crucial to the detection of fractures after trauma.