
Visual and quantitative evaluation of [18F]FES and [18F]FDHT PET in patients with metastatic breast cancer: an interobserver variability study



Correct identification of tumour receptor status is important for treatment decisions in breast cancer. [18F]FES PET and [18F]FDHT PET allow non-invasive assessment of the oestrogen (ER) and androgen receptor (AR) status of individual lesions within a patient. Despite standardised analysis techniques, interobserver variability can significantly affect the interpretation of PET results and thus clinical applicability. The purpose of this study was to determine visual and quantitative interobserver variability of [18F]FES PET and [18F]FDHT PET interpretation in patients with metastatic breast cancer.


In this prospective, two-centre study, patients with ER-positive metastatic breast cancer underwent both [18F]FES and [18F]FDHT PET/CT. In total, 120 lesions were identified in 10 patients with either conventional imaging (bone scan or lesions > 1 cm on high-resolution CT, n = 69) or only with [18F]FES and [18F]FDHT PET (n = 51). All lesions were scored visually and quantitatively by two independent observers. A visually PET-positive lesion was defined as uptake above background. For quantification, we used standardised uptake values (SUV): SUVmax, SUVpeak and SUVmean.


Visual analysis showed an absolute positive and negative interobserver agreement for [18F]FES PET of 84% and 83%, respectively (kappa = 0.67, 95% CI 0.48–0.87), and 49% and 74% for [18F]FDHT PET, respectively (kappa = 0.23, 95% CI −0.04 to 0.49). Intraclass correlation coefficients (ICC) for quantification of SUVmax, SUVpeak and SUVmean were 0.98 (95% CI 0.96–0.98), 0.97 (95% CI 0.96–0.98) and 0.89 (95% CI 0.83–0.92) for [18F]FES, and 0.78 (95% CI 0.66–0.85), 0.76 (95% CI 0.63–0.84) and 0.75 (95% CI 0.62–0.84) for [18F]FDHT, respectively.


Visual and quantitative evaluation of [18F]FES PET showed high interobserver agreement. These results support the use of [18F]FES PET in clinical practice. In contrast, visual agreement for [18F]FDHT PET was relatively low due to low tumour-background ratios, but quantitative agreement was good. This underscores the relevance of quantitative analysis of [18F]FDHT PET in breast cancer.

Trial registration: NCT01988324. Registered 20 November 2013.


Breast cancer is the most common malignancy in women in the Western world. The majority of breast tumours express the oestrogen receptor (ER), which is the main indicator of potential response to anti-oestrogen therapies [1, 2]. Therefore, it is mandatory to determine ER expression in breast cancer. Recently, the androgen receptor (AR) emerged as a possible target for breast cancer therapy. The AR is present in 70–80% of patients with breast cancer, and AR antagonists are under investigation in clinical trials [3,4,5,6].

A tumour biopsy is the gold standard to determine receptor expression. However, this is an invasive procedure, is not always feasible in case of inaccessible tumour sites, and is subject to sampling errors [7]. The 16α-[18F]fluoro-17β-oestradiol ([18F]FES) and 16β-[18F]fluoro-5α-dihydrotestosterone ([18F]FDHT) PET/CT have been developed to non-invasively visualise, respectively, the ER and AR status in the tumour lesions within a patient. Previously, it has been shown that [18F]FES and [18F]FDHT uptake correlate well with ER and AR expression levels in representative breast cancer biopsies [8,9,10]. As a diagnostic tool, [18F]FES PET leads to better diagnostic understanding in 88% and to a change of therapy in 48% of the patients presenting with a clinical dilemma [11]. To predict treatment effects, [18F]FES PET can be used to assess residual ER availability during treatment with, e.g. fulvestrant, a selective ER downregulator. Inadequate reduction of the [18F]FES PET signal (< 75%) by fulvestrant treatment was associated with early progression [12]. Similarly, in patients with prostate cancer, [18F]FDHT PET was used to determine the optimal dose of the AR blocker enzalutamide in a phase 1 trial [13]. Lastly, patients with ER-positive breast cancer and high [18F]FDG uptake showed a worse progression free survival if [18F]FES uptake was low in comparison to high [18F]FES uptake (3 versus 8 months, respectively) [14].

For all these potential applications, reliable, observer-independent identification and quantification of [18F]FES and [18F]FDHT uptake in tumour lesions is essential for translation to daily clinical practice. To date, no data are available on the interobserver variability of [18F]FES and [18F]FDHT PET in breast cancer. Therefore, the primary objective of this study was to examine interobserver variability in visual and quantitative assessment of [18F]FES and [18F]FDHT PET. Secondary objectives included the effect of tumour-to-background ratio (TBR), tracer accumulation, tumour size and the use of different SUV parameters (SUVmax, SUVpeak or SUVmean) on interobserver agreement. In addition, the added value of quantitative assessment in comparison to visual assessment was examined, and the number of lesions detected on [18F]FES and [18F]FDHT PET was compared with the number detected on conventional imaging methods (contrast-enhanced CT scan and bone scan).

Materials and methods

Patient population

This prospective two-centre interobserver variability study was part of a study investigating the correlation between [18F]FES and [18F]FDHT uptake and ER and AR expression in simultaneously biopsied metastases, of which the results have been published elsewhere [8]. Patients were recruited from September 2014 to August 2015 at the CCA-VUmc University Medical Center Amsterdam and the University Medical Center Groningen in the Netherlands.

Eligibility criteria included metastatic breast cancer and an ER-positive primary tumour, ≥ 1 extrahepatic tumour lesion, ECOG performance status of ≤ 2 and a postmenopausal status or use of LHRH-agonists. Patients were excluded if they had used ER or AR binding drugs during the 6 weeks before study entry, because these ligands compete with tracer binding.

All patients had to give written informed consent before study participation. The study was conducted in compliance with the ethical principles originating in or derived from the Declaration of Helsinki and in compliance with all International Conference on Harmonization Good Clinical Practice guidelines. The local medical ethics committee approved the study (NCT01988324).

Imaging protocols

[18F]FES and [18F]FDHT were produced as described previously [15, 16]. On separate days, ≤ 14 days apart, 200 MBq (± 10%) of each tracer was injected. After 60 min (± 5 min), a low-dose CT was performed during tidal breathing for attenuation correction, followed by a whole-body PET scan (skull vertex to mid-thigh, 2 min per bed position). PET/CT scans were made using a Philips Gemini TF-64 PET/CT (Amsterdam) or Siemens 64 slice mCT PET/CT (Groningen). Acquisition and reconstruction protocols used on both scanners were according to the recommendations of the European Association of Nuclear Medicine (EARL) [17].

In addition, a high-resolution, contrast-enhanced CT of the chest and abdomen and a bone scan were performed within 6 weeks of the PET scans for comparison.

Image analyses

Contrast-enhanced CT scans were examined by experienced radiologists and bone scans by experienced nuclear medicine physicians, all masked to the [18F]FES and [18F]FDHT PET results. Two independent observers, one from each centre (LM and CV), trained and supervised by two experienced nuclear medicine physicians, performed visual and quantitative analyses. The observers had knowledge of conventional imaging results (contrast-enhanced CT and bone scans).

A visually PET-positive lesion was defined as focal uptake above local background incompatible with physiological uptake. Liver metastases were excluded from all analyses in this study because of high physiological [18F]FES and [18F]FDHT uptake in healthy liver tissue, making reliable identification of metastases difficult. In addition, if visual interpretation of uptake in a (potential) lesion was impossible, e.g. due to overlap with adjacent organs with high physiological tracer uptake, the readers independently reported it as ‘not evaluable’ in the visual ratings, and these lesions were excluded from further analyses. For each patient, the observers made a list that consisted of all lesions already detected on conventional imaging, followed by additional lesions discovered on [18F]FES or [18F]FDHT PET. An anatomical description of all the lesions was reported in order to match the results. If a lesion was not reported by one of the two observers, it was scored as not visible for that observer. All visually PET-positive lesions were quantified, as well as PET-negative lesions that were identified on conventional imaging (i.e. lesions on bone scintigraphy and/or high-resolution CT > 1 cm).

Each observer manually drew volumes of interest (VOI) on the tumour contours, using PET images for PET-positive lesions and low-dose CT images for PET-negative lesions (lesions only seen on bone scan or high-resolution CT were visually matched on the low-dose CT). Lesions were separately analysed based on visibility on either PET or conventional imaging alone to investigate the influence of visibility on imaging techniques on interobserver agreement.

For every VOI, the standardised uptake values (SUV), i.e. the tracer uptake within a VOI normalised to the injected dose and body weight, were calculated using the software programs accurate (built in-house using IDL, observer 1) and syngo.via version VB10B, Siemens (observer 2). Both programs yielded identical results on test images. Three types of SUV were compared in this study: SUVmax (voxel with highest SUV within the VOI), SUVpeak (average SUV of a 1 cm³ sphere containing the hottest voxels of the VOI) and SUVmean with isocontour 50% of SUVmax (average SUV of all voxels with uptake ≥ 50% of SUVmax).
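
The SUV definitions above can be illustrated with a minimal Python sketch. It is not the code of either software package used in the study; the function names are hypothetical, decay correction is ignored, a tissue density of 1 g/mL is assumed, and SUVpeak is omitted because it requires the spatial position of each voxel.

```python
def suv(activity_kbq_per_ml, injected_dose_mbq, body_weight_kg):
    """SUV = activity concentration / (injected dose / body weight).

    With kBq/mL, MBq and kg the units cancel, assuming 1 g/mL tissue
    density. Decay correction is deliberately left out of this sketch.
    """
    return activity_kbq_per_ml / (injected_dose_mbq / body_weight_kg)


def suv_max(voi_suvs):
    """SUVmax: the highest voxel SUV within the VOI."""
    return max(voi_suvs)


def suv_mean_isocontour(voi_suvs, fraction=0.5):
    """SUVmean: average of all voxels with SUV >= fraction * SUVmax."""
    threshold = fraction * max(voi_suvs)
    included = [v for v in voi_suvs if v >= threshold]
    return sum(included) / len(included)


# Hypothetical voxel SUVs for one VOI (illustrative values, not study data).
voi = [0.5, 1.0, 2.0, 3.0, 4.0]
print(suv_max(voi))              # 4.0
print(suv_mean_isocontour(voi))  # 3.0 (voxels 2.0, 3.0 and 4.0 are included)
```

Because the 50% isocontour depends on SUVmax, a small shift in the manually drawn VOI can change both the threshold and the set of voxels that are averaged.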

Based on previous studies, an SUVmax [18F]FES cut-off ≥ 1.5 was used to define ER-positivity (corresponding with an IHC cut-off of ≥ 1%) and an SUVmax [18F]FDHT cut-off ≥ 1.9 for AR positivity (corresponding with an IHC cut-off of ≥ 10%) [8, 9].

For [18F]FES and [18F]FDHT, the SUVmax tumour-background ratio (TBR) was defined as the ratio of the SUVmax of a tumour lesion and the SUVmean of healthy background tissue. To determine the SUVmean of healthy background tissue, a VOI was drawn on reference tissue in the unaffected contralateral site whenever available or in the unaffected surrounding tissue of the same origin [18].
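
As a small illustration, the SUVmax cut-offs and the TBR definition given above could be written as follows. The constant and function names are hypothetical and the input values are invented for the example.

```python
ER_SUVMAX_CUTOFF = 1.5  # [18F]FES cut-off for ER positivity, as stated in the text
AR_SUVMAX_CUTOFF = 1.9  # [18F]FDHT cut-off for AR positivity, as stated in the text


def is_receptor_positive(lesion_suvmax, cutoff):
    """A lesion counts as receptor positive when SUVmax >= cut-off."""
    return lesion_suvmax >= cutoff


def tumour_background_ratio(lesion_suvmax, background_suvmean):
    """TBR = SUVmax of the lesion / SUVmean of healthy reference tissue."""
    if background_suvmean <= 0:
        raise ValueError("background SUVmean must be positive")
    return lesion_suvmax / background_suvmean


# Invented example values, not study data.
print(is_receptor_positive(1.5, ER_SUVMAX_CUTOFF))  # True
print(is_receptor_positive(1.8, AR_SUVMAX_CUTOFF))  # False
print(tumour_background_ratio(3.0, 1.5))            # 2.0
```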

Statistical analyses

For visual assessments, agreement was calculated with absolute and relative measures of interobserver agreement. Absolute agreement is the probability that, if one observer scores a lesion as visible (positive agreement) or not visible (negative agreement) on the PET scan, the other observer does the same. It is calculated by the following formulas: positive agreement = 2 × lesions visible to both observers/(2 × lesions visible to both observers + lesions only visible to observer 1 + lesions only visible to observer 2) and negative agreement = 2 × lesions not visible to both observers/(2 × lesions not visible to both observers + lesions only not visible to observer 1 + lesions only not visible to observer 2) [19]. To allow comparison with previous studies, reliability (relative agreement) was also calculated according to Cohen’s kappa, and the results were interpreted as follows: kappa 0.01–0.20 as slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial and 0.81–1.00 as almost perfect interobserver agreement [20]. To account for potential within-person correlation in visual assessments, a chi-square test was performed to examine whether the percentage visual agreement differed per patient.
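
The agreement measures described above can be sketched in Python from a 2 × 2 table of visual scores. The function names and the example counts are hypothetical; only the formulas and the kappa bands are taken from the text. Note that lesions "only not visible" to one observer are the same lesions that are only visible to the other, so both formulas share the same discordant counts.

```python
def positive_agreement(both_visible, only_obs1, only_obs2):
    """2a / (2a + b + c), with a = visible to both, b and c = discordant."""
    return 2 * both_visible / (2 * both_visible + only_obs1 + only_obs2)


def negative_agreement(both_not_visible, only_obs1, only_obs2):
    """2d / (2d + b + c), with d = not visible to either observer."""
    return 2 * both_not_visible / (2 * both_not_visible + only_obs1 + only_obs2)


def cohen_kappa(both_visible, only_obs1, only_obs2, both_not_visible):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    a, b, c, d = both_visible, only_obs1, only_obs2, both_not_visible
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Bands as quoted in the text; values below 0.01 are not covered there."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.21:
        return "fair"
    if kappa >= 0.01:
        return "slight"
    return "below the quoted bands"


# Invented counts: 40 visible to both, 5 + 5 discordant, 50 visible to neither.
print(round(positive_agreement(40, 5, 5), 3))  # 0.889
print(round(negative_agreement(50, 5, 5), 3))  # 0.909
print(round(cohen_kappa(40, 5, 5, 50), 3))     # 0.798
```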

For quantitative assessments, parameters are presented as mean ± SD, and reliability was calculated with intraclass correlation coefficients (ICC) using a two-way random effect model with absolute agreement. For the interpretation of the ICCs, the following guideline was used: ≥ 0.90 as excellent, ≥ 0.75 as good, ≥ 0.50 as moderate and < 0.50 as poor [21].
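
For readers unfamiliar with this ICC model, the following Python sketch computes a two-way random effects, absolute agreement ICC from a lesions × observers table. It assumes the single-measures form, often written ICC(2,1); the text does not state whether single or average measures were reported, and the study itself used SPSS rather than this code. The function names and example ratings are hypothetical.

```python
def icc_two_way_random_absolute(ratings):
    """Single-measures ICC, two-way random effects, absolute agreement.

    ratings: list of rows, one row per lesion, one column per observer.
    """
    n = len(ratings)        # number of lesions
    k = len(ratings[0])     # number of observers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def interpret_icc(icc):
    """Guideline quoted in the text."""
    if icc >= 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "poor"


# Invented SUVs: observer 2 reads 1.0 higher than observer 1 on every lesion.
print(icc_two_way_random_absolute([[1, 2], [2, 3], [3, 4]]))  # about 0.667
```

The example shows why absolute agreement was chosen: a constant offset between observers lowers the ICC even though the two sets of readings are perfectly correlated.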

Absolute agreement on quantitative assessments was analysed with Bland-Altman plots (differences between observers showed a normal distribution). For each lesion, the plot shows the average SUV of observers 1 and 2 on the x-axis and the difference between observers, expressed as a percentage of the average SUV, on the y-axis. Percentage differences were used instead of absolute differences to make the magnitude of the differences independent of the magnitude of the SUV and to facilitate comparisons between the SUV parameters SUVmax, SUVmean and SUVpeak, which may show large differences in absolute values.
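
The percentage-difference approach can be sketched as follows. The sketch assumes the conventional 95% limits of agreement, mean ± 1.96 × SD of the differences; the text does not spell out this formula. The function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman_percent(obs1, obs2):
    """Return (mean % difference, lower LOA95%, upper LOA95%).

    Each difference is expressed as a percentage of the two observers'
    average SUV for that lesion.
    """
    pct_diffs = [
        100.0 * (a - b) / ((a + b) / 2.0) for a, b in zip(obs1, obs2)
    ]
    bias = mean(pct_diffs)
    sd = stdev(pct_diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Invented SUVs for two lesions read by two observers.
bias, lower, upper = bland_altman_percent([1.1, 0.9], [0.9, 1.1])
print(round(bias, 1), round(lower, 1), round(upper, 1))
```

An absolute difference of 0.2 is 20% of an SUV of 1.0 but only 2% of an SUV of 10, which is why wide percentage limits in low-uptake lesions can coexist with small absolute differences.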

To investigate the effect of TBRs on interobserver variability, differences between TBRs of [18F]FES and [18F]FDHT PETs were tested with Wilcoxon matched pairs signed rank tests. In addition, correlations between tracer uptake or tumour size and percentage interobserver differences were determined using the Spearman correlation coefficient (r). Finally, linear regression was performed to find the linear function between SUVmax, SUVpeak and SUVmean for [18F]FES and [18F]FDHT PET, and Cochran’s Q and McNemar tests were used to analyse differences between visibility and quantitative uptake above or below cut-off for SUVmax, SUVpeak and SUVmean. P value < 0.05 was considered significant. Statistical analyses were generated using the SPSS software (version 22; IBM, SPSS statistics).

Results


Patient characteristics

A total of 120 lesions were identified in 10 patients using the different imaging modalities (Table 1). Most lesions were skeletal (66%), followed by lymph node (25%) and visceral metastases (9%). The median number of lesions per patient was 9 (range 2–32).

Table 1 Patient characteristics

Comparison of lesion detection on different imaging modalities

Of the 120 lesions in total (Table 1), most were identified on [18F]FES PET (n = 64 [53%] and n = 69 [58%] by observer 1 and 2, respectively), followed by high-resolution CT (n = 54 [45%]), bone scintigraphy (n = 40 [33%]) and [18F]FDHT PET (n = 36 [30%] and n = 37 [31%]). Of the lesions identified on [18F]FES PET by observer 1 and 2, 50% and 42%, respectively, were also detected on high-resolution CT or bone scintigraphy (Fig. 1). For [18F]FDHT PET, 55% and 49% of the identified lesions were seen with conventional imaging. Conversely, 46% and 42% of the lesions identified on conventional imaging were visible on [18F]FES PET by observer 1 and 2, respectively, and 29% and 26% were seen on [18F]FDHT PET. In particular, more lymph node lesions were detected on [18F]FES PET and [18F]FDHT PET compared to conventional imaging: 97% and 53% versus 27% of all detected lymph node lesions, respectively.

Fig. 1

Tumour lesions detected with conventional imaging, [18F]FES and [18F]FDHT PET

Visual analysis of [18F]FES and [18F]FDHT PET images

Out of 120 lesions, a total of 87 and 74 on [18F]FES and [18F]FDHT PET, respectively, were analysed for visual interobserver agreement. The other lesions were excluded because one or both observers reported these as ‘not evaluable’ due to overlap with adjacent organs with high physiological tracer uptake.

For lesions visible on conventional imaging, [18F]FES PET readings (Table 2) had substantial positive and negative agreement of 84% (95% CI 72–92%) and 83% (95% CI 70–91%), respectively (kappa = 0.67, 95% CI 0.48–0.87). By including lesions that were only visible on [18F]FES PET, the positive agreement improved to 88% (95% CI 80–93%) for all lesions scored on [18F]FES PET (negative agreement remained the same). [18F]FDHT PET showed lower positive agreement of 49% (95% CI 32–65%) for lesions visible on conventional imaging, while negative agreement was 74% (95% CI 62–83%) (kappa = 0.23, 95% CI −0.04 to 0.49). Positive agreement for all lesions scored on [18F]FDHT PET was 58% (95% CI 43–71%). For lesions visible only on PET and not on conventional imaging, positive agreement was highest: 91% (95% CI 81–96%) for [18F]FES PET and 80% (95% CI 55–93%) for [18F]FDHT PET. Visual interobserver agreement was not significantly different between the 10 patients in this study: P = 0.159 for [18F]FES PET and P = 0.387 for [18F]FDHT PET.

Table 2 Visual interobserver agreement for lesions visible (A, C) and not visible on conventional imaging (B, D) on [18F]FES and [18F]FDHT PET, respectively

An important aspect in the identification of tumour lesions is how well tracer uptake can be distinguished from background uptake in normal reference tissue. The TBR of [18F]FDHT was significantly lower than that of [18F]FES (Fig. 2). In bone lesions, the mean TBR of [18F]FDHT was 2.0 (± SD 0.6) versus 3.3 (± SD 2.2) for [18F]FES (P = 0.003). In addition, in lymph node lesions, the mean [18F]FDHT TBR was 4.6 (± SD 1.9) compared to 10.7 (± SD 8.4) for [18F]FES (P < 0.0001).

Fig. 2

The difference in tumour-background ratio between [18F]FES and [18F]FDHT PET shown visually (a) and quantitatively (mean ± SD) for bone and lymph node lesions (b). The arrows in a show a bone lesion in the right os ilium visible on [18F]FES PET which is only subtly visible on [18F]FDHT PET. Note, there is physiological tracer uptake of [18F]FES in the liver, gallbladder, intestine, bladder and for [18F]FDHT also in the bloodpool

Quantitative analyses of [18F]FES and [18F]FDHT PET images

Out of 120 lesions, a total of 94 and 95 were quantified by both observers on [18F]FES and [18F]FDHT PET, respectively. The other lesions were not quantified by one or both of the observers as a result of overlap with adjacent organs with high physiological tracer uptake, unless there was a clear anatomical substrate on other imaging modalities allowing for reliable VOI definition.

In general, interobserver agreement was excellent for PET quantification (Fig. 3) of all lesions combined (i.e. visible on PET or seen on conventional imaging). The ICCs for quantification of SUVmax, SUVpeak and SUVmean on [18F]FES PET were 0.98 (95% CI 0.96–0.98), 0.97 (95% CI 0.96–0.98) and 0.89 (95% CI 0.83–0.92). For [18F]FDHT PET, the ICCs were lower with 0.78 (95% CI 0.66–0.85), 0.76 (95% CI 0.63–0.84) and 0.75 (95% CI 0.62–0.84), respectively.

Fig. 3

Intraclass correlation coefficients for all quantified tumour lesions on [18F]FES (n = 94) using SUVmax, SUVpeak and SUVmean (a, b and c) and [18F]FDHT PET (n = 95) (d, e and f). Note: not quantifiable lesions by one or both of the observers were excluded as a result of overlap with adjacent organs with high physiological tracer uptake

In addition, [18F]FES (Fig. 4) and [18F]FDHT PET (Fig. 5) quantification was analysed separately with Bland Altman plots for all lesions visible on PET or lesions only visible on conventional imaging (hence, PET-negative lesions). For [18F]FES PET, PET-positive lesions showed excellent quantitative interobserver agreement with mean differences < 2% and 95% limits of agreement (LOA95%) being narrower for SUVmax (LOA95% − 31.3 to 34.3%) and SUVpeak (LOA95% − 31.1 to 28.4%), compared to SUVmean (LOA95% − 46.5 to 44.3%). Differences were larger for PET-negative lesions, with mean interobserver differences < 14% and wider LOA95% (within ± 75%), but note that absolute differences between observers were generally small because of the low SUV. Similarly, for [18F]FDHT PET, interobserver agreement was better for PET-positive (mean interobserver differences < 7%, LOA95% within ± 45%) compared to PET-negative lesions (mean interobserver differences < 12%, LOA95% within ± 76%). SUVmax and SUVpeak showed a better interobserver agreement in comparison to SUVmean for the quantification of lesions visible on [18F]FES PET, while on [18F]FDHT PET the different SUV parameters were comparable.

Fig. 4

Bland Altman plots showing the % differences in SUVmax, SUVpeak and SUVmean between observers for lesions visible on [18F]FES PET (a, b, c) or only visible on conventional imaging (d, e, f). The dashed lines represent the mean difference between observers ± 95% limits of agreement (LOA95%)

Fig. 5

Bland Altman plots showing the % differences in SUVmax, SUVpeak and SUVmean between observers for lesions visible on [18F]FDHT PET (a, b, c) or only visible on conventional imaging (d, e, f)

Higher levels of tracer accumulation in PET positive lesions were not associated with improved interobserver agreement (for [18F]FES PET: Spearman r = 0.04, 0.26 and 0.14 for SUVmax, SUVpeak and SUVmean, respectively and for [18F]FDHT PET: Spearman r = 0.00, r = 0.03 and r = − 0.17, respectively). In addition, there was no correlation between tumour size and interobserver agreement (for [18F]FES PET: Spearman r = 0.10, r = 0.08 and r = 0.06, for SUVmax, SUVpeak and SUVmean, respectively and for [18F]FDHT PET: Spearman r = − 0.07, r = − 0.16 and r = − 0.42, respectively).

The added value of quantitative assessment in comparison to visual assessment

Based on previous studies, [18F]FES and [18F]FDHT SUVmax cut-off levels of 1.5 and 1.9, respectively, have been identified. There are however limited data on quantitative thresholds and corresponding cut-off values for SUVpeak and SUVmean. Based on linear regression of all lesions quantified in this study, an SUVmax cut-off of 1.5 on [18F]FES PET corresponded with an SUVpeak of 1.2 and an SUVmean of 1.1 (Supplementary figure S1), and for [18F]FDHT PET, an SUVmax cut-off of 1.9 corresponded with an SUVpeak of 1.6 and an SUVmean of 1.3.
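
The translation of an SUVmax cut-off into a corresponding SUVpeak or SUVmean value can be illustrated with an ordinary least-squares fit. The sketch below uses made-up paired values, chosen only so that the arithmetic is easy to follow; they are not the lesion data of this study, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def corresponding_cutoff(suvmax_cutoff, slope, intercept):
    """Value of the other SUV parameter predicted at the SUVmax cut-off."""
    return slope * suvmax_cutoff + intercept


# Made-up paired measurements (SUVmax, SUVpeak) for four lesions.
suvmax = [1.0, 2.0, 3.0, 4.0]
suvpeak = [0.8, 1.6, 2.4, 3.2]
slope, intercept = fit_line(suvmax, suvpeak)
print(round(corresponding_cutoff(1.5, slope, intercept), 2))  # 1.2
```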

For diagnostic purposes, it is important to identify all receptor-positive tumour lesions. Therefore, we compared visual and quantitative tracer uptake above/below cut-off levels (Table 3). In 3% and 1% of the lesions scored visually positive on [18F]FES PET by observer 1 and 2, respectively, SUVmax was below the threshold of 1.5. For [18F]FDHT PET, 14% of the visually positive lesions scored by observer 1 as well as observer 2 had an SUVmax below the threshold of 1.9. There were no structural differences between observer 1 and 2. The discrepancies were mostly seen in lesions located in tissue with low background uptake such as skin and lung metastases (Supplementary table S1). Conversely, in 44% and 39% of the lesions scored visually negative on [18F]FES PET by observer 1 and 2, respectively, SUVmax was ≥ 1.5. Similarly, 31% and 52% of the visually negative lesions had an SUVmax ≥ 1.9 on [18F]FDHT PET, respectively. However, in most cases (60%), we observed overlap with organs having high physiological tracer accumulation such as the liver and bowel, followed by lesions that were determined to be visually positive at second glance (32%). After correction for these effects, ≤ 4% of the visually negative lesions had an SUVmax above cut-off for both tracers.
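
The two discrepancy rates compared here can be written out as a short Python sketch. The function name and the lesion list are hypothetical and serve only to show how the percentages are formed.

```python
def discrepancy_rates(lesions, cutoff):
    """Return the two visual/quantitative discrepancy fractions.

    lesions: list of (visually_positive, suvmax) pairs for one observer.
    Returns (fraction of visually positive lesions with SUVmax < cutoff,
             fraction of visually negative lesions with SUVmax >= cutoff).
    """
    positive = [suv for visible, suv in lesions if visible]
    negative = [suv for visible, suv in lesions if not visible]
    positive_below = sum(suv < cutoff for suv in positive) / len(positive)
    negative_above = sum(suv >= cutoff for suv in negative) / len(negative)
    return positive_below, negative_above


# Invented lesions for one observer, evaluated against the [18F]FES cut-off.
lesions = [
    (True, 3.0), (True, 1.2), (True, 2.0), (True, 5.0),
    (False, 0.8), (False, 1.6),
]
print(discrepancy_rates(lesions, 1.5))  # (0.25, 0.5)
```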

Table 3 Discrepancies between visual and quantitative assessments (above/below cut-off values for receptor positivity) for [18F]FES (A) and [18F]FDHT PET (B)

Comparing the impact of the different SUV parameters on discrepancies between visual and quantitative assessments showed no significant differences, with the only exception that SUVmean showed fewer visually negative lesions above cut-off on [18F]FES PET than SUVmax or SUVpeak for observer 1 (P = 0.008 and P = 0.001, respectively), but not for observer 2 (P = 0.125 and P = 0.063, respectively).

Discussion


Assessing interobserver variability is an important step in the clinical application of diagnostic tools. Here, we showed that both visual and quantitative evaluation were highly reproducible between independent observers evaluating [18F]FES PET at separate centres using different scanners and software. Visual positive and negative absolute agreement was > 80%, with a kappa of 0.67. Also, the interobserver reliability of quantitative metrics was excellent for SUVmax and SUVpeak (ICC of 0.98 and 0.97, respectively) and good for SUVmean (ICC of 0.89). In comparison, studies of staging in patients with breast cancer showed similar results for bone scintigraphy (kappa 0.62–0.78) and [18F]FDG PET (kappa 0.65 and an ICC of 0.93 for the quantification of [18F]FDG uptake) [22,23,24,25,26].

[18F]FDHT PET also showed good interobserver reliability for quantitative assessments with ICCs ≥ 0.75. These values are slightly lower than those of [18F]FES PET, and this was probably due to the lower lesional [18F]FDHT uptake, because quantitative agreement according to Bland Altman analyses was comparable for both tracers. The TBR of [18F]FDHT was considerably lower compared to [18F]FES. This probably explains the higher variability in visual interpretation (kappa = 0.23), mainly caused by a low visual positive agreement (49%) in lesions already identified by conventional imaging modalities, while positive agreement in lesions not identified by conventional imaging was much higher (80%), as was negative visual agreement between observers (74%). An important impeding factor was the significantly lower TBR of [18F]FDHT in bone and lymph node lesions compared to [18F]FES PET. The TBR of [18F]FDHT in the current study (2.0 for bone and 4.6 for lymph nodes) was also lower than in prostate cancer metastases (3.3 for bone and 5.7 for soft tissue metastases) with an SUVmax three times higher in prostate cancer (7.1–9.1 versus 2.0 in the present breast cancer study) [27, 28]. This suggests that higher AR expression likely results in better interobserver reliability.

Our study had some limitations. There were only a limited number of patients included in this study. However, receptor expression between lesions within a single patient can be heterogeneous [29], which was confirmed in the present study resulting in the coverage of a large range of data in 120 lesions [8]. In addition, we showed there was no within-patient correlation in visual assessments. A second limitation is a substantial number of ‘not evaluable’ lesions, due to overlap with adjacent organs with high physiological background. The decision for evaluability was left to each observer individually, which may have contributed to the low agreement (≤ 6%) on these ‘not evaluable lesions’. For future studies, we recommend that all lesions with physiological background overlap from the liver, gallbladder, intestine, bladder and for [18F]FDHT also from bloodpool are regarded as not evaluable. A third limitation is the lack of robust [18F]FES and [18F]FDHT thresholds for test positivity. We used an SUVmax cut-off of 1.5 for [18F]FES and 1.9 for [18F]FDHT PET based on previous data corresponding with ER and AR positivity in biopsies and so far showing the best predictive value for response to endocrine therapy [8, 9, 30, 31]. Some studies suggested an SUVmax cut-off of 2.0 for [18F]FES PET, taking into account the background [18F]FES uptake in normal tissues which can exceed the cut-off of 1.5 [29,30,31]. Tissue specific cut-off values may indeed be more appropriate as there are responders to endocrine therapy with a tumour SUVmax < 2.0. In the current study, up to 20% of the visually positive lesions had an SUVmax < 2.0, while < 3% had an SUVmax < 1.5 (Supplementary table S2).

For diagnostic purposes, simple visual assessment of [18F]FES uptake may suffice to determine the receptor status of a tumour lesion (agreement was high between visual assessment and the applied SUVmax cut-off value of 1.5 for ER-positivity). True discrepancies between visibility and corresponding uptake above or below cut-off were low (< 4%), making quantification of visually negative lesions not only cumbersome, but also unnecessary. Also, quantification of lesions without visual [18F]FES uptake leads to higher interobserver variability due to differences in VOI definition. However, quantification remains a helpful tool for nuclear medicine physicians in ‘equivocal [18F]FES lesions’. In addition, quantification is useful to measure receptor availability over time for the evaluation of treatment effects. In contrast, quantification of [18F]FDHT uptake is still required in future breast cancer studies, as we have shown relatively low visual agreement.

The role of [18F]FES and [18F]FDHT PET in addition to conventional imaging modalities needs to be defined further. It has to be taken into account that besides partial volume effects and constraints due to background tracer uptake limiting their detection, receptor expression can be heterogeneous and variable during the course of the disease [11, 32]. In addition, treatment may induce changes in receptor expression, but also eradicated tumour cells can leave a visible lesion on conventional imaging (e.g. sclerotic bone lesions), in absence of viable tumour cells. In the current study with heavily pretreated patients, 42–46% and 26–29% of the lesions identified by conventional imaging were detected on [18F]FES and [18F]FDHT PET, respectively. Vice versa, only approximately 50% of the lesions observed on [18F]FES PET and [18F]FDHT PET were identified by conventional imaging.

Therefore, a potential role for [18F]FES PET may be in staging of early ER-positive breast cancer as an addition to existing imaging techniques. Standard staging with [18F]FDG PET can miss low-intermediate grade ER-positive lesions due to their low metabolic activity [33]. We are currently investigating [18F]FES PET in staging patients with low grade, ER-positive locally advanced or recurrent breast cancer versus [18F]FDG PET (NCT03726931), and in metastatic breast cancer versus addition to conventional diagnostics (NCT01957332). The non-invasive visualisation of receptor status in metastatic lesions with PET offers a number of potential clinical advantages. For example, it may help when conventional diagnostics cannot establish a final diagnosis of suspected metastatic breast cancer lesions (e.g. as a result of inaccessible biopsy sites or repeated biopsy sampling errors). Also, PET imaging may help to determine the hormone receptor status of different tumour sites within a patient and guide treatment decisions, for instance, to decide on the origin of a metastatic lesion in case of multiple primary tumours or to determine whether receptor conversion occurred in metastases from a single primary tumour [11]. If validated, this may help with multimodality treatment strategies for heterogeneous tumour sites of breast cancer, such as endocrine therapy for [18F]FES positive lesions combined with a local modality such as radiotherapy for concurrent [18F]FES negative lesions [34].


In conclusion, our findings demonstrate that visual and quantitative evaluation of [18F]FES PET has a high interobserver concordance and support its use in clinical practice. Although [18F]FDHT PET showed relatively low visual agreement, presumably a result of the low AR expression and consequently low TBR in patients with breast cancer, there was good quantitative agreement between observers, which is acceptable for further [18F]FDHT PET imaging studies in breast cancer.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.



Abbreviations

ER: Oestrogen receptor

AR: Androgen receptor

VOI: Volume of interest

SUV: Standardised uptake value

TBR: Tumour-background ratio

ICC: Intraclass correlation coefficient

LOA95%: 95% limits of agreement


  1. Blamey RW, Hornmark-Stenstam B, Ball G, Blichert-Toft M, Cataliotti L, Fourquet A, et al. ONCOPOOL - a European database for 16,944 cases of breast cancer. Eur J Cancer. 2010;46:56–71.
  2. Yamashita H, Yando Y, Nishio M, Zhang Z, Hamaguchi M, Mita K, et al. Immunohistochemical evaluation of hormone receptor status for predicting response to endocrine therapy in metastatic breast cancer. Breast Cancer. 2006;13:74–83.
  3. Collins LC, Cole KS, Marotti JD, Hu R, Schnitt SJ, Tamimi RM. Androgen receptor expression in breast cancer in relation to molecular phenotype: results from the Nurses’ Health Study. Mod Pathol. 2011;24:924–31.
  4. Krop I, Colleoni M, Traina T, Holmes F, Estevez L, et al. Results from a randomized placebo-controlled phase 2 trial evaluating exemestane ± enzalutamide in patients with hormone receptor–positive breast cancer. Abstract GS4-07. San Antonio Breast Cancer Symposium. San Antonio, Texas; 2017.
  5. Gucalp A, Tolaney S, Isakoff SJ, Ingle JN, Liu MC, Carey LA, et al. Phase II trial of bicalutamide in patients with androgen receptor-positive, estrogen receptor-negative metastatic breast cancer. Clin Cancer Res. 2013;19:5505–12.
  6. Traina TA, Yardley DA, Schwartzberg LS, O'Shaughnessy J, Cortes J, Awada A, et al. Overall survival (OS) in patients (Pts) with diagnostic positive (Dx+) breast cancer: subgroup analysis from a phase 2 study of enzalutamide (ENZA), an androgen receptor (AR) inhibitor, in AR+ triple-negative breast cancer (TNBC) treated with 0-1 prior lines of therapy. J Clin Oncol. 2017;35:1089.
  7. Youk JH, Kim EK, Kim MJ, Lee JY, Oh KK. Missed breast cancers at US-guided core needle biopsy: how to reduce them. Radiographics. 2007;27:79–94.
  8. Venema CM, Mammatas LH, Schroder CP, van Kruchten M, Apollonio G, Glaudemans A, et al. Androgen and estrogen receptor imaging in metastatic breast cancer patients as a surrogate for tissue biopsies. J Nucl Med. 2017;58:1906–12.
  9. van Kruchten M, de Vries EG, Brown M, de Vries EF, Glaudemans AW, Dierckx RA, et al. PET imaging of oestrogen receptors in patients with breast cancer. Lancet Oncol. 2013;14:e465–75.
  10. Chae SY, Ahn SH, Kim SB, Han S, Lee SH, Oh SJ, et al. Diagnostic accuracy and safety of 16alpha-[(18)F]fluoro-17beta-oestradiol PET-CT for the assessment of oestrogen receptor status in recurrent or metastatic lesions in patients with breast cancer: a prospective cohort study. Lancet Oncol. 2019;20:546–55.
  11. van Kruchten M, Glaudemans AW, de Vries EF, Beets-Tan RG, Schroder CP, Dierckx RA, et al. PET imaging of estrogen receptors as a diagnostic tool for breast cancer patients presenting with a clinical dilemma. J Nucl Med. 2012;53:182–90.
  12. van Kruchten M, de Vries EG, Glaudemans AW, van Lanschot MC, van Faassen M, Kema IP, et al. Measuring residual estrogen receptor availability during fulvestrant therapy in patients with metastatic breast cancer. Cancer Discov. 2015;5:72–81.
  13. Scher HI, Beer TM, Higano CS, Anand A, Taplin ME, Efstathiou E, et al. Antitumour activity of MDV3100 in castration-resistant prostate cancer: a phase 1-2 study. Lancet. 2010;375:1437–46.
  14. Kurland BF, Peterson LM, Lee JH, Schubert EK, Currin ER, Link JM, et al. Estrogen receptor binding (18F-FES PET) and glycolytic activity (18F-FDG PET) predict progression-free survival on endocrine therapy in patients with ER+ breast cancer. Clin Cancer Res. 2017;23:407–15.
  15. Liu A, Dence CS, Welch MJ, Katzenellenbogen JA. Fluorine-18-labeled androgens: radiochemical synthesis and tissue distribution studies on six fluorine-substituted androgens, potential imaging agents for prostatic cancer. J Nucl Med. 1992;33:724–34.
  16. Römer J, Steinbach J, Kasch H. Studies on the synthesis of 16α-[18F] fluoroestradiol. Appl Rad Isotop. 1996;47:395–9.
  17. Boellaard R, Delgado-Bolton R, Oyen WJ, Giammarile F, Tatsch K, Eschner W, et al. FDG PET/CT: EANM procedure guidelines for tumour imaging: version 2.0. Eur J Nucl Med Mol Imaging. 2015;42:328–54.
  18. Jansen BHE, Kramer GM, Cysouw MCF, Yaqub MM, de Keizer B, Lavalaye J, et al. Healthy tissue uptake of (68)Ga-prostate specific membrane antigen (PSMA), (18)F-DCFPyL, (18)F-fluoromethylcholine (FCH) and (18)F-dihydrotestosterone (FDHT). J Nucl Med. 2019.
  19. de Vet HC, Mokkink LB, Terwee CB, Hoekstra OS, Knol DL. Clinicians are right not to like Cohen’s kappa. BMJ. 2013;346:f2125.
  20. Landis JR, Koch GG. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics. 1977:363–74.
  21. Portney LG, Watkins MP. Foundations of clinical research: applications to practice: Pearson/Prentice Hall Upper Saddle River, NJ; 2009.
  22. Sawicki LM, Grueneisen J, Schaarschmidt BM, Buchbender C, Nagarajah J, Umutlu L, et al. Evaluation of 18F-FDG PET/MRI, 18F-FDG PET/CT, MRI, and CT in whole-body staging of recurrent breast cancer. Eur J Radiol. 2016;85:459–65.
  23. van der Hoeven JJ, Hoekstra OS, Comans EF, Pijpers R, Boom RP, van Geldere D, et al. Determinants of diagnostic performance of [F-18] fluorodeoxyglucose positron emission tomography for axillary staging in breast cancer. Ann Surg. 2002;236:619.
  24. Shackleton M, Yuen K, Little AF, Schlicht S, McLachlan SA. Reliability of X-rays and bone scans for the assessment of changes in skeletal metastases from breast cancer. Intern Med J. 2004;34:615–20.
  25. Gutzeit A, Doert A, Froehlich JM, Eckhardt BP, Meili A, Scherr P, et al. Comparison of diffusion-weighted whole body MRI and skeletal scintigraphy for the detection of bone metastases in patients with prostate or breast carcinoma. Skeletal Radiol. 2010;39:333–43.
  26. Jacene HA, Leboulleux S, Baba S, Chatzifotiadis D, Goudarzi B, Teytelbaum O, et al. Assessment of interobserver reproducibility in quantitative 18F-FDG PET and CT measurements of tumor response to therapy. J Nucl Med. 2009;50:1760–9.
  27. Fox JJ, Autran-Blanc E, Morris MJ, Gavane S, Nehmeh S, Van Nuffel A, et al. Practical approach for comparative analysis of multilesion molecular imaging using a semiautomated program for PET/CT. J Nucl Med. 2011;52:1727–32.
  28. Vargas HA, Wassberg C, Fox JJ, Wibmer A, Goldman DA, Kuk D, et al. Bone metastases in castration-resistant prostate cancer: associations between morphologic CT patterns, glycolytic activity, and androgen receptor expression on PET and overall survival. Radiology. 2014;271:220–9.
  29. Nienhuis HH, van Kruchten M, Elias SG, Glaudemans A, de Vries EFJ, Bongaerts AHH, et al. (18)F-fluoroestradiol tumor uptake is heterogeneous and influenced by site of metastasis in breast cancer patients. J Nucl Med. 2018;59:1212–8.
  30. Dehdashti F, Mortimer JE, Trinkaus K, Naughton MJ, Ellis M, Katzenellenbogen JA, et al. PET-based estradiol challenge as a predictive biomarker of response to endocrine therapy in women with estrogen-receptor-positive breast cancer. Breast Cancer Res Treat. 2009;113:509–17.
  31. Mortimer JE, Dehdashti F, Siegel BA, Trinkaus K, Katzenellenbogen JA, Welch MJ. Metabolic flare: indicator of hormone responsiveness in advanced breast cancer. J Clin Oncol. 2001;19:2797–803.
  32. Amir E, Miller N, Geddie W, Freedman O, Kassam F, Simmons C, et al. Prospective study evaluating the impact of tissue confirmation of metastatic disease in patients with breast cancer. J Clin Oncol. 2012;30:587–92.
  33. Groheux D, Giacchetti S, Moretti JL, Porcher R, Espie M, Lehmann-Che J, et al. Correlation of high 18F-FDG uptake to clinical, pathological and biological prognostic factors in breast cancer. Eur J Nucl Med Mol Imaging. 2011;38:426–35.
  34. Mammatas LH, Verheul HM, Hendrikse NH, Yaqub M, Lammertsma AA, Menke-van der Houven van Oordt CW. Molecular imaging of targeted therapies with positron emission tomography: the visualization of personalized cancer care. Cell Oncol (Dordr). 2015;38:49–64.



We thank the patients who participated in this study and their families. In addition, we acknowledge the efforts of the clinical and imaging teams at the university medical centres participating in this study.


This study was financially supported by the Centre for Translational Molecular Medicine (CTMM) project as part of the Mammary Carcinoma Molecular Imaging for Diagnosis and Therapeutics (MAMMOTH) project.

Author information




CS, MvK, AG, HV, EB, ErdV, EldV, OH, GH and WM were responsible for the study concept and design. LM, CV, OH and AG carried out the data acquisition. MY, OH and AG performed the quality control of the data. HdV supervised the statistical analyses. BvdV performed the pathological analyses. LM and CV prepared the manuscript. CS, MvK, AG, HV, EB, ErdV, EldV, OH, GH, WM, MY, HdV and BvdV edited and revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to C. Willemien Menke-van der Houven van Oordt.

Ethics declarations

Ethics approval and consent to participate

All procedures performed in studies involving human participants were in accordance with the ethical standards of the Institutional Review Board of University Medical Center, Groningen, and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The study was also approved by the Institutional Review Board of the Amsterdam UMC, location VUmc University Medical Center, Amsterdam, the Netherlands (File no. 2014.501-NL41954.042.13). Informed consent was obtained from all individual participants included in the study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Lemonitsa H. Mammatas and Clasina M. Venema are co-leading authors.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Mammatas, L.H., Venema, C.M., Schröder, C.P. et al. Visual and quantitative evaluation of [18F]FES and [18F]FDHT PET in patients with metastatic breast cancer: an interobserver variability study. EJNMMI Res 10, 40 (2020).



  • Breast cancer
  • Oestrogen receptor
  • Androgen receptor
  • Interobserver variability