Impact of the EARL harmonization program on automatic delineation of metabolic active tumour volumes (MATVs)

Background: The clinical validation of the EARL harmonization program for standardised uptake value (SUV) metrics is well documented; however, its potential for defining metabolic active tumour volume (MATV) has not yet been investigated. We aimed to compare delineation of MATV on images reconstructed using conventional ordered subset expectation maximisation (OSEM) with those reconstructed using point spread function modelling (PSF-reconstructed images), and either optimised for diagnostic potential (PSF) or filtered to meet the EANM/EARL harmonising standards (PSF7).

Methods: Images from 18 stage IIIA-IIIB lung cancer patients were reconstructed using all three methods. MATVs were then delineated using both a 40% isocontour and a gradient-based method. MATVs were compared by means of Bland–Altman analyses, and Dice coefficients and concordance indices based on the unions and intersections between each pair of reconstructions (PSF vs OSEM, PSF7 vs PSF and PSF7 vs OSEM).

Results: Using the 40% isocontour method and taking the MATVs delineated on OSEM images as a reference standard, the use of PSF7 images led to significantly higher Dice coefficients (median value = 0.96 vs 0.77; P < 0.0001) and concordance indices (median value = 0.92 vs 0.64; P < 0.0001) than those obtained using PSF images. The gradient-based methodology was less sensitive to reconstruction variability than the 40% isocontour method; Dice coefficients and concordance indices were greater than 0.8 for both PSF reconstruction methods. However, the use of PSF7 images led to narrower interquartile ranges and significantly higher Dice coefficients (median value = 0.96 vs 0.94; P = 0.01) and concordance indices (median value = 0.89 vs 0.85; P = 0.003) than those obtained with PSF images.

Conclusion: This study demonstrates that automatic contouring of lung tumours on EARL-compliant PSF images using the widely adopted automatic isocontour methodology is an accurate means of overcoming reconstruction variability in MATV delineation. Although gradient-based methodology appears to be less sensitive to reconstruction variability, the use of EARL-compliant PSF images significantly improved the Dice coefficients and concordance indices, demonstrating the importance of harmonised images, even when more advanced contouring algorithms are used.

Electronic supplementary material: The online version of this article (doi:10.1186/s13550-017-0279-y) contains supplementary material, which is available to authorized users.


Background
Although standard metrics such as standardised uptake values (either SUVmax or SUVpeak) are widely used as prognostic tools or for monitoring of therapy in cancer treatment [1], metabolically active tumour volume (MATV) has recently attracted considerable interest as a pretreatment prognostic tool for various types of cancer [2][3][4][5]. Delineation of MATV is also useful for radiotherapy planning in various types of cancer, including non-small cell lung cancer (NSCLC) [6]. This growing interest in MATV is illustrated in Fig. 1, which shows the number of articles using MATV published over the past 10 years. The impact of PET imaging parameters and automatic tumour delineation in radiotherapy planning has been well documented [7][8][9] and has indicated a requirement for improved delineation methodologies. Recent studies in non-Hodgkin lymphoma (NHL) have shown high MATV to be predictive of overall survival [10], although widely disparate cut-off values have been used, which has fuelled the ongoing discussion on the need to standardise the quality of PET images and delineation methods.
Harmonization programs, such as the EANM/EARL (European Association of Nuclear Medicine/EANM Research Ltd) accreditation program [11], are designed to harmonise data acquisition, processing, and analysis to facilitate comparisons of PET quantitative values within multicentre trials, or in sites equipped with multiple PET/CT scanners, regardless of the PET/CT system used. Given that centres running PET systems with advanced reconstruction algorithms often wish to use them with parameters chosen to achieve optimal lesion detection, EARL-accredited centres tend to use two PET datasets: one for optimal lesion detection and image interpretation and another filtered for harmonised quantification [12].
The EARL program has been well validated for standard SUV metrics [12][13][14][15], but clinical validation of this harmonization program for MATV delineation is still lacking. This study examined MATVs delineated in stage IIIA-IIIB lung cancer patients, with the aim of comparing MATVs in PSF-reconstructed images optimised for diagnosis (PSF), PSF-reconstructed images with a filter chosen to meet harmonising standards (PSF7), and EARL-compliant images reconstructed using ordered subset expectation maximisation (OSEM). Stage III NSCLCs were chosen, as these stages are typically treated by radiation therapy or radio-chemotherapy, and many centres use FDG PET MATV delineation to optimise tumour targeting. MATVs were compared not only in terms of absolute and relative values but also using a concordance measure, which gives a representative geometrical description of changes in MATV, combining both volume and positional differences [16].

Patient selection
Eighteen consecutive biopsy-proven stage IIIA-IIIB lung cancer patients who had been scanned for staging purposes were included in this retrospective study. This study was approved by the local ethics committee (Ref A12-D24-VOL13, Comité de protection des personnes Nord-Ouest III), and the requirement for informed consent was waived.

PET/CT examinations
Patients who had fasted for 6 h before the examination were injected with 18F-FDG after 15 min of rest in a warm room (mean injected dose ± SD = 3.89 ± 0.44 MBq/kg). All PET imaging studies were performed 60 ± 5 min post injection, on a Biograph TrueV system (Siemens Medical Solutions, Erlangen, Germany), with a 6-slice spiral CT component, according to the EANM guidelines [17].
A free-breathing CT acquisition was performed first, using the following parameters: 60 mAs, 130 kVp, pitch 1, and 6 × 2-mm collimation. The PET emission acquisition was then performed in 3-D mode. Patients were scanned from the skull base to the midthighs, with acquisition times per bed position of 160 and 220 s for normal weight (BMI ≤ 25 kg/m2) and overweight patients (BMI > 25 kg/m2), respectively.

PET reconstruction
The Biograph TrueV system is equipped with PSF reconstruction (HD; TrueX, Siemens Medical Solutions) but has no time of flight capability.
The standard reconstruction used in our department was a PSF reconstruction algorithm (HD; TrueX, Siemens Medical Solutions; 3 iterations and 21 subsets) without filtering. We did not use any post filtering as modelling the PSF during the iterative reconstruction introduces correlations between neighbouring voxels in a manner similar to smoothing filters and thus has been shown to achieve maximal performance with little to no filtering [18]. Raw data were also reconstructed with an OSEM reconstruction algorithm (4 iterations and 8 subsets), and a PSF reconstruction algorithm (3 iterations and 21 subsets) incorporating a 7-mm Gaussian filter (PSF7). As shown in a previous study [12], this latter reconstruction leads to protocol-specific images with NEMA NU-2 phantom-based filtering that meet EANM 1.0 quantitative harmonisation standards. The OSEM reconstruction parameters also met the EANM requirements on activity recovery.
The matrix size for all reconstructions was 168 × 168 voxels, resulting in isotropic voxels of 4.07 × 4.07 × 4.07 mm. Scatter and CT attenuation corrections were also applied.

Fig. 1 Numbers of articles related to MATV as a function of the year of publication. Publications were identified using a MEDLINE search with the following enquiry: ("MATV" or "MTV") and "PET". Only human studies were included
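To give a sense of scale for the 7-mm Gaussian filter relative to the 4.07-mm voxels, the following sketch converts the filter width to a standard deviation and builds a normalised one-dimensional kernel. It assumes that the 7 mm refers to the full width at half maximum (FWHM), which the text does not state explicitly; the function names and the kernel radius are illustrative choices, not part of the vendor software.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian FWHM (mm) to its standard deviation (mm)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel_1d(fwhm_mm, voxel_mm, radius_vox=3):
    """Return a normalised 1-D Gaussian kernel sampled at voxel centres.

    radius_vox is an illustrative truncation radius (hypothetical choice).
    """
    sigma_vox = fwhm_to_sigma(fwhm_mm) / voxel_mm
    weights = [
        math.exp(-0.5 * (i / sigma_vox) ** 2)
        for i in range(-radius_vox, radius_vox + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Assumption: the 7-mm filter width is a FWHM; voxel size is taken from the text.
sigma_mm = fwhm_to_sigma(7.0)
kernel = gaussian_kernel_1d(7.0, 4.07)
print(f"sigma = {sigma_mm:.2f} mm ({sigma_mm / 4.07:.2f} voxels)")
```

Under this assumption the standard deviation is slightly under 3 mm, that is, less than one voxel width, so most of the smoothing weight falls on a voxel and its immediate neighbours.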

PET tumour delineation
PET images were contoured by two experienced PET readers using MIM image-contouring tools (MIM-5.6, MIM Software Inc, Cleveland, OH). Two different contouring methods were used: a thresholding technique at 40% of SUVmax and a gradient-based technique involving the PET edge contouring tool [19,20]. The procedures focused only on the primary tumour and did not include involved node(s), except in cases of bulky disease, where tumoural and nodal uptake could not be separated.
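The principle of the 40% isocontour can be illustrated with a minimal sketch. This is not the MIM implementation: it simply keeps every voxel of a pre-selected region whose uptake is at least 40% of the regional SUVmax, without any connectivity check, and converts the voxel count to a volume using the 4.07-mm isotropic voxel size. The function names and the toy SUV values are hypothetical.

```python
def isocontour_mask(suv_by_voxel, fraction=0.40):
    """Return the set of voxel indices with SUV >= fraction * SUVmax.

    suv_by_voxel maps (x, y, z) indices inside a region of interest to SUVs.
    """
    if not suv_by_voxel:
        return set()
    threshold = fraction * max(suv_by_voxel.values())
    return {idx for idx, suv in suv_by_voxel.items() if suv >= threshold}


def mask_volume_cm3(mask, voxel_mm=(4.07, 4.07, 4.07)):
    """Convert a voxel count to a volume in cm3 (1 cm3 = 1000 mm3)."""
    voxel_mm3 = voxel_mm[0] * voxel_mm[1] * voxel_mm[2]
    return len(mask) * voxel_mm3 / 1000.0


# Toy region of interest: hypothetical SUVs, not patient data.
toy_voi = {
    (0, 0, 0): 10.0,  # SUVmax, so the threshold is 4.0
    (1, 0, 0): 6.0,
    (2, 0, 0): 4.5,
    (3, 0, 0): 3.9,   # just below threshold, excluded
    (4, 0, 0): 1.0,
}
mask = isocontour_mask(toy_voi)
volume = mask_volume_cm3(mask)
print(len(mask), "voxels,", round(volume, 3), "cm3")
```

Because the threshold scales with SUVmax, any reconstruction that raises SUVmax also raises the threshold, so the same tumour can yield a smaller contour. The gradient-based tool is proprietary and is not sketched here.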

Comparison of tumour volumes and statistical analysis
MATVs were compared by determining the union and the intersection between each pair of reconstruction methods (PSF vs OSEM and PSF7 vs OSEM), and then computing the Dice coefficients and concordance indices as follows:

$$\mathrm{Dice} = \frac{2\,|\mathrm{MATV1} \cap \mathrm{MATV2}|}{|\mathrm{MATV1}| + |\mathrm{MATV2}|}, \qquad \mathrm{Concordance\ index} = \frac{|\mathrm{MATV1} \cap \mathrm{MATV2}|}{|\mathrm{MATV1} \cup \mathrm{MATV2}|}$$

where MATV1 and MATV2 are two volumes delineated on different reconstructions for a given tumour and ∪ and ∩ are respectively the union and the intersection between the volumes. Representative volumes and their union and intersection are shown in Fig. 2. These indices give a representative geometrical description of changes in MATVs, combining both volume and positional changes [16]. Their values vary between 0 where the MATVs are completely disjointed and 1 where the MATVs match perfectly in terms of size, shape, and location.
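As an illustration of how these overlap indices behave, the sketch below computes them for two volumes represented as sets of voxel indices. It assumes the usual definitions: the Dice coefficient as twice the intersection divided by the sum of the two volumes, and the concordance index as the intersection divided by the union. The function names and the toy masks are hypothetical.

```python
def dice_coefficient(mask1, mask2):
    """Dice = 2 * |intersection| / (|mask1| + |mask2|)."""
    total = len(mask1) + len(mask2)
    if total == 0:
        raise ValueError("both masks are empty")
    return 2.0 * len(mask1 & mask2) / total


def concordance_index(mask1, mask2):
    """Concordance index = |intersection| / |union| (assumed definition)."""
    union = mask1 | mask2
    if not union:
        raise ValueError("both masks are empty")
    return len(mask1 & mask2) / len(union)


# Toy example: a 16-voxel reference volume and a 12-voxel volume lying inside it.
osem_mask = {(x, y, 0) for x in range(4) for y in range(4)}
psf_mask = {(x, y, 0) for x in range(1, 4) for y in range(4)}

print(round(dice_coefficient(osem_mask, psf_mask), 3))   # 0.857
print(round(concordance_index(osem_mask, psf_mask), 3))  # 0.75
```

For the same pair of volumes the concordance index is never larger than the Dice coefficient, which is worth keeping in mind when comparing the two sets of reported values.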
Quantitative data are presented as mean and standard deviation (SD) or median and interquartile range, as appropriate. Bland-Altman analyses were used to compare MATVs obtained using the three reconstruction methods. The metrics obtained on each of the three sets of PET images were compared globally using Friedman tests with a post hoc Dunn test [21] used to compare each pair of reconstructions (PSF vs OSEM, PSF7 vs PSF and PSF7 vs OSEM). The Friedman non-parametric test was chosen because not all the quantitative values had a normal distribution, as tested with the Shapiro-Wilk normality test.
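A Bland-Altman comparison of paired volumes can be sketched as follows. The sketch reports the mean percentage difference together with 95% limits of agreement (mean ± 1.96 SD). Expressing each difference relative to the mean of the two measurements is an assumption, as the text does not specify the denominator; the function name and the toy volumes are hypothetical.

```python
from statistics import mean, stdev


def bland_altman_percent(reference, test):
    """Return (bias, lower limit, upper limit) of the percentage differences.

    Each difference is expressed relative to the mean of the paired values
    (an assumed convention), and the limits are bias +/- 1.96 * SD.
    """
    if len(reference) != len(test) or len(reference) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    pct = [100.0 * (t - r) / ((t + r) / 2.0) for r, t in zip(reference, test)]
    bias = mean(pct)
    half_width = 1.96 * stdev(pct)
    return bias, bias - half_width, bias + half_width


# Toy paired volumes in cm3: hypothetical values, not study data.
osem_cm3 = [10.0, 20.0, 40.0]
psf_cm3 = [10.0, 18.0, 30.0]
bias, lower, upper = bland_altman_percent(osem_cm3, psf_cm3)
print(f"bias = {bias:.1f}% (limits {lower:.1f} to {upper:.1f})")
```

A negative bias here means that the tested reconstruction gives smaller volumes than the reference on average.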
The Dice coefficients and concordance indices between the OSEM and PSF or PSF7 reconstructions were compared using the Wilcoxon test for paired samples. Interobserver variability was assessed with Lin's concordance coefficient [11]. Moreover, the whole analysis (comparison of volumes and Dice coefficients and concordance indices) was performed in duplicate with volumes extracted by the two observers. For all statistical tests, a two-tailed P value of less than 0.05 was considered statistically significant. Graphs and analyses were performed using Prism (version 5.0f, GraphPad Software, La Jolla, CA).
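For readers unfamiliar with Lin's concordance coefficient, a minimal sketch is given below. It uses the standard formula, 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²), with population (1/n) variances and covariance. The function name and the toy values are illustrative; the statistical software used in the study may differ in detail.

```python
from statistics import mean


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    denominator = sxx + syy + (mx - my) ** 2
    if denominator == 0:
        raise ValueError("coefficient undefined for identical constant samples")
    return 2.0 * sxy / denominator


# Toy volumes (cm3) from two hypothetical observers.
observer1 = [1.0, 2.0, 3.0]
observer2 = [2.0, 3.0, 4.0]  # perfectly correlated but shifted by 1
print(round(lin_ccc(observer1, observer1), 3))  # 1.0
print(round(lin_ccc(observer1, observer2), 3))  # 0.571
```

Unlike a Pearson correlation, the coefficient penalises a systematic offset between observers, which is why the shifted example scores well below 1.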

Patient characteristics
Eighteen patients (16 males, 2 females; mean (±SD) age 61 (±11) years) were included. The patient characteristics and their TNM and AJCC stages are listed in Tables 1 and 2, respectively.

Comparison of tumour volumes calculated using the three reconstruction techniques
Forty percent isocontour method
Tumour delineation on unfiltered PSF images resulted in significantly smaller volumes (median = 18.6 cm3, interquartile range 4 to 37) than obtained with the OSEM algorithm (median = 36.4 cm3, interquartile range 7.1 to 50.2; P < 0.001). The use of EARL-compliant PSF images (PSF7) resulted in volumes similar to those obtained with OSEM reconstructions (median = 34.6 cm3, interquartile range 7.9 to 51.4; no significant difference; Fig. 3a).
The mean percentage difference between the unfiltered PSF and OSEM reconstructions for isocontour-   Fig. 4c). After application of the 7-mm Gaussian filter, the mean percentage difference was reduced to 1.1% (95% CI −12.1 to 10; Fig. 4d).

Concordance between MATVs from unfiltered PSF, OSEM, and PSF EARL-compliant reconstructions
Forty percent isocontour method
Taking the MATVs delineated on OSEM images as the reference standard, the use of PSF7 images resulted in significantly higher Dice coefficients (median value = 0.96 vs 0.77, P < 0.0001; Fig. 5a) and concordance indices (median value = 0.92 vs 0.64, P < 0.0001; Fig. 5b) than those obtained with unfiltered PSF images. The interquartile ranges were also narrower when the PSF7 images were used.

Gradient-based method
In comparison to the OSEM method, the Dice coefficients and concordance indices were greater than 0.8 with either the unfiltered PSF- or PSF7-delineated MATVs. Despite this high similarity, the use of PSF7 images resulted in significantly higher Dice coefficients (median value = 0.96 vs 0.94; P = 0.01) and concordance indices (median value = 0.89 vs 0.85; P = 0.003), and narrower interquartile ranges were observed for PSF7 images (Fig. 5c, d).
Representative images of the isocontour- and gradient-based tumour delineations using all three reconstruction methods are shown in Fig. 6.

Inter-observer variability for the 40% isocontour and gradient-based methods
There was almost perfect inter-observer agreement between each pair of volumes assessed by both observers, with Lin concordance coefficients greater than 0.99 in all cases (Additional file 1: Figure S1).
Regarding the comparison of tumour volumes calculated using the three reconstruction techniques, similar trends were found for both observers, except when comparing PSF and OSEM volumes delineated with the gradient-based method: for observer 2, a statistically significant difference was found between OSEM and PSF volumes (Additional file 2: Figure S2).
Regarding the concordance between MATVs from unfiltered PSF, OSEM, and PSF EARL-compliant reconstructions, similar results were obtained (Additional file 3: Figure S3). Using the 40% isocontour method and taking the MATVs delineated on OSEM images as a reference standard, the use of PSF7 images led to significantly higher Dice coefficients (median value = 0.96 vs 0.75, P = 0.0002) and concordance indices (median value = 0.93 vs 0.61, P = 0.0002) than those obtained using PSF images. The gradient-based methodology was also found to be less sensitive to reconstruction variability than the 40% isocontour method. Dice coefficients and concordance indices were greater than 0.8 for both PSF reconstruction methods. The use of PSF7 images led to narrower interquartile ranges and significantly higher Dice coefficients (median value = 0.95 vs 0.94; P = 0.01) and concordance indices (median value = 0.91 vs 0.89; P = 0.02) than those obtained with PSF images.

Fig. 4 Comparison of MATVs from PSF images optimised for diagnostic potential and EARL-compliant PSF7 images (observer 1). Relationships between MATVs extracted from OSEM reconstructions and PSF or PSF7 reconstructions, for the 40% isocontour-based methods (a, b) and gradient-based methods (c, d), assessed using Bland-Altman plots

Fig. 5 Impact of the EARL harmonization strategy on Dice and concordance indices between MATVs extracted from OSEM images and PSF images (observer 1). The PSF7 images were filtered to meet EARL requirements while PSF images were optimised for diagnostic potential. Data are shown as Tukey boxplots (lines displaying the median, 25th, and 75th percentiles; crosses represent the mean values). Dice coefficients and concordance indices are shown for both the isocontour-based method (a, b) and gradient-based method (c, d). ns not significant

Fig. 6 Representative images of isocontour- and gradient-based automatic contouring of PSF images optimised for diagnostic and EARL-compliant purposes. Maximum intensity projections and transverse slices at the level of a necrotic tumour in the right upper lobe of the lung are shown for OSEM, unfiltered PSF images, and PSF images filtered to meet EARL requirements (PSF7). Dice and concordance indices are given for each contouring method, using the MATV extracted from the OSEM images as a reference standard

Discussion
This study demonstrates that automatic contouring of lung tumours on EARL-compliant PSF images using a widely adopted automatic isocontour methodology is an accurate means of overcoming reconstruction variability in MATV delineation. With OSEM used as a reference, this harmonization strategy led to concordance indices greater than 0.9, with very narrow confidence intervals. This supports the use of EARL-compliant images in multicentre studies, where MATVs extracted from 18F-FDG PET are used for tumour targeting or as a prognostic tool (for example, using the median value of pooled data as a cut-off value). EARL-compliant images could also be used in clinical routine in centres running more than one PET system, a situation that is becoming increasingly common.
The gradient-based methodology appears less sensitive to reconstruction variability, with median values for Dice coefficients and concordance indices between the MATVs delineated on OSEM and PSF images greater than 0.8. However, the use of EARL-compliant PSF images significantly improved these indices, demonstrating the value of harmonised images, even when more advanced contouring algorithms such as gradient-based contouring are used.
In this study, we focused on PSF reconstruction, which implements the detector response function. At the edges of the field of view (FOV), photons are likely to strike crystals at an angle and, as the depth-of-interaction is not known, they may travel through another crystal before they light up. This phenomenon leads to incorrect lines of response, especially at the edge of the FOV; therefore, resolution is not uniform throughout the FOV. The aim of PSF reconstruction is to minimise this effect, thus decreasing partial volume effects, which in turn decreases the spill-in and spill-over within and around a tumour lesion and improves image contrast. In line with these properties of PSF reconstruction, contouring around lung tumours with the isocontour methodology on PSF images led to significantly smaller volumes than obtained from OSEM images. PSF modelling is available from the major PET vendors [22][23][24]. Though the improvement of spatial resolution may vary depending on the PET system, as well as corrections for the Gibbs artefact [25][26][27] impacting on quantitation for small lesions, PSF reconstruction consistently increases SUV metrics compared to standard OSEM reconstruction. Therefore, we feel that results similar to those reported in the present study would be obtained with other systems when using the isocontour method.
The present study did not explore other methods for improving tumour delineation on PET/CT images, such as advanced contouring methodologies like contrast-orientated isocontour [28,29] or the FLAB algorithm [30]. The parameters accounting for the respective weight of tumour and background uptake in the choice of the optimal threshold used in the contrast-orientated isocontour are known to be specific to the system used [29]. We therefore assume that EARL-compliant images would be useful for contouring of tumours with this algorithm in a multicentre setting.
With regard to the use of MATV for radiotherapy planning, our study focused on the initial and crucial step of automatic tumour contouring on PET images. We used automatic contouring so that no other confounding factor such as inter-observer variability [6] could affect the MATVs. As reported in a recent consensus paper from the IAEA [31], the final gross tumour volume used for tumour targeting will also depend on other diagnostic modalities and the use of margin and consensus reading between PET readers and radiation oncologists.

Conclusion
This study shows that automatic contouring of lung tumours on EARL-compliant PSF images using the widely adopted automatic isocontour methodology is an accurate means of overcoming reconstruction variability in MATV delineation. Although the gradient-based methodology appears to be less sensitive to reconstruction variability, the use of EARL-compliant PSF images significantly improved the Dice coefficients and concordance indices, suggesting that the use of harmonised images is still important, even with more advanced contouring algorithms.

Additional files
Additional file 1: Figure S1. Inter-observer concordance for volume delineation. Relationships between MATVs extracted from OSEM reconstructions and PSF or PSF7 reconstructions for observers were compared using the Lin concordance coefficient (ρc) for the 40% isocontour (a) and gradient-based (b) methods. (TIFF 8917 kb)
Additional file 2: Figure S2. Impact of the EARL harmonization strategy on MATVs defined by the isocontour and gradient-based delineation methods (observer 2). MATVs are shown as Tukey boxplots (lines displaying the median, 25th and 75th percentiles; crosses represent the mean values). Legends for p values: ***<0.001; **<0.01; *<0.05. ns, not significant. (TIFF 17693 kb)
Additional file 3: Figure S3. Impact of the EARL harmonization strategy on Dice and concordance indices between MATVs extracted from OSEM images and PSF images (observer 2). The PSF7 images were filtered to meet EARL requirements while PSF images were optimised for diagnostic potential. Data are shown as Tukey boxplots (lines displaying the median, 25th, and 75th percentiles; crosses represent the mean values). Dice coefficients and concordance indices are shown for both the isocontour method (a and b) and gradient-based method (c and d).