  • Original research
  • Open access

Asphericity of tumor FDG uptake in non-small cell lung cancer: reproducibility and implications for harmonization in multicenter studies

Abstract

Background

Asphericity (ASP) of the primary tumor’s metabolic tumor volume (MTV) in FDG-PET/CT is independently predictive for survival in patients with non-small cell lung cancer (NSCLC). However, comparability between PET systems may be limited. Therefore, reproducibility of ASP was evaluated under varying image reconstruction settings and acquisition times to assess the feasibility of ASP assessment in multicenter studies.

Methods

This is a retrospective study of 50 patients with NSCLC (female 20; median age 69 years) undergoing pretherapeutic FDG-PET/CT (median 3.7 MBq/kg; 180 s/bed position). Reconstruction used OSEM with TOF4/16 (iterations 4; subsets 16; in-plane filter 2.0, 6.4 or 9.5 mm), TOF4/8 (4 it; 8 ss; filter 2.0/6.0/9.5 mm), PSF + TOF2/17 (2 it; 17 ss; filter 2.0/7.0/10.0 mm) or Bayesian-penalized likelihood (Q.Clear; beta, 600/1750/4000). Resulting reconstructed spatial resolution (FWHM) was determined from hot sphere inserts of a NEMA IEC phantom. Data with approx. 5-mm FWHM were retrospectively smoothed to achieve 7-mm FWHM. List mode data were rebinned for acquisition times of 120/90/60 s. Threshold-based delineation of primary tumor MTV was followed by evaluation of relative ASP/SUVmax/MTV differences between datasets and resulting proportions of discordantly classified cases.

Results

Reconstructed resolution for narrow/medium/wide in-plane filter (or low/medium/high beta) was approx. 5/7/9 mm FWHM. Comparing different pairs of reconstructed resolution between TOF4/8, PSF + TOF2/17, Q.Clear and the reference algorithm TOF4/16, ASP differences were lowest at FWHM of 7 versus 7 mm. Proportions of discordant cases (ASP > 19.5% vs. ≤ 19.5%) were also lowest at 7 mm (TOF4/8, 2%; PSF + TOF2/17, 4%; Q.Clear, 10%). Smoothing of 5-mm data to 7-mm FWHM significantly reduced discordant cases (TOF4/8, 38% reduced to 2%; PSF + TOF2/17, 12% to 4%; Q.Clear, 10% to 6%), resulting in proportions comparable to original 7-mm data. Shorter acquisition time only increased proportions of discordant cases at < 90 s.

Conclusions

ASP differences were mainly determined by reconstructed spatial resolution, and multicenter studies should aim at comparable FWHM (e.g., 7 mm; determined by in-plane filter width). This reduces discordant cases (high vs. low ASP) to an acceptable proportion for TOF and PSF + TOF of < 5% (Q.Clear: 10%). Data with better resolution (i.e., lower FWHM) could be retrospectively smoothed to the desired FWHM, resulting in a comparable number of discordant cases.

Background

Patients with early-stage or locally advanced non-small cell lung cancer (NSCLC) are potential candidates for curatively intended therapy; however, management decisions are primarily based on the clinical tumor stage as a single factor only [1]. On average, adjuvant chemotherapy showed only modest survival benefits [2,3,4], and therefore, more effective methods of treatment selection are highly warranted.

Consequently, numerous additional prognostic or predictive factors [5,6,7], among them image-derived parameters [8,9,10,11,12], have been investigated aiming at more differentiated outcome prediction and management decisions. Among parameters from positron emission tomography/computed tomography with [18F]fluorodeoxyglucose (FDG-PET/CT), asphericity (ASP) is a parameter that reflects shape irregularity of the primary tumor’s metabolic tumor volume (MTV), combining metric and metabolic features of the primary tumor. Three retrospective studies confirmed its independent prognostic value for progression-free (PFS) and overall survival (OS) in patients with NSCLC [13,14,15]. The largest study (311 patients, UICC stage I–III) further showed that ASP, with a cutoff of > 19.5%, could identify patients with UICC stage II treated by surgery and adjuvant chemotherapy with high ASP and reduced PFS (median 11 months vs. not reached) and OS (22 months vs. not reached) [15]. ASP was superior for survival prediction compared to primary tumor’s maximum standardized uptake value (SUVmax) and MTV, two other previously proposed and common PET parameters [8, 9, 16, 17].

Studies on quantitative PET parameters have mostly been monocentric, but the main limitation of any PET parameter is its dependence on numerous technical factors including image reconstruction algorithms. Therefore, results may fail to reproduce in a multicenter approach unless harmonization between centers is ensured [18,19,20]. SUVmax and MTV may vary by > 30% if basic ordered subset expectation maximization (OSEM) reconstruction is combined with time-of-flight (TOF) information and/or scanner-specific compensation for the point spread function (PSF) [19,20,21,22].

Variability of ASP has not been investigated so far, but an impact of different reconstruction methods and resulting levels of image noise can be expected. The definition of ASP includes the MTV and its surface; therefore, a variability of MTV will cause variability of ASP. Since MTV also varies notably depending on the applied delineation algorithm [20, 23,24,25], there are two potential sources of variability of ASP: image generation and lesion delineation.

The goal of the current study was to investigate differences in ASP resulting from variability in image generation (common reconstruction methods and acquisition times). The focus was on assessing whether the resulting variation is acceptable for application in multicenter studies and on defining the range of acceptable variation of the influencing factors. Specifically, the goal was not to investigate the trueness of ASP itself, to identify a ground truth or to define a highly optimized reconstruction protocol for a specific PET scanner. Rather, this study investigated whether ASP could still be used in multicenter studies under imperfect clinical conditions with different scanners and a certain variation in acquisition protocols (uptake time, acquisition time). Such variability introduced by image generation should be separated from variations in image post-processing, the software for image feature extraction [26] or variation in lesion delineation. Therefore, data were not post-processed (unless specified), and the same software and delineation method were used as in the preceding studies on ASP in NSCLC [13,14,15]. To facilitate interpretation, SUVmax and MTV were investigated analogously for comparison.

Methods

Phantom data

A NEMA IEC body phantom was examined using a GE Discovery MI PET scanner (GE Healthcare, General Electric, Boston, MA, USA) with a 3-ring detector with silicon photomultipliers (SiPM) and a reported sensitivity of 7.3 cps/kBq [27]. Total activity in field of view was approximately 35 MBq. The absolute activities were measured in a certified dose calibrator (ISOMED 2010, MED Dresden GmbH, Germany), which was also used for regular cross calibration of the PET scanner (every 6 months). Sphere inserts (inner diameter 10, 13, 17, 22, 28, and 37 mm) were filled with 24.4 kBq/ml F18-fluoride, while the background was filled with 3.1 kBq/ml (sphere-to-background ratio, approx. 8:1). Acquisition time was 3 min per bed position (transaxial field of view, 70 cm; matrix size, 256 × 256; voxel size, 2.73 × 2.73 × 2.78 mm³). CT data of the phantom were used for attenuation correction. Scatter correction, random correction and dead time correction were also performed.

PET raw data were reconstructed using OSEM with time of flight (TOF; GE “VUE Point FX”) with 4 iterations and 16 subsets (i.e., TOF4/16). This reconstruction was defined as the reference algorithm for subsequent analyses and used either a 2.0-mm, 6.4-mm or 9.5-mm in-plane Gaussian filter (i.e., TOF4/16/2, TOF4/16/6.4 or TOF4/16/9.5). Further reconstruction was performed with OSEM and TOF with 4 iterations, 8 subsets and either 2.0 mm, 6.0 mm or 9.5 mm in-plane filter (TOF4/8/2, TOF4/8/6 or TOF4/8/9.5).

Additionally, data were reconstructed using OSEM with TOF and point spread function (OSEM + PSF + TOF, hereafter referred to as PSF + TOF; GE “VUE Point FX” with “SharpIR”) with 2 iterations and 17 subsets and either 2.0-mm, 7.0-mm or 10.0-mm in-plane filter (PSF + TOF2/17/2, PSF + TOF2/17/7 or PSF + TOF2/17/10), respectively. TOF and PSF + TOF reconstructions always included a “standard” z-axis filter.

All data were also reconstructed using Bayesian-penalized likelihood reconstruction (GE “Q.Clear”) with a penalization factor β of 600, 1750 or 4000 (Q.Clear600, Q.Clear1750 or Q.Clear4000), respectively.

Reconstructed spatial resolution was assessed as the full width at half maximum (FWHM) of the PSF in the reconstructed phantom images. PSF was modeled by a 3D Gaussian, and FWHM was determined by applying the method described in detail by Hofheinz et al. [28]. This method is based on fitting the analytic solution for the radial activity profile of a homogeneous sphere convolved with a 3D Gaussian to the reconstructed data. In this process, the full 3D vicinity of each sphere is evaluated by transforming the data to spherical coordinates relative to the respective sphere's center. A summary of the used reconstructions, resulting spatial resolution and image noise (patient data) is given in Table 1. Representative radial profiles are shown in Fig. 1.
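The model underlying such a fit can be illustrated with a short sketch. It evaluates the closed-form radial profile of a homogeneous sphere (activity 1 inside, 0 outside) convolved with an isotropic 3D Gaussian. This is an idealized illustration only, not the implementation of [28]: the function name is hypothetical, and the sketch ignores background activity and the inactive sphere wall, which a real phantom fit would have to handle. In an actual fit, the FWHM would be the free parameter adjusted until this curve matches the measured profile.

```python
import math

# Conversion between FWHM and standard deviation of a Gaussian.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def blurred_sphere_profile(r_mm, radius_mm, fwhm_mm):
    """Relative activity at distance r_mm from the centre of a homogeneous
    sphere after convolution with an isotropic 3D Gaussian (hypothetical
    helper; background and sphere wall are not modelled)."""
    sigma = fwhm_mm * FWHM_TO_SIGMA
    s2 = math.sqrt(2.0) * sigma
    erf_part = 0.5 * (math.erf((radius_mm - r_mm) / s2)
                      + math.erf((radius_mm + r_mm) / s2))
    if r_mm < 1e-6:
        # Analytic limit r -> 0 of the exponential term (avoids division by zero).
        exp_part = (math.sqrt(2.0 / math.pi) * (radius_mm / sigma)
                    * math.exp(-radius_mm ** 2 / (2.0 * sigma ** 2)))
    else:
        exp_part = sigma / (r_mm * math.sqrt(2.0 * math.pi)) * (
            math.exp(-(r_mm - radius_mm) ** 2 / (2.0 * sigma ** 2))
            - math.exp(-(r_mm + radius_mm) ** 2 / (2.0 * sigma ** 2)))
    return erf_part - exp_part


if __name__ == "__main__":
    # 37-mm sphere (radius 18.5 mm) at three resolution levels.
    for fwhm in (5.0, 7.0, 9.0):
        values = [blurred_sphere_profile(r, 18.5, fwhm) for r in (0.0, 10.0, 18.5, 25.0)]
        print(fwhm, [round(v, 3) for v in values])
```

With increasing FWHM, the edge of the profile becomes shallower while the plateau in the centre of the largest sphere is largely preserved.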

Table 1 Reconstruction parameters and image noise
Fig. 1

Sphere activity profiles. a Radial activity profiles of the 37-mm sphere for the reference algorithm with different in-plane filter widths to achieve different levels of reconstructed spatial resolution (FWHM). Acquisition time was 180 s. Substantial noise propagation can be observed at FWHM of approx. 5 mm. b Corresponding profiles for 6.4-mm in-plane filter width at shorter acquisition times. Noise especially increases between 90 and 60 s acquisition time, while reconstructed spatial resolution remains similar

To study effects of different acquisition time per bed position, PET list mode data were retrospectively rebinned to reconstruct further datasets representing an acquisition time of 120 s, 90 s or 60 s, respectively. Reconstruction was then performed with the algorithms that resulted in a reconstructed spatial resolution of 7 mm (i.e., TOF4/8/6, TOF4/16/6.4, PSF + TOF2/17/7 and Q.Clear1750).

Patients and scans

Fifty patients (female 20; median age 69 years; range 46 to 83 years) with histologically proven NSCLC underwent pretherapeutic FDG-PET/CT between July 2018 and February 2019 using the same scanner. Patients were required to fast for at least 6 h prior to tracer administration, and a blood glucose level of ≤ 150 mg/dl was ensured. A median activity of 249 MBq (interquartile range [IQR], 238 to 257 MBq; range 209 to 274 MBq) or 3.7 MBq/kg (IQR 3.1 to 4.2 MBq/kg; range 2.0 to 5.7 MBq/kg) was administered intravenously. Static PET data were acquired after a median uptake time of 65 min (IQR 61 to 70 min; range 55 to 96 min) from the base of skull to the proximal femora in 3D acquisition mode (acquisition time, 180 s per bed position; bed overlap, approx. 25%). Attenuation correction was based on a non-enhanced low-dose CT (automated tube current modulation “Smart mA”; maximum tube current–time product 100 mAs; tube voltage 120 kV; gantry rotation time 0.5 s) or non-enhanced diagnostic CT (maximum tube current–time product, 200 mAs).

PET raw data were reconstructed as described above (patient example in Fig. 2). Furthermore, data with 5-mm FWHM resolution were smoothed with a Gaussian filter (5 mm FWHM). According to

$${\text{FWHM}}_{{{\text{target}}}}^{2} = {\text{FWHM}}_{{{\text{original}}}}^{2} + {\text{FWHM}}_{{{\text{filter}}}}^{2}$$
(1)

this results in a target spatial resolution of approximately 7 mm. Altogether, 25 image datasets per patient with different spatial resolution and noise (i.e., acquisition time) were generated.

Fig. 2

Patient example. Coronal FDG-PET images of the thorax for a patient are displayed for all 12 reconstruction algorithms (body mass index 22.5 kg/m²; injected activity 3.5 MBq/kg; acquisition time 180 s per bed position). The given noise level is the median of all 50 patients. Data are separated by reconstructed spatial resolution of approx. 5 mm (left column), 7 mm (middle column) or 9 mm FWHM (right column), respectively. The reference algorithm is highlighted in green. At 7 mm FWHM spatial resolution, ASP of the primary tumor (red arrow) was concordantly high (> 19.5%) with all algorithms except for Q.Clear
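The quadrature relation in Eq. (1) can be checked with a few lines of code. The following minimal sketch assumes that both the reconstructed resolution and the smoothing kernel are well described by Gaussians; the function names are illustrative and not taken from any software used in the study.

```python
import math


def combined_fwhm(fwhm_original_mm, fwhm_filter_mm):
    """Resulting FWHM after Gaussian smoothing, following Eq. (1)."""
    return math.sqrt(fwhm_original_mm ** 2 + fwhm_filter_mm ** 2)


def filter_fwhm_for_target(fwhm_original_mm, fwhm_target_mm):
    """Filter FWHM needed to reach a target FWHM (Eq. (1) solved for the filter)."""
    if fwhm_target_mm < fwhm_original_mm:
        raise ValueError("Smoothing cannot improve resolution: target must be >= original.")
    return math.sqrt(fwhm_target_mm ** 2 - fwhm_original_mm ** 2)


if __name__ == "__main__":
    # 5-mm data smoothed with a 5-mm Gaussian filter, as described in the text.
    print(round(combined_fwhm(5.0, 5.0), 2))
    # Filter width that would give exactly 7.0 mm from 5.0-mm data.
    print(round(filter_fwhm_for_target(5.0, 7.0), 2))
```

A 5-mm filter applied to 5-mm data gives about 7.1 mm, which is consistent with the "approximately 7 mm" stated above; reaching exactly 7.0 mm would require a slightly narrower filter of about 4.9 mm.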

Data evaluation

Evaluation of the data was performed with dedicated software (ROVER, version 3.0.34, ABX advanced biochemical compounds GmbH, Radeberg, Germany) by an experienced physician in nuclear medicine. MTV of the primary tumor was delineated in each dataset using the same threshold-based, background-adapted algorithm [29]. Delineation was visually inspected and manually corrected if deemed necessary. Tumoral FDG-avid tissue not related to the primary tumor and delineable from the latter (lymph nodes, metastases) was excluded. If the primary tumor was determined to be multifocal (i.e., separate ipsilateral tumor nodules) or the presence of lymphangitic carcinomatosis was diagnosed by interdisciplinary consensus, all tumor nodules and FDG-avid lymphangitic tissue were included in the MTV (see also [15]). SUVmax and ASP [30] of the MTV were derived. SUV was normalized using the body weight in kg.

ASP was calculated identically to its initial definition by the authors [30], which was unaltered in subsequent publications [13,14,15, 31,32,33,34,35,36,37]:

$${\text{ASP}}\,\left( \% \right) = \left( {\sqrt[3]{H} - 1} \right) \times 100\% \quad {\text{with}}\quad H = \frac{1}{36\pi } \cdot \frac{{S^{3} }}{{V^{2} }}$$
(2)

S and V are the surface area and the volume of the MTV, respectively. S was computed as the sum of all voxel surfaces that form the outer and inner surfaces of the MTV multiplied by the factor 2/3. Note that this corresponds to the approximation of the surface area of discrete 3D objects using six voxel classes as described by [38].
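Equation (2) and the surface estimate described above can be sketched as follows. This is an illustration under stated assumptions, not the ROVER implementation: the MTV is represented as a set of integer voxel indices, the surface is taken as the summed area of all voxel faces bordering a non-MTV voxel (outer and inner surfaces) multiplied by 2/3, and all function names are hypothetical.

```python
import math


def asphericity_percent(surface_mm2, volume_mm3):
    """ASP according to Eq. (2); returns 0 for a perfect sphere."""
    h = surface_mm2 ** 3 / (36.0 * math.pi * volume_mm3 ** 2)
    return (h ** (1.0 / 3.0) - 1.0) * 100.0


def voxel_surface_mm2(voxels, voxel_size_mm, factor=2.0 / 3.0):
    """Sum of exposed voxel faces times `factor` (simplified reading of the text).

    voxels: iterable of (i, j, k) index tuples belonging to the MTV.
    voxel_size_mm: (dx, dy, dz) voxel edge lengths in mm.
    """
    voxels = set(voxels)
    dx, dy, dz = voxel_size_mm
    # Area of a face perpendicular to axis 0, 1 and 2, respectively.
    face_area = (dy * dz, dx * dz, dx * dy)
    total = 0.0
    for i, j, k in voxels:
        for axis in range(3):
            for step in (-1, 1):
                neighbor = [i, j, k]
                neighbor[axis] += step
                if tuple(neighbor) not in voxels:
                    total += face_area[axis]
    return factor * total


def voxel_volume_mm3(voxels, voxel_size_mm):
    """Volume as voxel count times single-voxel volume."""
    dx, dy, dz = voxel_size_mm
    return len(set(voxels)) * dx * dy * dz


if __name__ == "__main__":
    # Analytic sphere of radius 10 mm: ASP is zero up to rounding error.
    r = 10.0
    print(asphericity_percent(4.0 * math.pi * r ** 2, 4.0 / 3.0 * math.pi * r ** 3))
    # Elongated 10 x 2 x 2 voxel block with the phantom voxel size from the text.
    block = [(i, j, k) for i in range(10) for j in range(2) for k in range(2)]
    size = (2.73, 2.73, 2.78)
    print(asphericity_percent(voxel_surface_mm2(block, size), voxel_volume_mm3(block, size)))
```

Because S enters with the third power and V with the second, H is dimensionless, and ASP grows with any deviation from spherical shape, whether from elongation, surface roughness or internal cavities.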

Please note that this definition of the MTV surface area is distinctly different from the definition by the Image Biomarker Standardization Initiative (IBSI), and compliance of both definitions cannot be assumed. The IBSI estimates the MTV surface area using a mesh-based representation after triangulation of the MTV’s outer surface [26]. Additional file 1 provides the IBSI checklist for an overview of all methodological aspects of image generation and image processing in the present analysis. Distribution of ASP values in all current 50 patients is illustrated in Fig. 3.

Fig. 3

Distribution of ASP values with the reference algorithm. ASP values at reconstructed spatial resolution of 7.0-mm FWHM and acquisition time of 180 s are displayed for each of the 50 patients for TOF4/8/6, PSF + TOF2/17/7 and Q.Clear1750. The cutoff at 19.5% is highlighted at each axis. Several tumors feature ASP in proximity to this cutoff. Data points that are located either in the left upper section or in the right lower section of the diagram represent discordantly classified cases when compared to the reference algorithm TOF4/16/6.4 (TOF4/8/6, n = 1; PSF + TOF2/17/7, n = 2; Q.Clear1750, n = 5 discordant cases)

In each dataset, a spherical volume of interest (VOI) of approx. 19 ml was placed in the unaffected right liver lobe to derive its SUVmean and SUV standard deviation and calculate image noise (SUV standard deviation/SUVmean).
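The noise measure defined above is a coefficient of variation of the liver VOI. A minimal sketch is given below; the function name is illustrative, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not state which estimator the evaluation software applies.

```python
from statistics import mean, stdev


def image_noise(voi_suv_values):
    """Image noise as SUV standard deviation divided by SUVmean of a VOI.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    suv_mean = mean(voi_suv_values)
    if suv_mean == 0:
        raise ValueError("SUVmean of the VOI must be non-zero.")
    return stdev(voi_suv_values) / suv_mean


if __name__ == "__main__":
    # Made-up voxel SUVs for illustration only.
    print(image_noise([2.0, 2.2, 1.8, 2.0]))
```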

Statistical analysis

Statistical analysis was performed using SPSS 22 (IBM Corporation, Armonk, NY, USA). Descriptive parameters were expressed as median and IQR. Relative differences between any dataset a and the reference dataset b were calculated as follows:

$${\text{Relative}}\,{\text{difference}}\,(\% ) = \frac{{\left| {a - b} \right|}}{b} \times 100\%$$
(3)
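As a small illustration of Eq. (3), the following sketch computes per-patient relative differences against the reference dataset and summarizes them by their median. The function names and the example values are hypothetical and not study data.

```python
from statistics import median


def relative_difference_percent(a, b):
    """Relative difference of value a to reference value b, following Eq. (3)."""
    if b == 0:
        raise ValueError("Reference value b must be non-zero.")
    return abs(a - b) / b * 100.0


def median_relative_difference(candidate, reference):
    """Median of patient-wise relative differences between paired datasets."""
    if len(candidate) != len(reference):
        raise ValueError("Datasets must be paired (same length).")
    return median(relative_difference_percent(a, b) for a, b in zip(candidate, reference))


if __name__ == "__main__":
    # Invented ASP values (%) for three patients.
    reference = [20.0, 35.0, 12.0]
    candidate = [23.0, 33.0, 12.6]
    print(median_relative_difference(candidate, reference))
```

Because of the absolute value in the numerator, the measure does not distinguish over- from underestimation relative to the reference.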

The significance of these differences was assessed with Wilcoxon signed-rank test for paired data. Proportions (%) of discordantly classified cases (high vs. low ASP/SUVmax/MTV) between algorithms were given with their 95% binomial proportion confidence intervals (95% CI), which included the continuity correction of ± 0.5/n (= ± 0.5/50 = ± 1%). Classification with ASP (> 19.5%) was based on a previously identified cutoff in NSCLC patients [15] while cutoffs for SUVmax (> 10.5) and MTV (> 9.5 ml) were the respective median among the current 50 patients. Proportions between different pairs of algorithms were compared with two-sided McNemar’s test. Correlation between ASP and MTV was examined using the Pearson correlation coefficient r and interpretation criteria based on [39]. Statistical significance was generally assumed at p < 0.05.
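The confidence intervals described above can be sketched as a normal-approximation (Wald) interval widened by the continuity correction of 0.5/n and truncated to the range 0 to 1. The truncation and the function name are assumptions made for this illustration; SPSS itself was not used here.

```python
import math
from statistics import NormalDist


def wald_ci_continuity(k, n, level=0.95):
    """Binomial proportion CI: normal approximation plus 0.5/n continuity correction.

    k: number of discordant cases, n: number of patients.
    Returns (lower, upper) as fractions, truncated to [0, 1] (assumption).
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("Require n > 0 and 0 <= k <= n.")
    p = k / n
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(p * (1.0 - p) / n) + 0.5 / n
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    for k in (1, 2, 5):
        lower, upper = wald_ci_continuity(k, 50)
        print(f"{k}/50: {100 * lower:.1f}% to {100 * upper:.1f}%")
```

For 1, 2 and 5 discordant cases among 50 patients, this yields approximately 0 to 6.9%, 0 to 10.4% and 0.7 to 19.3%, which agrees with the intervals reported in the Results for proportions of 2%, 4% and 10%.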

Results

Relative differences

To identify the level of reconstructed spatial resolution that provides minimal relative ASP difference to the reference algorithm (TOF4/16), different combinations of spatial resolution for candidate algorithms (TOF4/8, PSF + TOF2/17, Q.Clear) and the reference algorithm were compared pairwise (Table 2).

Table 2 Relative differences to the reference algorithm

Relative ASP differences with TOF4/8 and PSF + TOF2/17 compared to TOF4/16 were significantly lower at 7 versus 7 mm than at 5 versus 7 mm, 9 versus 7 mm and 5 versus 5 mm (each p < 0.001). In contrast, differences with Q.Clear versus TOF4/16 at 7 versus 7 mm (median, 31.3%; IQR, 11.2 to 43.7%) were similar to 9 versus 7 mm (24.7%; 15.4 to 51.4%; p = 0.25). Relative ASP differences at 7 versus 7 mm were similar to 9 versus 9 mm with TOF4/8 (median, 7.6% vs. 9.3%; p = 0.38), PSF + TOF2/17 (12.8% vs. 16.2%; p = 0.25) and Q.Clear (31.3% vs. 29.1%; p = 0.33).

Relative SUVmax and MTV differences at 7 versus 7 mm were significantly lower than corresponding ASP differences (each p < 0.001; Table 2).

Proportions of discordantly classified cases (original data)

The proportion of discordantly classified cases (ASP > 19.5% vs. ASP ≤ 19.5%) with TOF4/8 compared to the reference algorithm at 7 versus 7 mm was 2% (95% CI 0–6.9%) and significantly lower than at 5 versus 7 mm or 9 versus 7 mm (38% and 16%, each p < 0.05; Table 3) but similar to 5 versus 5 mm and 9 versus 9 mm (6% and 2%, each p > 0.5).

Table 3 Discordant cases relative to the reference algorithm (ASP)

Conversely, PSF + TOF2/17 showed significantly lower proportions at 7 versus 7 mm (4%; 95% CI 0–10.4%) compared to 5 versus 5 mm (32%, p = 0.001), while proportions were similar to 5 versus 7 mm, 9 versus 7 mm and 9 versus 9 mm (12%, 12% and 6%, each p > 0.1).

Q.Clear resulted in significantly lower proportions of discordant cases at 7 versus 7 mm (10%; 95% CI 0.7–19.3%) than at 9 versus 7 mm and 5 versus 5 mm (26% and 38%, each p < 0.01), while proportions were similar to 5 versus 7 mm and 9 versus 9 mm (10% and 12%, each p = 1.0).

Proportions at 7 versus 7 mm were comparable between TOF4/8 and PSF + TOF2/17 (2% vs. 4%; p = 1.0), while both algorithms showed slightly fewer discordant cases than Q.Clear, although this difference was not significant (10%; each p > 0.1).

Proportions of discordant cases at 7 versus 7 mm were comparable between ASP, SUVmax and MTV with TOF4/8 (2% vs. 6% vs. 2%; each p > 0.5), PSF + TOF2/17 (4% vs. 0% vs. 4%; each p = 1.0) and Q.Clear (10% vs. 6% vs. 8%; each p = 1.0; Additional file 2: Table S1).

The number of discordantly classified cases tended to decrease when allowing a ± 5% tolerance range around the ASP cutoff value (i.e., low ASP, < 20.48%; high ASP, > 18.53%; Table 3).
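The classification with and without tolerance can be sketched as follows. This reflects one possible reading of the text and is not taken from the authors' analysis: with a tolerance, a value may count as "low" up to the upper bound and as "high" above the lower bound, so a pair is discordant only if no common class is possible. All names are illustrative.

```python
ASP_CUTOFF = 19.5  # cutoff in %, from the text


def is_discordant(asp_a, asp_b, cutoff=ASP_CUTOFF, tolerance=0.0):
    """True if two ASP values cannot be assigned to the same class (high or low).

    tolerance: relative tolerance around the cutoff, e.g. 0.05 for +/- 5%.
    With tolerance 0 this reduces to high (> cutoff) versus low (<= cutoff).
    """
    lower = cutoff * (1.0 - tolerance)
    upper = cutoff * (1.0 + tolerance)
    both_can_be_high = asp_a > lower and asp_b > lower
    both_can_be_low = asp_a <= upper and asp_b <= upper
    return not (both_can_be_high or both_can_be_low)


def discordant_proportion(candidate, reference, **kwargs):
    """Fraction of paired cases classified discordantly."""
    if len(candidate) != len(reference):
        raise ValueError("Datasets must be paired (same length).")
    pairs = list(zip(candidate, reference))
    return sum(is_discordant(a, b, **kwargs) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Invented ASP values (%) for four patients.
    reference = [19.0, 30.0, 10.0, 25.0]
    candidate = [20.0, 28.0, 12.0, 15.0]
    print(discordant_proportion(candidate, reference))
    print(discordant_proportion(candidate, reference, tolerance=0.05))
```

In this invented example, the pair 19.0% versus 20.0% is discordant under the strict cutoff but concordant within the tolerance range, whereas 25% versus 15% remains discordant in both cases.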

Relative differences and discordant cases (retrospectively smoothed data)

Comparing data that were retrospectively smoothed to achieve 7-mm reconstructed spatial resolution with the original 7 mm data, relative differences between TOF4/8 and the reference algorithm TOF4/16 were higher in retrospectively smoothed data for ASP but similar for SUVmax and MTV (details in Table 4). In contrast, relative differences with PSF + TOF2/17 were comparable for ASP and significantly higher in the smoothed data for SUVmax and MTV. With Q.Clear, relative differences for ASP, SUVmax and MTV were each significantly lower in the smoothed data compared to original 7-mm data.

Table 4 Relative differences to the reference algorithm: smoothed data

Proportions of discordantly classified cases at 7 versus 7 mm were comparable between retrospectively smoothed data and original 7 mm data for TOF4/8 (smoothed vs. original, 2% vs. 2%; p = 1.0), for PSF + TOF2/17 (4% vs. 4%; p = 1.0) and Q.Clear (6% vs. 10%; p = 0.5). The rate of discordant cases between retrospectively smoothed data and original 7-mm data for the reference algorithm TOF4/16 itself was 2% (95% CI 0–6.9%).

Relative differences and discordant cases (reduced acquisition time)

Relative differences in ASP, SUVmax and MTV at reconstructed spatial resolution of 7 mm (TOF4/8/6, TOF4/16/6.4, PSF + TOF2/17/7 and Q.Clear1750) and shorter acquisition times are displayed in Additional file 2: Tables S2 to S4. Irrespective of the acquisition time for the candidate algorithms, relative differences were always calculated with regard to the reference algorithm TOF4/16/6.4 at 180 s. Briefly, relative ASP, SUVmax and MTV differences with TOF4/8/6 and TOF4/16/6.4 were significantly higher at any shorter acquisition time (i.e., 120 s, 90 s and 60 s) than at 180 s. Relative differences with PSF + TOF2/17/7 tended to remain similar between 180 and 90 s but increased significantly at 60 s. Q.Clear1750 mostly showed similar ASP, SUVmax and MTV differences between all acquisition times.

Proportions of discordantly classified cases of ASP, SUVmax and MTV with TOF4/8/6, PSF + TOF2/17/7 and Q.Clear1750 did not increase significantly with shorter acquisition time (each compared to 180 s; Additional file 2: Tables S5 to S7). Discordant cases with TOF4/16/6.4 remained similar at 120 s and 90 s but increased with 60 s acquisition time (McNemar’s test not applicable).

Correlation of ASP and MTV

Correlation of ASP and MTV (Fig. 4) for the total patient sample was moderate for TOF4/16/2 (Pearson r = 0.54; p < 0.001) and moderate to high for TOF4/16/6.4 (Pearson r = 0.69; p < 0.001) and TOF4/16/9.5 (Pearson r = 0.71; p < 0.001).

Fig. 4

Correlation plots for ASP and MTV. Correlation plots for ASP and MTV for the three TOF4/16 algorithms. a shows plots for all patients. Correlation was moderate with TOF4/16/2 and moderate to high with TOF4/16/6.4 and TOF4/16/9.5. b Correlation was negligible (r < 0.3) in lesions with MTV ≤ 15 ml for TOF4/16/2, while the threshold was lower for TOF4/16/6.4 (MTV ≤ 2.5 ml) and TOF4/16/9.5 (MTV ≤ 5.0 ml). The generally lower correlation of ASP and MTV in smaller lesions results from the limited spatial resolution. With TOF4/16/2, high noise level contributes to the high MTV threshold for correlation. With TOF4/16/9.5, the poorer reconstructed spatial resolution may contribute to the higher MTV threshold compared to TOF4/16/6.4

The MTV threshold below which the correlation was negligible (i.e., r < 0.3) was highest for TOF4/16/2 (MTV ≤ 15 ml) and lowest for TOF4/16/6.4 (MTV ≤ 2.5 ml), while it was 5.0 ml for TOF4/16/9.5.

Discussion

This study found that ASP differences between reconstruction algorithms were significantly higher than corresponding SUVmax and MTV differences (Table 2). This may be explained by a combined effect of changes in SUVmax (suppression of local maxima and therefore a decreasing absolute threshold and increasing MTV size) and changes in MTV surface (smoothed, smaller MTV surface) on the ASP. Coarseness of the MTV surface is likely to differ with variation in reconstructed spatial resolution, which—in conventional iterative reconstruction algorithms—is mainly determined by the width of the in-plane filter. Therefore, if threshold-based MTV delineation is applied, wider filters can be expected to result in lower ASP. In Bayesian-penalized likelihood reconstruction (e.g., GE’s Q.Clear), post-processing is not applied, and smoother images are generated by increasing the penalization factor β.

However, since ASP is supposed to serve as part of prognostic/predictive models based on a predefined cutoff, even substantial inter-method differences may be clinically irrelevant if classification of individual patients into groups of high versus low ASP remains concordant. Applying a strict cutoff for ASP of > 19.5% [15], discordantly classified cases compared to the reference algorithm accounted for 2% (TOF4/8) or 4% (PSF + TOF2/17) at spatial resolution of approx. 7-mm FWHM. This could be acknowledged as acceptably low for application of ASP in a multicenter study. If a less strict cutoff with ± 5% tolerance (ASP between 18.53% and 20.48%) was applied, no discordant cases at 7-mm FWHM were observed for TOF4/8 and PSF + TOF2/17. This underlines that inter-method ASP differences at comparable spatial resolution are clinically relevant only if ASP is close to the predefined cutoff. Furthermore, this range of tolerance is well covered by the range of possible ASP cutoffs (17% to 39%) within which ASP remained significantly prognostic for PFS in previously reported patients with UICC stage II NSCLC [15].

Relative differences and discordant proportions tended to be higher with Q.Clear. Notably, Q.Clear showed systematically lower image noise at any level of spatial resolution (Table 1 and Fig. 2). In contrast to conventional algorithms, relative ASP differences with Q.Clear compared to the reference algorithm were higher at 7 versus 7 mm than at 5 versus 7 mm (Table 2) or at 7 versus 9 mm (Additional file 2: Table S8). Simultaneously, noise levels at 5 versus 7 mm and 7 versus 9 mm were also more comparable to the reference algorithm than at 7 versus 7 mm. However, the same observation was not true for SUVmax and MTV or with the conventional algorithms. Consequently, similar reconstructed spatial resolution rather than the noise level should guide the choice of reconstruction algorithms for harmonization for multicenter purposes. Furthermore, Q.Clear, or Bayesian-penalized likelihood reconstruction in general, may not be optimal to achieve minimal ASP deviations if the reference is a conventional algorithm.

With the PET scanner used in the present study, variation of image noise between algorithms was especially prominent at spatial resolution of 5-mm FWHM (Table 1, Fig. 1). This partly explains high inter-method differences, which exceeded 100% for TOF4/8 and TOF4/16 (Table 2), and frequent discordant cases even if pairs of algorithms with 5 versus 5 mm FWHM were compared. In addition to higher noise, Gibbs artifacts (edge elevations) caused by PSF + TOF and Q.Clear reconstruction increase with narrower in-plane filters or lower β [40]. Consequently, SUVmax differences will be more prominent than at 7 mm or 9 mm FWHM. In contrast, in substantially smoothed data with 9-mm FWHM, PET parameters that are reflective of heterogeneity or irregularity of tracer accumulation, such as ASP, may lose discriminatory power to detect “real” and clinically relevant differences between tumors/patients. Therefore, under the conditions of the current analysis, 7-mm FWHM could be a feasible and reasonable target for harmonization in a multicenter approach. This is underlined by the observation that the MTV threshold for correlation between ASP and MTV was lower for TOF4/16/6.4 than for TOF4/16/9.5 and especially TOF4/16/2.

If reconstructed spatial resolution is better than the target resolution (e.g., 5 mm instead of 7-mm FWHM), retrospective smoothing of data using formula (1) can be performed to achieve the anticipated resolution. This enabled inter-method differences and discordant proportions far closer to those observed with the original 7-mm data, irrespective of TOF, PSF + TOF or Q.Clear. Consequently, in a multicenter analysis, retrospective smoothing of data with better spatial resolution would be a valid option to ensure comparability. It is important to note that here the effective reconstructed spatial resolution is relevant [28], which can differ notably from the resolution determined via point sources.

A similar approach by the EANM Research Ltd. (EARL) harmonization project was reported by Kaalep et al. who analyzed SUV and MTV in FDG-PET data of NSCLC and lymphoma patients. Only after applying an additional Gaussian post-reconstruction filter of 6- to 7-mm FWHM to PET data reconstructed with PSF + TOF (compliant with the current EARL 2 standard) could SUV and MTV differences be reduced from approx. 30% to < 10% compared to reconstruction compliant with the former EARL 1 standard [41]. In a different approach to harmonization, Tsutsui et al. examined OSEM + TOF data of a NEMA IEC phantom obtained with a Siemens Biograph mCT and showed that errors compared to a simulated reference phantom were lowest with an in-plane filter of approx. 7- to 8-mm FWHM [42]. In a different study, the group achieved harmonization between 12 different PET scanners using contrast recovery (CR) of NEMA IEC phantom spheres by applying a scanner-specific Gaussian filter of up to 8-mm FWHM [43]. The current results of low SUVmax differences < 5% and MTV differences ≤ 6% at 7 versus 7 mm FWHM imply that both CR and reconstructed spatial resolution may be suitable surrogates for harmonization.

Shorter acquisition times of 120 s, 90 s or 60 s increased inter-method differences compared to 180 s with TOF4/8/6 and TOF4/16/6.4, while the increase was insignificant or less prominent with PSF + TOF2/17/7 and Q.Clear1750. More importantly, proportions of discordantly classified cases by ASP, SUVmax or MTV remained similar or did not increase significantly—especially between 180 and 90 s. Therefore, equal acquisition times between PET systems/centers may be of secondary importance to achieve comparability in the investigated parameters, and differences as high as 180 s versus 90 s might be tolerable.

Voxel sizes may also vary between PET systems in a multicenter study. However, due to technical restrictions, voxel size could not be freely varied during image reconstruction in this study. Therefore, the influence on ASP, SUVmax and MTV and the correcting effect of retrospective reslicing to the original voxel size could not be assessed. A further limitation of the current analysis is that the variation in reconstruction algorithms and acquisition time may not fully reflect differences between PET scanners beyond these factors. This would require comparative examinations with different scanners in each patient under identical conditions [20, 44]. For methodological consistency with the previous studies [13,14,15], the same threshold-based algorithm [29] was used to delineate all lesions. Consequently, the presented results are not necessarily valid when lesions are delineated differently. Furthermore, although the current study demonstrated that the reconstructed spatial resolution can be used as a surrogate for scanner harmonization and showed lowest inter-method ASP differences and the lowest MTV threshold for correlation between ASP and MTV for 7.0-mm FWHM, this is not sufficient for a general recommendation of this specific spatial resolution for future studies regarding the ASP. This decision should also consider the performance of all PET scanners used in a specific study (best achievable reconstructed spatial resolution) and, if available, comparative clinical results on the value of ASP at different reconstructed spatial resolution.

Conclusions

Differences in ASP, SUVmax and MTV resulting from TOF4/8, PSF + TOF2/17 or Q.Clear compared to the reference algorithm TOF4/16 were mainly determined by differences in reconstructed spatial resolution. Therefore, harmonization for ASP in multicenter studies should aim at comparable reconstructed spatial resolution between PET systems, which is determined by either in-plane filter width or the penalization factor β. With the PET scanner used in the present study, a resolution of 7-mm FWHM ensured that discordantly classified cases of high versus low ASP were at an acceptable proportion for TOF and PSF + TOF of < 5% (Q.Clear: 10%). Retrospectively smoothing data with better spatial resolution (i.e., lower FWHM) to the desired FWHM resulted in comparable results. These results require confirmation in a multicenter study.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

95% CI: 95% Confidence interval
ASP: Asphericity
CR: Contrast recovery
EANM: European Association of Nuclear Medicine
EARL: EANM Research Ltd.
FDG: Fluorodeoxyglucose
FWHM: Full width at half maximum
IBSI: Image Biomarker Standardization Initiative
IEC: International Electrotechnical Commission
it: Iterations
IQR: Interquartile range
MTV: Metabolic tumor volume
NEMA: National Electrical Manufacturers Association
NSCLC: Non-small cell lung cancer
OS: Overall survival
OSEM: Ordered subset expectation maximization
PET/CT: Positron emission tomography/computed tomography
PFS: Progression-free survival
PSF: Point spread function
SiPM: Silicon photomultipliers
ss: Subsets
SUV: Standardized uptake value
TOF: Time of flight
VOI: Volume of interest

References

  1. Postmus PE, Kerr KM, Oudkerk M, Senan S, Waller DA, Vansteenkiste J, et al. Early and locally advanced non-small-cell lung cancer (NSCLC): ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol. 2017;28(suppl_4):iv1–21.

  2. Arriagada R, Bergman B, Dunant A, Le Chevalier T, Pignon JP, Vansteenkiste J, et al. Cisplatin-based adjuvant chemotherapy in patients with completely resected non-small-cell lung cancer. N Engl J Med. 2004;350(4):351–60.

  3. Douillard JY, Rosell R, De Lena M, Carpagnano F, Ramlau R, Gonzales-Larriba JL, et al. Adjuvant vinorelbine plus cisplatin versus observation in patients with completely resected stage IB-IIIA non-small-cell lung cancer (Adjuvant Navelbine International Trialist Association [ANITA]): a randomised controlled trial. Lancet Oncol. 2006;7(9):719–27.

  4. Artal Cortes A, Calera Urquizu L, Hernando CJ. Adjuvant chemotherapy in non-small cell lung cancer: state-of-the-art. Transl Lung Cancer Res. 2015;4(2):191–7.

  5. Sharpnack MF, Ranbaduge N, Srivastava A, Cerciello F, Codreanu SG, Liebler DC, et al. Proteogenomic analysis of surgically resected lung adenocarcinoma. J Thorac Oncol. 2018;13(10):1519–29.

  6. Wang L, Dong T, Xin B, Xu C, Guo M, Zhang H, et al. Integrative nomogram of CT imaging, clinical, and hematological features for survival prediction of patients with locally advanced non-small cell lung cancer. Eur Radiol. 2019;29(6):2958–67.

  7. Desseroit MC, Visvikis D, Tixier F, Majdoub M, Perdrisot R, Guillevin R, et al. Development of a nomogram combining clinical staging with (18)F-FDG PET/CT image features in non-small-cell lung cancer stage I-III. Eur J Nucl Med Mol Imaging. 2016;43(8):1477–85.

  8. Liu J, Dong M, Sun X, Li W, Xing L, Yu J. Prognostic value of 18F-FDG PET/CT in surgical non-small cell lung cancer: a meta-analysis. PLoS ONE. 2016;11(1):e0146195.

  9. Paesmans M, Garcia C, Wong CY, Patz EF Jr, Komaki R, Eschmann S, et al. Primary tumour standardised uptake value is prognostic in nonsmall cell lung cancer: a multivariate pooled analysis of individual data. Eur Respir J. 2015;46(6):1751–61.

  10. Vanhove K, Mesotten L, Heylen M, Derwael R, Louis E, Adriaensens P, et al. Prognostic value of total lesion glycolysis and metabolic active tumor volume in non-small cell lung cancer. Cancer Treat Res Commun. 2018;15:7–12.

  11. Park S, Ha S, Lee SH, Paeng JC, Keam B, Kim TM, et al. Intratumoral heterogeneity characterized by pretreatment PET in non-small cell lung cancer patients predicts progression-free survival on EGFR tyrosine kinase inhibitor. PLoS ONE. 2018;13(1):e0189766.

  12. Arshad MA, Thornton A, Lu H, Tam H, Wallitt K, Rodgers N, et al. Discovery of pre-therapy 2-deoxy-2-(18)F-fluoro-D-glucose positron emission tomography-based radiomics classifiers of survival outcome in non-small-cell lung cancer patients. Eur J Nucl Med Mol Imaging. 2019;46(2):455–66.

  13. Apostolova I, Ego K, Steffen IG, Buchert R, Wertzel H, Achenbach HJ, et al. The asphericity of the metabolic tumour volume in NSCLC: correlation with histopathology and molecular markers. Eur J Nucl Med Mol Imaging. 2016;43(13):2360–73.

  14. Apostolova I, Rogasch J, Buchert R, Wertzel H, Achenbach HJ, Schreiber J, et al. Quantitative assessment of the asphericity of pretherapeutic FDG uptake as an independent predictor of outcome in NSCLC. BMC Cancer. 2014;14:896.

  15. Rogasch JMM, Furth C, Chibolela C, Hofheinz F, Ochsenreither S, Ruckert JC, et al. Validation of independent prognostic value of asphericity of (18)F-fluorodeoxyglucose uptake in non-small-cell lung cancer patients undergoing treatment with curative intent. Clin Lung Cancer. 2019;21:264–72.

  16. Sharma A, Mohan A, Bhalla AS, Sharma MC, Vishnubhatla S, Das CJ, et al. Role of various metabolic parameters derived from baseline 18F-FDG PET/CT as prognostic markers in non-small cell lung cancer patients undergoing platinum-based chemotherapy. Clin Nucl Med. 2018;43(1):e8–17.

  17. Ma W, Wang M, Li X, Huang H, Zhu Y, Song X, et al. Quantitative (18)F-FDG PET analysis in survival rate prediction of patients with non-small cell lung cancer. Oncol Lett. 2018;16(4):4129–36.

  18. Houdu B, Lasnon C, Licaj I, Thomas G, Do P, Guizard AV, et al. Why harmonization is needed when using FDG PET/CT as a prognosticator: demonstration with EARL-compliant SUV as an independent prognostic factor in lung cancer. Eur J Nucl Med Mol Imaging. 2019;46(2):421–8.

  19. Lasnon C, Enilorac B, Popotte H, Aide N. Impact of the EARL harmonization program on automatic delineation of metabolic active tumour volumes (MATVs). EJNMMI Res. 2017;7(1):30.

  20. Zhuang M, Garcia DV, Kramer GM, Frings V, Smit EF, Dierckx R, et al. Variability and repeatability of quantitative uptake metrics in (18)F-FDG PET/CT of non-small cell lung cancer: impact of segmentation method, uptake interval, and reconstruction protocol. J Nucl Med. 2019;60(5):600–7.

  21. Akamatsu G, Mitsumoto K, Taniguchi T, Tsutsui Y, Baba S, Sasaki M. Influences of point-spread function and time-of-flight reconstructions on standardized uptake value of lymph node metastases in FDG-PET. Eur J Radiol. 2014;83(1):226–30.

  22. Armstrong IS, Kelly MD, Williams HA, Matthews JC. Impact of point spread function modelling and time of flight on FDG uptake measurements in lung lesions using alternative filtering strategies. EJNMMI Phys. 2014;1(1):99.

  23. Fleckenstein J, Hellwig D, Kremp S, Grgic A, Groschel A, Kirsch CM, et al. F-18-FDG-PET confined radiotherapy of locally advanced NSCLC with concomitant chemotherapy: results of the PET-PLAN pilot trial. Int J Radiat Oncol Biol Phys. 2011;81(4):e283–9.

  24. Dewalle-Vignion AS, Yeni N, Petyt G, Verscheure L, Huglo D, Beron A, et al. Evaluation of PET volume segmentation methods: comparisons with expert manual delineations. Nucl Med Commun. 2012;33(1):34–42.

  25. Nestle U, Kremp S, Schaefer-Schuler A, Sebastian-Welsch C, Hellwig D, Rube C, et al. Comparison of different methods for delineation of 18F-FDG PET-positive tissue for target volume definition in radiotherapy of patients with non-Small cell lung cancer. J Nucl Med. 2005;46(8):1342–8.

  26. Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk V, Apte A, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology. 2020;295(2):328–38.

  27. Vandendriessche D, Uribe J, Bertin H, De Geeter F. Performance characteristics of silicon photomultiplier based 15-cm AFOV TOF PET/CT. EJNMMI Phys. 2019;6(1):8.

  28. Hofheinz F, Dittrich S, Potzsch C, Hoff J. Effects of cold sphere walls in PET phantom measurements on the volume reproducing threshold. Phys Med Biol. 2010;55(4):1099–113.

  29. Hofheinz F, Langner J, Petr J, Beuthien-Baumann B, Steinbach J, Kotzerke J, et al. An automatic method for accurate volume delineation of heterogeneous tumors in PET. Med Phys. 2013;40(8):082503.

  30. Apostolova I, Steffen IG, Wedel F, Lougovski A, Marnitz S, Derlin T, et al. Asphericity of pretherapeutic tumour FDG uptake provides independent prognostic value in head-and-neck cancer. Eur Radiol. 2014;24(9):2077–87.

  31. Wetz C, Apostolova I, Steffen IG, Hofheinz F, Furth C, Kupitz D, et al. Predictive value of asphericity in pretherapeutic [(111)In]DTPA-octreotide SPECT/CT for response to peptide receptor radionuclide therapy with [(177)Lu]DOTATATE. Mol Imag Biol. 2017;19(3):437–45.

  32. Wetz C, Genseke P, Apostolova I, Furth C, Ghazzawi S, Rogasch JMM, et al. The association of intra-therapeutic heterogeneity of somatostatin receptor expression with morphological treatment response in patients undergoing PRRT with [177Lu]-DOTATATE. PLoS ONE. 2019;14(5):e0216781.

  33. Rogasch JMM, Hundsdoerfer P, Hofheinz F, Wedel F, Schatka I, Amthauer H, et al. Pretherapeutic FDG-PET total metabolic tumor volume predicts response to induction therapy in pediatric Hodgkin’s lymphoma. BMC Cancer. 2018;18(1):521.

  34. Meißner S, Janssen JC, Prasad V, Brenner W, Diederichs G, Hamm B, et al. Potential of asphericity as a novel diagnostic parameter in the evaluation of patients with (68)Ga-PSMA-HBED-CC PET-positive prostate cancer lesions. EJNMMI Res. 2017;7(1):85.

  35. Hofheinz F, Lougovski A, Zöphel K, Hentschel M, Steffen IG, Apostolova I, et al. Increased evidence for the prognostic value of primary tumor asphericity in pretherapeutic FDG PET for risk stratification in patients with head and neck cancer. Eur J Nucl Med Mol Imaging. 2015;42(3):429–37.

  36. Rogasch JMM, Hundsdoerfer P, Furth C, Wedel F, Hofheinz F, Krüger PC, et al. Individualized risk assessment in neuroblastoma: does the tumoral metabolic activity on (123)I-MIBG SPECT predict the outcome? Eur J Nucl Med Mol Imaging. 2017;44(13):2203–12.

  37. Zschaeck S, Li Y, Lin Q, Beck M, Amthauer H, Bauersachs L, et al. Prognostic value of baseline [18F]-fluorodeoxyglucose positron emission tomography parameters MTV, TLG and asphericity in an international multicenter cohort of nasopharyngeal carcinoma patients. PLoS ONE. 2020;15(7):e0236841.

  38. Mullikin JC, Verbeek PW. Surface area estimation of digitized planes. Bioimaging. 1993;1(1):6–16.

  39. Mukaka MM. Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012;24(3):69–71.

  40. Rogasch JM, Suleiman S, Hofheinz F, Bluemel S, Lukas M, Amthauer H, et al. Reconstructed spatial resolution and contrast recovery with Bayesian penalized likelihood reconstruction (Q.Clear) for FDG-PET compared to time-of-flight (TOF) with point spread function (PSF). EJNMMI Phys. 2020;7(1):2.

  41. Kaalep A, Burggraaff CN, Pieplenbosch S, Verwer EE, Sera T, Zijlstra J, et al. Quantitative implications of the updated EARL 2019 PET-CT performance standards. EJNMMI Phys. 2019;6(1):28.

  42. Tsutsui Y, Awamoto S, Himuro K, Umezu Y, Baba S, Sasaki M. Characteristics of smoothing filters to achieve the guideline recommended positron emission tomography image without harmonization. Asia Ocean J Nucl Med Biol. 2018;6(1):15–23.

  43. Tsutsui Y, Daisaki H, Akamatsu G, Umeda T, Ogawa M, Kajiwara H, et al. Multicentre analysis of PET SUV using vendor-neutral software: the Japanese Harmonization Technology (J-Hart) study. EJNMMI Res. 2018;8(1):83.

  44. Kramer GM, Frings V, Hoetjes N, Hoekstra OS, Smit EF, de Langen AJ, et al. Repeatability of quantitative whole-body 18F-FDG PET/CT uptake measures as function of uptake interval and lesion selection in non-small cell lung cancer patients. J Nucl Med. 2016;57(9):1343–9.

Acknowledgements

None.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Contributions

JMMR and FH participated in data reconstruction, analysis and interpretation as well as preparation of the manuscript. SB and PR contributed to obtaining data. HA and CF participated in data interpretation and review of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Julian M. M. Rogasch.

Ethics declarations

Ethics approval and consent to participate

All procedures were in accordance with the Charité ethics commission (vote, EA4/163/18), and informed consent was obtained from all individual participants included in the study.

Consent for publication

Written informed consent of the patient presented in Fig. 2 for publication was obtained.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1. IBSI Checklist version 1.0 (October 2019; see reference below).

Additional file 2: Table S1. Discordant cases relative to the reference algorithm (SUVmax and MTV). Table S2. Relative differences to the reference algorithm: Acquisition times (ASP). Table S3. Relative differences to the reference algorithm: Acquisition times (SUVmax). Table S4. Relative differences to the reference algorithm: Acquisition times (MTV). Table S5. Discordant cases relative to the reference algorithm: Acquisition times (ASP). Table S6. Discordant cases relative to the reference algorithm: Acquisition times (SUVmax). Table S7. Discordant cases relative to the reference algorithm: Acquisition times (MTV). Table S8. Relative differences and discordant cases relative to the reference algorithm (7 vs. 9 mm FWHM).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Rogasch, J.M.M., Furth, C., Bluemel, S. et al. Asphericity of tumor FDG uptake in non-small cell lung cancer: reproducibility and implications for harmonization in multicenter studies. EJNMMI Res 10, 134 (2020). https://doi.org/10.1186/s13550-020-00725-y

  • DOI: https://doi.org/10.1186/s13550-020-00725-y

Keywords