Asphericity of tumor FDG uptake in non-small cell lung cancer: reproducibility and implications for harmonization in multicenter studies
EJNMMI Research volume 10, Article number: 134 (2020)
Asphericity (ASP) of the primary tumor’s metabolic tumor volume (MTV) in FDG-PET/CT is independently predictive for survival in patients with non-small cell lung cancer (NSCLC). However, comparability between PET systems may be limited. Therefore, reproducibility of ASP was evaluated at varying image reconstruction and acquisition times to assess feasibility of ASP assessment in multicenter studies.
This is a retrospective study of 50 patients with NSCLC (female 20; median age 69 years) undergoing pretherapeutic FDG-PET/CT (median 3.7 MBq/kg; 180 s/bed position). Reconstruction used OSEM with TOF4/16 (iterations 4; subsets 16; in-plane filter 2.0, 6.4 or 9.5 mm), TOF4/8 (4 it; 8 ss; filter 2.0/6.0/9.5 mm), PSF + TOF2/17 (2 it; 17 ss; filter 2.0/7.0/10.0 mm) or Bayesian-penalized likelihood (Q.Clear; beta, 600/1750/4000). Resulting reconstructed spatial resolution (FWHM) was determined from hot sphere inserts of a NEMA IEC phantom. Data with approx. 5-mm FWHM were retrospectively smoothed to achieve 7-mm FWHM. List mode data were rebinned for acquisition times of 120/90/60 s. Threshold-based delineation of primary tumor MTV was followed by evaluation of relative ASP/SUVmax/MTV differences between datasets and resulting proportions of discordantly classified cases.
Reconstructed resolution for narrow/medium/wide in-plane filter (or low/medium/high beta) was approx. 5/7/9 mm FWHM. Comparing different pairs of reconstructed resolution between TOF4/8, PSF + TOF2/17, Q.Clear and the reference algorithm TOF4/16, ASP differences were lowest at FWHM of 7 versus 7 mm. Proportions of discordant cases (ASP > 19.5% vs. ≤ 19.5%) were also lowest at 7 mm (TOF4/8, 2%; PSF + TOF2/17, 4%; Q.Clear, 10%). Smoothing of 5-mm data to 7-mm FWHM significantly reduced discordant cases (TOF4/8, 38% reduced to 2%; PSF + TOF2/17, 12% to 4%; Q.Clear, 10% to 6%), resulting in proportions comparable to original 7-mm data. Shorter acquisition time only increased proportions of discordant cases at < 90 s.
ASP differences were mainly determined by reconstructed spatial resolution, and multicenter studies should aim at comparable FWHM (e.g., 7 mm; determined by in-plane filter width). This reduces discordant cases (high vs. low ASP) to an acceptable proportion for TOF and PSF + TOF of < 5% (Q.Clear: 10%). Data with better resolution (i.e., lower FWHM) could be retrospectively smoothed to the desired FWHM, resulting in a comparable number of discordant cases.
Patients with early-stage or locally advanced non-small cell lung cancer (NSCLC) are potential candidates for curatively intended therapy; however, management decisions are primarily based on the clinical tumor stage as a single factor. Averaged across patients, adjuvant chemotherapy showed only modest survival benefits [2,3,4], and therefore, more effective methods of treatment selection are highly warranted.
Consequently, numerous additional prognostic or predictive factors [5,6,7], among them image-derived parameters [8,9,10,11,12], have been investigated, aiming at more differentiated outcome prediction and management decisions. Among parameters from positron emission tomography/computed tomography with [18F]fluorodeoxyglucose (FDG-PET/CT), asphericity (ASP) is a parameter that reflects shape irregularity of the primary tumor’s metabolic tumor volume (MTV), combining metric and metabolic features of the primary tumor. Three retrospective studies confirmed its independent prognostic value for progression-free (PFS) and overall survival (OS) in patients with NSCLC [13,14,15]. The largest study (311 patients, UICC stage I–III) further showed that ASP, with a cutoff of > 19.5%, could identify patients with UICC stage II treated by surgery and adjuvant chemotherapy with high ASP and reduced PFS (median 11 months vs. not reached) and OS (22 months vs. not reached). ASP was superior for survival prediction compared to the primary tumor’s maximum standardized uptake value (SUVmax) and MTV, two other previously proposed and common PET parameters [8, 9, 16, 17].
Studies on quantitative PET parameters have mostly been monocentric, but the main limitation of any PET parameter is its dependence on numerous technical factors including image reconstruction algorithms. Therefore, results may fail to reproduce in a multicenter approach unless harmonization between centers is ensured [18,19,20]. SUVmax and MTV may vary by > 30% if basic ordered subset expectation maximization (OSEM) reconstruction is combined with time-of-flight (TOF) information and/or scanner-specific compensation for the point spread function (PSF) [19,20,21,22].
Variability of ASP has not been investigated so far, but an impact of different reconstruction methods and resulting levels of image noise can be expected. The definition of ASP includes the MTV and its surface; therefore, a variability of MTV will cause variability of ASP. Since MTV also varies notably depending on the applied delineation algorithm [20, 23,24,25], there are two potential sources of variability of ASP: image generation and lesion delineation.
The goal of the current study was to investigate differences in ASP resulting from variability in image generation (common reconstruction methods and acquisition times). The focus was on assessing whether the resulting variation is acceptable for application in multicenter studies and on defining the range of acceptable variation of the influencing factors. Specifically, the goal was not to investigate the trueness of ASP itself, to identify a ground truth or to define a highly optimized reconstruction protocol for a specific PET scanner. On the contrary, this study investigated whether ASP could still be used in multicenter studies under imperfect clinical conditions with different scanners and a certain variation in acquisition protocols (uptake time, acquisition time). Such variability introduced by image generation should be separated from variations in image post-processing, the software for image feature extraction or variation in lesion delineation. Therefore, data were not post-processed (unless specified), and the same software and delineation method were used as in the preceding studies on ASP in NSCLC [13,14,15]. To facilitate interpretation, SUVmax and MTV were investigated analogously for comparison.
A NEMA IEC body phantom was examined using a GE Discovery MI PET scanner (GE Healthcare, General Electric, Boston, MA, USA) with a 3-ring detector with silicon photomultipliers (SiPM) and a reported sensitivity of 7.3 cps/kBq. Total activity in the field of view was approximately 35 MBq. The absolute activities were measured in a certified dose calibrator (ISOMED 2010, MED Dresden GmbH, Germany), which was also used for regular cross calibration of the PET scanner (every 6 months). Sphere inserts (inner diameter 10, 13, 17, 22, 28, and 37 mm) were filled with 24.4 kBq/ml F18-fluoride, while the background was filled with 3.1 kBq/ml (sphere-to-background ratio, approx. 8:1). Acquisition time was 3 min per bed position (transaxial field of view, 70 cm; matrix size, 256 × 256; voxel size, 2.73 × 2.73 × 2.78 mm³). CT data of the phantom were used for attenuation correction. Scatter correction, random correction and dead time correction were also performed.
PET raw data were reconstructed using OSEM with time of flight (TOF; GE “VUE Point FX”) with 4 iterations and 16 subsets (i.e., TOF4/16). This reconstruction was defined as the reference algorithm for subsequent analyses and used either a 2.0-mm, 6.4-mm or 9.5-mm in-plane Gaussian filter (i.e., TOF4/16/2, TOF4/16/6.4 or TOF4/16/9.5). Further reconstruction was performed with OSEM and TOF with 4 iterations, 8 subsets and either 2.0 mm, 6.0 mm or 9.5 mm in-plane filter (TOF4/8/2, TOF4/8/6 or TOF4/8/9.5).
Additionally, data were reconstructed using OSEM with TOF and point spread function (OSEM + PSF + TOF, hereafter referred to as PSF + TOF; GE “VUE Point FX” with “SharpIR”) with 2 iterations and 17 subsets and either 2.0-mm, 7.0-mm or 10.0-mm in-plane filter (PSF + TOF2/17/2, PSF + TOF2/17/7 or PSF + TOF2/17/10), respectively. TOF and PSF + TOF reconstructions always included a “standard” z-axis filter.
All data were also reconstructed using Bayesian-penalized likelihood reconstruction (GE “Q.Clear”) with a penalization factor β of 600, 1750 or 4000 (Q.Clear600, Q.Clear1750 or Q.Clear4000), respectively.
Reconstructed spatial resolution was assessed as the full width at half maximum (FWHM) of the PSF in the reconstructed phantom images. PSF was modeled by a 3D Gaussian, and FWHM was determined by applying the method described in detail by Hofheinz et al. This method is based on fitting the analytic solution for the radial activity profile of a homogeneous sphere convolved with a 3D Gaussian to the reconstructed data. In this process, the full 3D vicinity of each sphere is evaluated by transforming the data to spherical coordinates relative to the respective sphere's center. A summary of the used reconstructions, resulting spatial resolution and image noise (patient data) is given in Table 1. Representative radial profiles are shown in Fig. 1.
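The fitting idea can be illustrated with a minimal sketch. It assumes the closed-form radial profile of a uniform sphere convolved with an isotropic 3D Gaussian, treats sphere and background activity as known (the phantom fill values), and replaces a proper optimizer with a simple grid search over candidate FWHM values. The function names are hypothetical, and details of the cited method beyond this (including how the 3D data are resampled to radial profiles) are not reproduced here.

```python
import math

# Conversion between FWHM and standard deviation of a Gaussian.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def blurred_sphere_profile(r, radius, fwhm):
    """Unit-activity sphere convolved with a 3D Gaussian, evaluated at distance r > 0."""
    sigma = fwhm * FWHM_TO_SIGMA
    s2 = sigma * math.sqrt(2.0)
    erf_part = 0.5 * (math.erf((radius - r) / s2) + math.erf((radius + r) / s2))
    exp_part = (sigma / (r * math.sqrt(2.0 * math.pi))) * (
        math.exp(-((radius - r) ** 2) / (2.0 * sigma ** 2))
        - math.exp(-((radius + r) ** 2) / (2.0 * sigma ** 2))
    )
    return erf_part - exp_part


def model(r, radius, fwhm, signal, background):
    """Radial activity profile: background plus blurred sphere contrast."""
    return background + (signal - background) * blurred_sphere_profile(r, radius, fwhm)


def fit_fwhm(radii, values, radius, signal, background, candidates):
    """Return the candidate FWHM with the smallest sum of squared residuals."""
    def sse(fwhm):
        return sum(
            (model(r, radius, fwhm, signal, background) - v) ** 2
            for r, v in zip(radii, values)
        )
    return min(candidates, key=sse)


# Synthetic check: 37-mm sphere (radius 18.5 mm) blurred to 7-mm FWHM.
radius_mm = 18.5
signal, background = 24.4, 3.1  # kBq/ml, phantom fill values from the text
radii = [0.5 + 0.5 * i for i in range(80)]       # 0.5 to 40 mm
values = [model(r, radius_mm, 7.0, signal, background) for r in radii]
candidates = [3.0 + 0.1 * i for i in range(91)]  # 3.0 to 12.0 mm
estimated_fwhm = fit_fwhm(radii, values, radius_mm, signal, background, candidates)
print(round(estimated_fwhm, 1))
```

With noise-free synthetic input the grid search recovers the FWHM used to generate the profile; with measured data, a continuous least-squares fit that also estimates signal and background would be the more natural choice.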
To study effects of different acquisition time per bed position, PET list mode data were retrospectively rebinned to reconstruct further datasets representing an acquisition time of 120 s, 90 s or 60 s, respectively. Reconstruction was then performed with the algorithms that resulted in a reconstructed spatial resolution of 7 mm (i.e., TOF4/8/6, TOF4/16/6.4, PSF + TOF2/17/7 and Q.Clear1750).
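As a rough illustration of what a shorter acquisition time means for list mode data, the sketch below keeps only events recorded within the first part of the acquisition. Both the event representation (a plain list of timestamps in seconds) and the use of the first N seconds are assumptions; vendor list mode formats and rebinning tools differ. The noise factor assumes pure counting statistics, which reconstructed images follow only approximately.

```python
import math


def truncate_list_mode(event_times_s, duration_s):
    """Keep events recorded before duration_s (hypothetical event format)."""
    return [t for t in event_times_s if t < duration_s]


def relative_noise_factor(full_duration_s, short_duration_s):
    """Expected noise increase if noise scales with 1/sqrt(counts)."""
    return math.sqrt(full_duration_s / short_duration_s)


# Toy acquisition: one event every 0.5 s over 180 s (360 events).
events = [0.5 * i for i in range(360)]
for duration in (120, 90, 60):
    kept = truncate_list_mode(events, duration)
    print(duration, len(kept), round(relative_noise_factor(180, duration), 2))
```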
Patients and scans
Fifty patients (female 20; median age 69 years; range 46 to 83 years) with histologically proven NSCLC underwent pretherapeutic FDG-PET/CT between July 2018 and February 2019 using the same scanner. Patients were required to fast for at least 6 h prior to tracer administration, and a blood glucose level of ≤ 150 mg/dl was ensured. A median activity of 249 MBq (interquartile range [IQR], 238 to 257 MBq; range 209 to 274 MBq) or 3.7 MBq/kg (IQR 3.1 to 4.2 MBq/kg; range 2.0 to 5.7 MBq/kg) was administered intravenously. Static PET data were acquired after a median uptake time of 65 min (IQR 61 to 70 min; range 55 to 96 min) from the base of skull to the proximal femora in 3D acquisition mode (acquisition time, 180 s per bed position; bed overlap, approx. 25%). Attenuation correction was based on a non-enhanced low-dose CT (automated tube current modulation “Smart mA”; maximum tube current–time product 100 mAs; tube voltage 120 kV; gantry rotation time 0.5 s) or non-enhanced diagnostic CT (maximum tube current–time product, 200 mAs).
PET raw data were reconstructed as described above (patient example in Fig. 2). Furthermore, data with 5-mm FWHM resolution were smoothed with a Gaussian filter (5 mm FWHM). According to

$$\mathrm{FWHM}_{\mathrm{target}} = \sqrt{\mathrm{FWHM}_{\mathrm{original}}^{2} + \mathrm{FWHM}_{\mathrm{filter}}^{2}} \qquad (1)$$

this results in a target spatial resolution of approximately 7 mm. Altogether, 25 image datasets per patient with different spatial resolution and noise (i.e., acquisition time) were generated.
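The arithmetic behind the retrospective smoothing can be sketched as follows, assuming that the original resolution and the filter are both Gaussian so that their FWHM values add in quadrature. The helper names are hypothetical; the voxel size is taken from the phantom acquisition described above.

```python
import math


def combined_fwhm(original_fwhm_mm, filter_fwhm_mm):
    """Resolution after Gaussian smoothing of already Gaussian-blurred data."""
    return math.hypot(original_fwhm_mm, filter_fwhm_mm)


def required_filter_fwhm(original_fwhm_mm, target_fwhm_mm):
    """Filter width needed to degrade original_fwhm_mm to target_fwhm_mm."""
    if target_fwhm_mm < original_fwhm_mm:
        raise ValueError("smoothing cannot improve spatial resolution")
    return math.sqrt(target_fwhm_mm ** 2 - original_fwhm_mm ** 2)


def fwhm_to_sigma_voxels(fwhm_mm, voxel_size_mm):
    """Gaussian sigma in voxel units, as expected by most filter routines."""
    sigma_mm = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return sigma_mm / voxel_size_mm


print(round(combined_fwhm(5.0, 5.0), 2))          # 5-mm data, 5-mm filter
print(round(required_filter_fwhm(5.0, 7.0), 2))   # filter for exactly 7 mm
print(round(fwhm_to_sigma_voxels(5.0, 2.73), 3))  # in-plane sigma in voxels
```

A 5-mm filter applied to 5-mm data gives about 7.07 mm, which is why it serves as a practical approximation of the 7-mm target; hitting 7.0 mm exactly would require a filter of about 4.9 mm.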
Evaluation of the data was performed with a dedicated software (ROVER, version 3.0.34, ABX advanced biochemical compounds GmbH, Radeberg, Germany) by an experienced physician in nuclear medicine. MTV of the primary tumor was delineated in each dataset using the same threshold-based, background-adapted algorithm. Delineation was visually inspected and manually corrected if deemed necessary. Tumoral FDG-avid tissue not related to the primary tumor and delineable from the latter (lymph nodes, metastases) was excluded. If the primary tumor was determined to be multifocal (i.e., separate ipsilateral tumor nodules) or the presence of lymphangitic carcinomatosis was diagnosed by interdisciplinary consensus, all tumor nodules and FDG-avid lymphangitic tissue were included in the MTV. SUVmax and ASP of the MTV were derived. SUV was normalized using the body weight in kg. ASP was defined as in the preceding studies on ASP in NSCLC [13,14,15]:

$$\mathrm{ASP} = 100\% \cdot \left(\sqrt[3]{H} - 1\right), \qquad H = \frac{1}{36\pi}\,\frac{S^{3}}{V^{2}}$$

S and V are the surface area and the volume of the MTV, respectively. S was computed as the sum of all voxel surfaces that form the outer and inner surfaces of the MTV multiplied by the factor 2/3. Note that this corresponds to a previously described approximation of the surface area of discrete 3D objects using six voxel classes.
Please note that this definition of the MTV surface area is distinctly different from the definition by the Image Biomarker Standardization Initiative (IBSI), and compliance of both definitions cannot be assumed. The IBSI estimates the MTV surface area using a mesh-based representation after triangulation of the MTV’s outer surface. Additional file 1 provides the IBSI checklist for an overview of all methodological aspects of image generation and image processing in the present analysis. Distribution of ASP values in all current 50 patients is illustrated in Fig. 3.
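The voxel-based surface estimate and the resulting ASP can be sketched as follows. The sketch assumes the ASP definition used in the preceding ASP studies, ASP = 100% · (∛H − 1) with H = S³/(36πV²), and implements the surface as the summed area of all exposed voxel faces times 2/3. The function names and the set-of-indices mask representation are hypothetical and do not reflect the ROVER implementation.

```python
import math


def asphericity_percent(surface, volume):
    """ASP in percent from surface area S and volume V (consistent units)."""
    h = surface ** 3 / (36.0 * math.pi * volume ** 2)
    return 100.0 * (h ** (1.0 / 3.0) - 1.0)


def voxel_surface_and_volume(voxels, voxel_size_mm):
    """Surface (exposed faces x 2/3) and volume of a set of (i, j, k) voxels."""
    dx, dy, dz = voxel_size_mm
    # Area of a voxel face perpendicular to the x, y and z axis, respectively.
    face_areas = (dy * dz, dx * dz, dx * dy)
    exposed_area = 0.0
    for (i, j, k) in voxels:
        for axis in range(3):
            for step in (-1, 1):
                neighbor = [i, j, k]
                neighbor[axis] += step
                # Faces towards the outside or towards internal holes both count.
                if tuple(neighbor) not in voxels:
                    exposed_area += face_areas[axis]
    surface = (2.0 / 3.0) * exposed_area
    volume = len(voxels) * dx * dy * dz
    return surface, volume


# Analytic shapes: a sphere has ASP = 0, a cube about 24%.
r = 10.0
asp_sphere = asphericity_percent(4.0 * math.pi * r ** 2, 4.0 / 3.0 * math.pi * r ** 3)
asp_cube = asphericity_percent(6.0, 1.0)

# Digitized sphere (radius 20 voxels, 1-mm isotropic voxels).
radius_vox = 20
span = range(-radius_vox, radius_vox + 1)
sphere = {
    (i, j, k)
    for i in span for j in span for k in span
    if i * i + j * j + k * k <= radius_vox ** 2
}
s, v = voxel_surface_and_volume(sphere, (1.0, 1.0, 1.0))
asp_digitized_sphere = asphericity_percent(s, v)
print(round(asp_cube, 2), round(asp_digitized_sphere, 2))
```

The factor 2/3 has a simple geometric reading: the exposed faces of a digitized sphere sum to roughly 6πr² (twice the projected disc area along each of the three axes), and two thirds of that is the true surface area 4πr². The digitized sphere therefore yields an ASP close to zero, as expected.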
In each dataset, a spherical volume of interest (VOI) of approx. 19 ml was placed in the unaffected right liver lobe to derive its SUVmean and SUV standard deviation and calculate image noise (SUV standard deviation/SUVmean).
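A minimal sketch of this noise measure is given below. It assumes the sample standard deviation (the text does not state whether sample or population standard deviation was used); the SUV values in the example are made up for illustration.

```python
import math
import statistics


def sphere_radius_mm(volume_ml):
    """Radius of a sphere with the given volume (1 ml = 1000 mm^3)."""
    volume_mm3 = volume_ml * 1000.0
    return (3.0 * volume_mm3 / (4.0 * math.pi)) ** (1.0 / 3.0)


def image_noise(voi_suv_values):
    """Image noise as SUV standard deviation divided by SUVmean."""
    return statistics.stdev(voi_suv_values) / statistics.fmean(voi_suv_values)


liver_voi_radius = sphere_radius_mm(19.0)        # VOI of approx. 19 ml
example_noise = image_noise([2.0, 2.2, 1.8, 2.1, 1.9])  # illustrative SUVs
print(round(liver_voi_radius, 1), round(example_noise, 3))
```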
Statistical analysis was performed using SPSS 22 (IBM Corporation, Armonk, NY, USA). Descriptive parameters were expressed as median and IQR. Relative differences between any dataset a and the reference dataset b were calculated as follows:
The significance of these differences was assessed with the Wilcoxon signed-rank test for paired data. Proportions (%) of discordantly classified cases (high vs. low ASP/SUVmax/MTV) between algorithms were given with their 95% binomial proportion confidence intervals (95% CI), which included the continuity correction of ± 0.5/n (= ± 0.5/50 = ± 1%). Classification with ASP (> 19.5%) was based on a previously identified cutoff in NSCLC patients, while cutoffs for SUVmax (> 10.5) and MTV (> 9.5 ml) were the respective medians among the current 50 patients. Proportions between different pairs of algorithms were compared with the two-sided McNemar’s test. Correlation between ASP and MTV was examined using the Pearson correlation coefficient r and previously published interpretation criteria. Statistical significance was generally assumed at p < 0.05.
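The classification step and the confidence interval can be sketched as follows. The interval is assumed to be a normal-approximation (Wald) interval with the stated continuity correction of 0.5/n, truncated to the range 0 to 1; this assumption reproduces intervals reported in this article (for example, 1 of 50 cases gives 0 to 6.9%), but the exact procedure used in SPSS is not stated in the text. Function names and the example ASP values are hypothetical.

```python
import math

ASP_CUTOFF = 19.5  # percent, cutoff from the text


def discordant_count(candidate_values, reference_values, cutoff):
    """Number of cases classified differently (high vs. low) by two datasets."""
    return sum(
        (a > cutoff) != (b > cutoff)
        for a, b in zip(candidate_values, reference_values)
    )


def proportion_ci_continuity(k, n, z=1.959964):
    """Wald 95% CI for k/n with continuity correction 0.5/n, clipped to [0, 1]."""
    p = k / n
    half_width = z * math.sqrt(p * (1.0 - p) / n) + 0.5 / n
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Illustrative ASP values (percent) for four cases, not patient data.
candidate = [18.0, 25.0, 19.6, 30.0]
reference = [17.0, 26.0, 19.4, 19.0]
n_discordant = discordant_count(candidate, reference, ASP_CUTOFF)

for k in (1, 2, 5):  # 2%, 4% and 10% of 50 cases
    low, high = proportion_ci_continuity(k, 50)
    print(k, round(100 * low, 1), round(100 * high, 1))
```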
To identify the level of reconstructed spatial resolution that provides minimal relative ASP difference to the reference algorithm (TOF4/16), different combinations of spatial resolution for candidate algorithms (TOF4/8, PSF + TOF2/17, Q.Clear) and the reference algorithm were compared pairwise (Table 2).
Relative ASP differences with TOF4/8 and PSF + TOF2/17 compared to TOF4/16 were significantly lower at 7 versus 7 mm than at 5 versus 7 mm, 9 versus 7 mm and 5 versus 5 mm (each p < 0.001). In contrast, differences with Q.Clear versus TOF4/16 at 7 versus 7 mm (median, 31.3%; IQR, 11.2 to 43.7%) were similar to 9 versus 7 mm (24.7%; 15.4 to 51.4%; p = 0.25). Relative ASP differences at 7 versus 7 mm were similar to 9 versus 9 mm with TOF4/8 (median, 7.6% vs. 9.3%; p = 0.38), PSF + TOF2/17 (12.8% vs. 16.2%; p = 0.25) and Q.Clear (31.3% vs. 29.1%; p = 0.33).
Relative SUVmax and MTV differences at 7 versus 7 mm were significantly lower than corresponding ASP differences (each p < 0.001; Table 2).
Proportions of discordantly classified cases (original data)
The proportion of discordantly classified cases (ASP > 19.5% vs. ASP ≤ 19.5%) with TOF4/8 compared to the reference algorithm at 7 versus 7 mm was 2% (95% CI 0–6.9%) and significantly lower than at 5 versus 7 mm or 9 versus 7 mm (38% and 16%, each p < 0.05; Table 3) but similar to 5 versus 5 mm and 9 versus 9 mm (6% and 2%, each p > 0.5).
Conversely, PSF + TOF2/17 showed significantly lower proportions at 7 versus 7 mm (4%; 95% CI 0–10.4%) compared to 5 versus 5 mm (32%, p = 0.001), while proportions were similar to 5 versus 7 mm, 9 versus 7 mm and 9 versus 9 mm (12%, 12% and 6%, each p > 0.1).
Q.Clear resulted in significantly lower proportions of discordant cases at 7 versus 7 mm (10%; 95% CI 0.7–19.3%) than at 9 versus 7 mm and 5 versus 5 mm (26% and 38%, each p < 0.01), while proportions were similar to 5 versus 7 mm and 9 versus 9 mm (10% and 12%, each p = 1.0).
Proportions at 7 versus 7 mm were comparable between TOF4/8 and PSF + TOF2/17 (2% vs. 4%; p = 1.0), while both algorithms showed slightly fewer discordant cases than Q.Clear (10%; each p > 0.1).
Proportions of discordant cases at 7 versus 7 mm were comparable between ASP, SUVmax and MTV with TOF4/8 (2% vs. 6% vs. 2%; each p > 0.5), PSF + TOF2/17 (4% vs. 0% vs. 4%; each p = 1.0) and Q.Clear (10% vs. 6% vs. 8%; each p = 1.0; Additional file 2: Table S1).
The number of discordantly classified cases tended to decrease when allowing a ± 5% tolerance range around the ASP cutoff value (i.e., low ASP, < 20.48%; high ASP, > 18.53%; Table 3).
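One possible reading of the tolerance rule is sketched below: a pair of measurements counts as concordant if both values are compatible with "low" (below the upper bound of the band) or both are compatible with "high" (above the lower bound). This interpretation and the function names are assumptions; only the cutoff and the ± 5% band come from the text.

```python
def tolerance_band(cutoff, tolerance_fraction):
    """Lower and upper bound of the tolerance range around the cutoff."""
    return cutoff * (1.0 - tolerance_fraction), cutoff * (1.0 + tolerance_fraction)


def discordant_with_tolerance(a, b, cutoff=19.5, tolerance_fraction=0.05):
    """True if two ASP values (percent) cannot be assigned to the same class."""
    lower, upper = tolerance_band(cutoff, tolerance_fraction)
    both_low = a < upper and b < upper
    both_high = a > lower and b > lower
    return not (both_low or both_high)


lower, upper = tolerance_band(19.5, 0.05)
print(round(lower, 3), round(upper, 3))
print(discordant_with_tolerance(19.0, 20.0))  # discordant only under the strict cutoff
print(discordant_with_tolerance(15.0, 25.0))  # discordant under both rules
```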
Relative differences and discordant cases (retrospectively smoothed data)
Comparing data that were retrospectively smoothed to achieve 7-mm reconstructed spatial resolution with the original 7 mm data, relative differences between TOF4/8 and the reference algorithm TOF4/16 were higher in retrospectively smoothed data for ASP but similar for SUVmax and MTV (details in Table 4). In contrast, relative differences with PSF + TOF2/17 were comparable for ASP and significantly higher in the smoothed data for SUVmax and MTV. With Q.Clear, relative differences for ASP, SUVmax and MTV were each significantly lower in the smoothed data compared to original 7-mm data.
Proportions of discordantly classified cases at 7 versus 7 mm were comparable between retrospectively smoothed data and original 7 mm data for TOF4/8 (smoothed vs. original, 2% vs. 2%; p = 1.0), for PSF + TOF2/17 (4% vs. 4%; p = 1.0) and Q.Clear (6% vs. 10%; p = 0.5). The rate of discordant cases between retrospectively smoothed data and original 7-mm data for the reference algorithm TOF4/16 itself was 2% (95% CI 0–6.9%).
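For readers who want to retrace such paired comparisons, a two-sided exact McNemar test can be sketched as below. The per-patient pair counts are not given in the text, so the counts in the example are hypothetical; whether SPSS applied the exact binomial form in every comparison is also an assumption.

```python
import math


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value.

    b: cases discordant only under condition 1
    c: cases discordant only under condition 2
    """
    n = b + c
    if n == 0:
        raise ValueError("no cases differ between conditions; test not applicable")
    k = min(b, c)
    # Binomial tail probability with success probability 0.5.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical counts: two cases change in one direction, none in the other
# (compatible with proportions of 3/50 vs. 5/50).
print(mcnemar_exact_p(0, 2))
# Hypothetical counts: one case changes in each direction (equal proportions).
print(mcnemar_exact_p(1, 1))
```

Only cases that change classification between the two conditions enter the test, which is why equal or nearly equal proportions in 50 patients yield p-values of 0.5 or 1.0.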
Relative differences and discordant cases (reduced acquisition time)
Relative differences in ASP, SUVmax and MTV at reconstructed spatial resolution of 7 mm (TOF4/8/6, TOF4/16/6.4, PSF + TOF2/17/7 and Q.Clear1750) and shorter acquisition times are displayed in Additional file 2: Tables S2 to S4. Irrespective of the acquisition time for the candidate algorithms, relative differences were always calculated with regard to the reference algorithm TOF4/16/6.4 at 180 s. Briefly, relative ASP, SUVmax and MTV differences with TOF4/8/6 and TOF4/16/6.4 were significantly higher at any shorter acquisition time (i.e., 120 s, 90 s and 60 s) than at 180 s. Relative differences with PSF + TOF2/17/7 tended to remain similar between 180 and 90 s but increased significantly at 60 s. Q.Clear1750 mostly showed similar ASP, SUVmax and MTV differences between all acquisition times.
Proportions of discordantly classified cases of ASP, SUVmax and MTV with TOF4/8/6, PSF + TOF2/17/7 and Q.Clear1750 did not increase significantly with shorter acquisition time (each compared to 180 s; Additional file 2: Tables S5 to S7). Discordant cases with TOF4/16/6.4 remained similar at 120 s and 90 s but increased with 60 s acquisition time (McNemar’s test not applicable).
Correlation of ASP and MTV
Correlation of ASP and MTV (Fig. 4) for the total patient sample was moderate for TOF4/16/2 (Pearson r = 0.54; p < 0.001) and moderate to high for TOF4/16/6.4 (Pearson r = 0.69; p < 0.001) and TOF4/16/9.5 (Pearson r = 0.71; p < 0.001).
The MTV threshold below which the correlation was negligible (i.e., r < 0.3) was highest for TOF4/16/2 (MTV ≤ 15 ml) and lowest for TOF4/16/6.4 (MTV ≤ 2.5 ml), while it was 5.0 ml for TOF4/16/9.5.
This study found that ASP differences between reconstruction algorithms were significantly higher than corresponding SUVmax and MTV differences (Table 2). This may be explained by a combined effect of changes in SUVmax (suppression of local maxima and therefore a decreasing absolute threshold and increasing MTV size) and changes in MTV surface (smoothed, smaller MTV surface) on the ASP. Coarseness of the MTV surface is likely to differ with variation in reconstructed spatial resolution, which—in conventional iterative reconstruction algorithms—is mainly determined by the width of the in-plane filter. Therefore, if threshold-based MTV delineation is applied, wider filters can be expected to result in lower ASP. In Bayesian-penalized likelihood reconstruction (e.g., GE’s Q.Clear), post-processing is not applied, and smoother images are generated by increasing the penalization factor β.
However, since ASP is supposed to serve as part of prognostic/predictive models based on a predefined cutoff, even substantial inter-method differences may be clinically irrelevant if classification of individual patients into groups of high versus low ASP remains concordant. Applying a strict cutoff for ASP of > 19.5%, discordantly classified cases compared to the reference algorithm accounted for 2% (TOF4/8) or 4% (PSF + TOF2/17) at spatial resolution of approx. 7-mm FWHM. This could be acknowledged as acceptably low for application of ASP in a multicenter study. If a less strict cutoff with ± 5% tolerance (ASP between 18.53% and 20.48%) was applied, no discordant cases at 7-mm FWHM were observed for TOF4/8 and PSF + TOF2/17. This underlines that inter-method ASP differences at comparable spatial resolution are clinically relevant only if ASP is close to the predefined cutoff. Furthermore, this range of tolerance is well covered by the range of possible ASP cutoffs (17% to 39%) within which ASP remained significantly prognostic for PFS in previously reported patients with UICC stage II NSCLC.
Relative differences and discordant proportions tended to be higher with Q.Clear. Notably, Q.Clear showed systematically lower image noise at any level of spatial resolution (Table 1 and Fig. 2). In contrast to conventional algorithms, relative ASP differences with Q.Clear compared to the reference algorithm were higher at 7 versus 7 mm than at 5 versus 7 mm (Table 2) or at 7 versus 9 mm (Additional file 2: Table S8). Simultaneously, noise levels at 5 versus 7 mm and 7 versus 9 mm were also more comparable to the reference algorithm than at 7 versus 7 mm. However, the same observation was not true for SUVmax and MTV or with the conventional algorithms. Consequently, similar reconstructed spatial resolution rather than the noise level should guide the choice of reconstruction algorithms for harmonization for multicenter purposes. Furthermore, Q.Clear, or Bayesian-penalized likelihood reconstruction in general, may not be optimal to achieve minimal ASP deviations if the reference is a conventional algorithm.
With the PET scanner used in the present study, variation of image noise between algorithms was especially prominent at spatial resolution of 5-mm FWHM (Table 1, Fig. 1). This partly explains high inter-method differences, which exceeded 100% for TOF4/8 and TOF4/16 (Table 2), and frequent discordant cases even if pairs of algorithms with 5 versus 5 mm FWHM were compared. In addition to higher noise, Gibbs artifacts (edge elevations) caused by PSF + TOF and Q.Clear reconstruction increase with narrower in-plane filters or lower β. Consequently, SUVmax differences will be more prominent than at 7 mm or 9 mm FWHM. In contrast, in substantially smoothed data with 9-mm FWHM, PET parameters that are reflective of heterogeneity or irregularity of tracer accumulation, such as ASP, may lose discriminatory power to detect “real” and clinically relevant differences between tumors/patients. Therefore, under the conditions of the current analysis, 7-mm FWHM could be a feasible and reasonable target for harmonization in a multicenter approach. This is underlined by the observation that the MTV threshold for correlation between ASP and MTV was lowest for TOF4/16/6.4 compared to TOF4/16/9.5 and especially TOF4/16/2.
If reconstructed spatial resolution is better than the target resolution (e.g., 5 mm instead of 7-mm FWHM), retrospective smoothing of data using formula (1) can be performed to achieve the anticipated resolution. This enabled inter-method differences and discordant proportions far closer to those observed with the original 7-mm data, irrespective of TOF, PSF + TOF or Q.Clear. Consequently, in a multicenter analysis, retrospective smoothing of data with better spatial resolution would be a valid option to ensure comparability. It is important to note that the effective reconstructed spatial resolution is the relevant quantity here, which can differ notably from the resolution determined via point sources.
A similar approach by the EANM Research Ltd. (EARL) harmonization project was reported by Kaalep et al., who analyzed SUV and MTV in FDG-PET data of NSCLC and lymphoma patients. Only after applying an additional Gaussian post-reconstruction filter of 6- to 7-mm FWHM to PET data reconstructed with PSF + TOF (compliant with the current EARL 2 standard) could SUV and MTV differences be reduced from approx. 30% to < 10% compared to reconstruction compliant with the former EARL 1 standard. In a different approach to harmonization, Tsutsui et al. examined OSEM + TOF data of a NEMA IEC phantom obtained with a Siemens Biograph mCT and showed that errors compared to a simulated reference phantom were lowest with an in-plane filter of approx. 7- to 8-mm FWHM. In a different study, the group achieved harmonization between 12 different PET scanners using contrast recovery (CR) of NEMA IEC phantom spheres by applying a scanner-specific Gaussian filter of up to 8-mm FWHM. The current results of low SUVmax differences < 5% and MTV differences ≤ 6% at 7 versus 7 mm FWHM imply that both CR and reconstructed spatial resolution may be suitable surrogates for harmonization.
Shorter acquisition times of 120 s, 90 s or 60 s increased inter-method differences compared to 180 s with TOF4/8/6 and TOF4/16/6.4, while the increase was insignificant or less prominent with PSF + TOF2/17/7 and Q.Clear1750. More importantly, proportions of discordantly classified cases by ASP, SUVmax or MTV remained similar or did not increase significantly—especially between 180 and 90 s. Therefore, equal acquisition times between PET systems/centers may be of secondary importance to achieve comparability in the investigated parameters, and differences as high as 180 s versus 90 s might be tolerable.
Voxel sizes may also vary between PET systems in a multicenter study. However, due to technical restrictions, voxel size could not be freely varied during image reconstruction in this study. Therefore, the influence on ASP, SUVmax and MTV and the correcting effect of retrospective reslicing to the original voxel size could not be assessed. A further limitation of the current analysis is that the variation in reconstruction algorithms and acquisition time may not fully reflect differences between PET scanners beyond these factors. This would require comparative examinations with different scanners in each patient under identical conditions [20, 44]. For methodological consistency with the previous studies [13,14,15], the same threshold-based algorithm was used to delineate all lesions. Consequently, the presented results are not necessarily valid when lesions are delineated differently. Furthermore, although the current study demonstrated that the reconstructed spatial resolution can be used as a surrogate for scanner harmonization and showed the lowest inter-method ASP differences and the lowest MTV threshold for correlation between ASP and MTV at 7-mm FWHM, this is not sufficient for a general recommendation of this specific spatial resolution for future studies regarding ASP. This decision should also consider the performance of all PET scanners used in a specific study (best achievable reconstructed spatial resolution) and, if available, comparative clinical results on the value of ASP at different reconstructed spatial resolution.
Differences in ASP, SUVmax and MTV resulting from TOF4/8, PSF + TOF2/17 or Q.Clear compared to the reference algorithm TOF4/16 were mainly determined by differences in reconstructed spatial resolution. Therefore, harmonization for ASP in multicenter studies should aim at comparable reconstructed spatial resolution between PET systems, which is determined by either in-plane filter width or the penalization factor β. With the PET scanner used in the present study, a resolution of 7-mm FWHM ensured that discordantly classified cases of high versus low ASP were at an acceptable proportion for TOF and PSF + TOF of < 5% (Q.Clear: 10%). Retrospectively smoothing data with better spatial resolution (i.e., lower FWHM) to the desired FWHM resulted in comparable results. These results require confirmation in a multicenter study.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Abbreviations
- 95% CI: 95% confidence interval
- EANM: European Association of Nuclear Medicine
- EARL: EANM Research Ltd.
- FWHM: Full width at half maximum
- IBSI: Image Biomarker Standardization Initiative
- IEC: International Electrotechnical Commission
- MTV: Metabolic tumor volume
- NEMA: National Electrical Manufacturers Association
- NSCLC: Non-small cell lung cancer
- OSEM: Ordered subset expectation maximization
- PET/CT: Positron emission tomography/computed tomography
- PSF: Point spread function
- SUV: Standardized uptake value
- TOF: Time of flight
- VOI: Volume of interest
1. Postmus PE, Kerr KM, Oudkerk M, Senan S, Waller DA, Vansteenkiste J, et al. Early and locally advanced non-small-cell lung cancer (NSCLC): ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol. 2017;28(suppl_4):iv1–21.
2. Arriagada R, Bergman B, Dunant A, Le Chevalier T, Pignon JP, Vansteenkiste J, et al. Cisplatin-based adjuvant chemotherapy in patients with completely resected non-small-cell lung cancer. N Engl J Med. 2004;350(4):351–60.
3. Douillard JY, Rosell R, De Lena M, Carpagnano F, Ramlau R, Gonzales-Larriba JL, et al. Adjuvant vinorelbine plus cisplatin versus observation in patients with completely resected stage IB-IIIA non-small-cell lung cancer (Adjuvant Navelbine International Trialist Association [ANITA]): a randomised controlled trial. Lancet Oncol. 2006;7(9):719–27.
4. Artal Cortes A, Calera Urquizu L, Hernando CJ. Adjuvant chemotherapy in non-small cell lung cancer: state-of-the-art. Transl Lung Cancer Res. 2015;4(2):191–7.
5. Sharpnack MF, Ranbaduge N, Srivastava A, Cerciello F, Codreanu SG, Liebler DC, et al. Proteogenomic analysis of surgically resected lung adenocarcinoma. J Thorac Oncol. 2018;13(10):1519–29.
6. Wang L, Dong T, Xin B, Xu C, Guo M, Zhang H, et al. Integrative nomogram of CT imaging, clinical, and hematological features for survival prediction of patients with locally advanced non-small cell lung cancer. Eur Radiol. 2019;29(6):2958–67.
7. Desseroit MC, Visvikis D, Tixier F, Majdoub M, Perdrisot R, Guillevin R, et al. Development of a nomogram combining clinical staging with (18)F-FDG PET/CT image features in non-small-cell lung cancer stage I-III. Eur J Nucl Med Mol Imaging. 2016;43(8):1477–85.
8. Liu J, Dong M, Sun X, Li W, Xing L, Yu J. Prognostic value of 18F-FDG PET/CT in surgical non-small cell lung cancer: a meta-analysis. PLoS ONE. 2016;11(1):e0146195.
9. Paesmans M, Garcia C, Wong CY, Patz EF Jr, Komaki R, Eschmann S, et al. Primary tumour standardised uptake value is prognostic in nonsmall cell lung cancer: a multivariate pooled analysis of individual data. Eur Respir J. 2015;46(6):1751–61.
10. Vanhove K, Mesotten L, Heylen M, Derwael R, Louis E, Adriaensens P, et al. Prognostic value of total lesion glycolysis and metabolic active tumor volume in non-small cell lung cancer. Cancer Treat Res Commun. 2018;15:7–12.
11. Park S, Ha S, Lee SH, Paeng JC, Keam B, Kim TM, et al. Intratumoral heterogeneity characterized by pretreatment PET in non-small cell lung cancer patients predicts progression-free survival on EGFR tyrosine kinase inhibitor. PLoS ONE. 2018;13(1):e0189766.
12. Arshad MA, Thornton A, Lu H, Tam H, Wallitt K, Rodgers N, et al. Discovery of pre-therapy 2-deoxy-2-(18)F-fluoro-D-glucose positron emission tomography-based radiomics classifiers of survival outcome in non-small-cell lung cancer patients. Eur J Nucl Med Mol Imaging. 2019;46(2):455–66.
13. Apostolova I, Ego K, Steffen IG, Buchert R, Wertzel H, Achenbach HJ, et al. The asphericity of the metabolic tumour volume in NSCLC: correlation with histopathology and molecular markers. Eur J Nucl Med Mol Imaging. 2016;43(13):2360–73.
14. Apostolova I, Rogasch J, Buchert R, Wertzel H, Achenbach HJ, Schreiber J, et al. Quantitative assessment of the asphericity of pretherapeutic FDG uptake as an independent predictor of outcome in NSCLC. BMC Cancer. 2014;14:896.
15. Rogasch JMM, Furth C, Chibolela C, Hofheinz F, Ochsenreither S, Ruckert JC, et al. Validation of independent prognostic value of asphericity of (18)F-fluorodeoxyglucose uptake in non-small-cell lung cancer patients undergoing treatment with curative intent. Clin Lung Cancer. 2019;21:264–72.
16. Sharma A, Mohan A, Bhalla AS, Sharma MC, Vishnubhatla S, Das CJ, et al. Role of various metabolic parameters derived from baseline 18F-FDG PET/CT as prognostic markers in non-small cell lung cancer patients undergoing platinum-based chemotherapy. Clin Nucl Med. 2018;43(1):e8–17.
17. Ma W, Wang M, Li X, Huang H, Zhu Y, Song X, et al. Quantitative (18)F-FDG PET analysis in survival rate prediction of patients with non-small cell lung cancer. Oncol Lett. 2018;16(4):4129–36.
18. Houdu B, Lasnon C, Licaj I, Thomas G, Do P, Guizard AV, et al. Why harmonization is needed when using FDG PET/CT as a prognosticator: demonstration with EARL-compliant SUV as an independent prognostic factor in lung cancer. Eur J Nucl Med Mol Imaging. 2019;46(2):421–8.
19. Lasnon C, Enilorac B, Popotte H, Aide N. Impact of the EARL harmonization program on automatic delineation of metabolic active tumour volumes (MATVs). EJNMMI Res. 2017;7(1):30.
20. Zhuang M, Garcia DV, Kramer GM, Frings V, Smit EF, Dierckx R, et al. Variability and repeatability of quantitative uptake metrics in (18)F-FDG PET/CT of non-small cell lung cancer: impact of segmentation method, uptake interval, and reconstruction protocol. J Nucl Med. 2019;60(5):600–7.
Akamatsu G, Mitsumoto K, Taniguchi T, Tsutsui Y, Baba S, Sasaki M. Influences of point-spread function and time-of-flight reconstructions on standardized uptake value of lymph node metastases in FDG-PET. Eur J Radiol. 2014;83(1):226–30.
Armstrong IS, Kelly MD, Williams HA, Matthews JC. Impact of point spread function modelling and time of flight on FDG uptake measurements in lung lesions using alternative filtering strategies. EJNMMI Phys. 2014;1(1):99.
Fleckenstein J, Hellwig D, Kremp S, Grgic A, Groschel A, Kirsch CM, et al. F-18-FDG-PET confined radiotherapy of locally advanced NSCLC with concomitant chemotherapy: results of the PET-PLAN pilot trial. Int J Radiat Oncol Biol Phys. 2011;81(4):e283–9.
Dewalle-Vignion AS, Yeni N, Petyt G, Verscheure L, Huglo D, Beron A, et al. Evaluation of PET volume segmentation methods: comparisons with expert manual delineations. Nucl Med Commun. 2012;33(1):34–42.
Nestle U, Kremp S, Schaefer-Schuler A, Sebastian-Welsch C, Hellwig D, Rube C, et al. Comparison of different methods for delineation of 18F-FDG PET-positive tissue for target volume definition in radiotherapy of patients with non-Small cell lung cancer. J Nucl Med. 2005;46(8):1342–8.
Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk V, Apte A, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology. 2020;295(2):328–38.
Vandendriessche D, Uribe J, Bertin H, De Geeter F. Performance characteristics of silicon photomultiplier based 15-cm AFOV TOF PET/CT. EJNMMI Phys. 2019;6(1):8.
Hofheinz F, Dittrich S, Potzsch C, Hoff J. Effects of cold sphere walls in PET phantom measurements on the volume reproducing threshold. Phys Med Biol. 2010;55(4):1099–113.
Hofheinz F, Langner J, Petr J, Beuthien-Baumann B, Steinbach J, Kotzerke J, et al. An automatic method for accurate volume delineation of heterogeneous tumors in PET. Med Phys. 2013;40(8):082503.
Apostolova I, Steffen IG, Wedel F, Lougovski A, Marnitz S, Derlin T, et al. Asphericity of pretherapeutic tumour FDG uptake provides independent prognostic value in head-and-neck cancer. Eur Radiol. 2014;24(9):2077–87.
Wetz C, Apostolova I, Steffen IG, Hofheinz F, Furth C, Kupitz D, et al. Predictive value of asphericity in pretherapeutic [(111)In]DTPA-octreotide SPECT/CT for response to peptide receptor radionuclide therapy with [(177)Lu]DOTATATE. Mol Imag Biol. 2017;19(3):437–45.
Wetz C, Genseke P, Apostolova I, Furth C, Ghazzawi S, Rogasch JMM, et al. The association of intra-therapeutic heterogeneity of somatostatin receptor expression with morphological treatment response in patients undergoing PRRT with [177Lu]-DOTATATE. PLoS ONE. 2019;14(5):e0216781.
Rogasch JMM, Hundsdoerfer P, Hofheinz F, Wedel F, Schatka I, Amthauer H, et al. Pretherapeutic FDG-PET total metabolic tumor volume predicts response to induction therapy in pediatric Hodgkin’s lymphoma. BMC Cancer. 2018;18(1):521.
Meißner S, Janssen JC, Prasad V, Brenner W, Diederichs G, Hamm B, et al. Potential of asphericity as a novel diagnostic parameter in the evaluation of patients with (68)Ga-PSMA-HBED-CC PET-positive prostate cancer lesions. EJNMMI Res. 2017;7(1):85.
Hofheinz F, Lougovski A, Zöphel K, Hentschel M, Steffen IG, Apostolova I, et al. Increased evidence for the prognostic value of primary tumor asphericity in pretherapeutic FDG PET for risk stratification in patients with head and neck cancer. Eur J Nucl Med Mol Imaging. 2015;42(3):429–37.
Rogasch JMM, Hundsdoerfer P, Furth C, Wedel F, Hofheinz F, Krüger PC, et al. Individualized risk assessment in neuroblastoma: does the tumoral metabolic activity on (123)I-MIBG SPECT predict the outcome? Eur J Nucl Med Mol Imaging. 2017;44(13):2203–12.
Zschaeck S, Li Y, Lin Q, Beck M, Amthauer H, Bauersachs L, et al. Prognostic value of baseline [18F]-fluorodeoxyglucose positron emission tomography parameters MTV, TLG and asphericity in an international multicenter cohort of nasopharyngeal carcinoma patients. PLoS ONE. 2020;15(7):e0236841.
Mullikin JC, Verbeek PW. Surface area estimation of digitized planes. Bioimaging. 1993;1(1):6–16.
Mukaka MM. Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012;24(3):69–71.
Rogasch JM, Suleiman S, Hofheinz F, Bluemel S, Lukas M, Amthauer H, et al. Reconstructed spatial resolution and contrast recovery with Bayesian penalized likelihood reconstruction (Q.Clear) for FDG-PET compared to time-of-flight (TOF) with point spread function (PSF). EJNMMI Phys. 2020;7(1):2.
Kaalep A, Burggraaff CN, Pieplenbosch S, Verwer EE, Sera T, Zijlstra J, et al. Quantitative implications of the updated EARL 2019 PET-CT performance standards. EJNMMI Phys. 2019;6(1):28.
Tsutsui Y, Awamoto S, Himuro K, Umezu Y, Baba S, Sasaki M. Characteristics of smoothing filters to achieve the guideline recommended positron emission tomography image without harmonization. Asia Ocean J Nucl Med Biol. 2018;6(1):15–23.
Tsutsui Y, Daisaki H, Akamatsu G, Umeda T, Ogawa M, Kajiwara H, et al. Multicentre analysis of PET SUV using vendor-neutral software: the Japanese Harmonization Technology (J-Hart) study. EJNMMI Res. 2018;8(1):83.
Kramer GM, Frings V, Hoetjes N, Hoekstra OS, Smit EF, de Langen AJ, et al. Repeatability of quantitative whole-body 18F-FDG PET/CT uptake measures as function of uptake interval and lesion selection in non-small cell lung cancer patients. J Nucl Med. 2016;57(9):1343–9.
Open Access funding enabled and organized by Projekt DEAL.
Ethics approval and consent to participate
All procedures were in accordance with the Charité ethics commission (vote, EA4/163/18), and informed consent was obtained from all individual participants included in the study.
Consent for publication
Written informed consent for publication was obtained from the patient presented in Fig. 2.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
IBSI Checklist version 1.0 (October 2019; see reference below).
Discordant cases relative to the reference algorithm (SUVmax and MTV).
Table S2. Relative differences to the reference algorithm: Acquisition times (ASP).
Table S3. Relative differences to the reference algorithm: Acquisition times (SUVmax).
Table S4. Relative differences to the reference algorithm: Acquisition times (MTV).
Table S5. Discordant cases relative to the reference algorithm: Acquisition times (ASP).
Table S6. Discordant cases relative to the reference algorithm: Acquisition times (SUVmax).
Table S7. Discordant cases relative to the reference algorithm: Acquisition times (MTV).
Table S8. Relative differences and discordant cases relative to the reference algorithm (7 vs. 9 mm FWHM).
Cite this article
Rogasch, J.M.M., Furth, C., Bluemel, S. et al. Asphericity of tumor FDG uptake in non-small cell lung cancer: reproducibility and implications for harmonization in multicenter studies. EJNMMI Res 10, 134 (2020). https://doi.org/10.1186/s13550-020-00725-y
- Image reconstruction
- Spatial resolution
- Non-small cell lung cancer