Respiratory motion correction in F-18-FDG PET/CT impacts lymph node assessment in lung cancer patients

Background Elastic motion correction in PET has been shown to improve image quality and quantitative measurements of PET datasets affected by respiratory motion. However, little is known about the impact of respiratory motion correction on clinical image evaluation in oncologic PET. This study evaluated the impact of motion correction on expert readers’ lymph node assessment of lung cancer patients. Methods Forty-three patients undergoing F-18-FDG PET/CT for the staging of suspected lung cancer were included. Three different PET reconstructions were investigated: non-motion-corrected (“static”), belt gating-based motion-corrected (“BG-MC”) and data-driven gating-based motion-corrected (“DDG-MC”). Assessment was conducted independently by two nuclear medicine specialists blinded to the reconstruction method on a six-point scale s ranging from “certainly negative” (1) to “certainly positive” (6). Differences in s between reconstruction methods, accounting for variation caused by readers, were assessed by nonparametric regression analysis of longitudinal data. From s, a dichotomous score for N1, N2, and N3 (“negative,” “positive”) and a subjective certainty score were derived. 
SUV and metabolic tumor volumes (MTV) were compared between reconstruction methods. Results BG-MC resulted in higher scores for N1 compared to static (p = 0.001), whereas DDG-MC resulted in higher scores for N2 compared to static (p = 0.016). Motion correction resulted in the migration of N1 from tumor free to metastatic on the dichotomized score, consensually for both readers, in 3/43 cases and in 2 cases for N2. SUV was significantly higher for motion-corrected PET, while MTV was significantly lower (all p < 0.003). No significant differences in the certainty scores were noted. Conclusions PET motion correction resulted in significantly higher lymph node assessment scores of expert readers. Significant effects on quantitative PET parameters were seen; however, subjective reader certainty was not improved.


Introduction
Lung cancer is one of the most common cancers and the leading cause of cancer-related deaths worldwide [1]. F-18-FDG PET/CT is implemented in the initial staging of lung cancer patients, especially for the assessment of lymph node involvement and exclusion of distant metastases [2,3]. Moreover, its use is recommended for the assessment of suspicious pulmonary nodules [4]. Sensitivity of F-18-FDG PET/CT is high for distinguishing malignant from benign solitary pulmonary nodules; however, it demonstrated low specificity [5]. Additionally, sensitivity of F-18-FDG PET/CT is limited in the evaluation of lymph nodes [6,7]. Thus decisions on management in lung cancer patients should not be based on F-18-FDG PET/CT alone, and improvements in lymph node assessment are warranted [6].
Respiratory motion is a well-known source of image artifacts and erroneous quantification in thoracic and abdominal PET, resulting in decreased apparent tracer uptake quantification, increased MTV, and losses in effective spatial resolution [8][9][10]. To overcome this, a wide range of motion correction algorithms for PET have been introduced and investigated during the last two decades, with the most practical and robust ones now becoming established in clinical scans (albeit at a slow rate). Historically, the proposed methods range from comparatively simple approaches avoiding respiratory motion effects by prolonged scanning of a defined respiratory phase (most often end-expiration) [11] to more advanced solutions comprising gated reconstructions where an additionally acquired signal representing the respiratory phase of a patient during the scan is used to reconstruct only coincidence events emitted during a specified respiratory state [12,13]. An important subset of the latter methods, software or data-driven gating (DDG) is based on analyzing measured PET raw data to calculate breathing signals instead of using additional hardware to record these signals, thus potentially simplifying clinical scans and increasing patient comfort [14][15][16][17]. Finally, fully motion-corrected reconstructions have been recently introduced by taking all measured PET data into account, rather than just a subset determined by a specified gating approach [18,19].
Clinical studies have already demonstrated that gated or motion-corrected PET reconstructions typically result in higher tracer uptake values, smaller lesion volumes and subjectively "sharper" images [10,20,21]. Few studies have investigated the role of PET-derived gating on diagnostic accuracy for the detection and characterization of suspicious solitary pulmonary nodules [22,23]. However, besides these basic, directly image-derived parameters and first clinical applications, not much is known about the impact of motion-corrected PET on staging and clinical decision-making. First results of a multi-tracer study indicate that DDG might result in changes in clinical PET reports and might even change further clinical management in many different types of cancer [24]. The authors strongly encourage dedicated future studies in different disease settings [24].
We therefore opted to investigate the impact of fully motion-corrected PET reconstructions, based on both hardware- and software-derived gating, in F-18-FDG PET/CT staging scans of lung cancer patients. In particular, we were interested in the subjective differences in lymph node assessment of expert readers using non-motion-corrected PET and motion-corrected PET, respectively.

Patient data
In this retrospective analysis datasets of 43 patients who underwent initial F-18-FDG PET/CT for staging of suspected lung cancer at our facility between December 2018 and December 2020 were included. Patients with prior resection of the primary tumor were excluded. The study design was approved by the local ethics committee of the University of Münster (AZ 2019-024-f-S, 2021-172-f-S), and was performed in accordance with the 1964 Helsinki declaration and its later amendments. The need for written informed consent was waived due to the retrospective nature of the study.

PET/CT scans
The patients fasted overnight before the PET/CT scan. They received 3 MBq/kg body mass of F-18-FDG i.v. approximately one hour prior to the scan which was performed on a Biograph mCT (Siemens Healthcare GmbH, Erlangen, Germany) capable of time-of-flight and continuous bed motion (axial PET field-of-view, 21.8 cm; spatial resolution at center, 4 mm full width at half maximum; sinogram sizes, 400 × 168; time-of-flight bins, 13) [25]. Patients were scanned in a supine position with the arms above the head. During the examination, the respiratory gating system AZ-733 V (Anzai Co., Tokyo, Japan) recorded respiratory signals for subsequent gating (belt gating, BG) and motion correction.
Scanning ranges were from the head or neck down to the proximal femur. End-expiratory low-dose CT scans were performed (tube voltage, 120 kV; effective current, 18 mAs; slice thickness, 3.0 mm; duration, 10-20 s) followed by PET in continuous bed motion (free breathing; speed, 1.1 mm/s; duration, 500-900 s).
The applied DDG algorithm is based on a spectral analysis of continuous bed motion PET raw data and is described in detail elsewhere [17,21]. Briefly, it divides the raw data into axial regions of 80 mm length, where measured events are back-projected into the most likely origin voxel according to their time-of-flight bin. The predominant respiratory frequency is then identified by the maximum in the power spectrum of the standard deviation along the anterior-posterior axis over time. Voxels that demonstrate fluctuations close to this frequency are then used to define a mask of regions affected by respiration. Respiratory signals for each axial region are then calculated by phase- and mask-weighted summation of voxel time-activity curves and finally concatenated and normalized to give an overall DDG signal for the whole PET scan.
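The frequency-identification step can be illustrated with a minimal sketch. It is not the implementation described in [17,21]: the function name, the naive discrete Fourier transform, and the assumed respiratory band of 0.1 to 0.5 Hz are all hypothetical choices made for illustration. The input stands in for the time course of the standard deviation along the anterior-posterior axis.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate_hz, band_hz=(0.1, 0.5)):
    """Return the frequency (Hz) with maximal spectral power inside band_hz.

    signal: evenly sampled values (a stand-in for the standard deviation
    along the anterior-posterior axis over time).
    band_hz: assumed plausible respiratory band, not taken from the source.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant offset

    best_freq, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate_hz / n
        if not (band_hz[0] <= freq <= band_hz[1]):
            continue
        # Naive DFT coefficient for bin k; adequate for short signals.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq
```

In a full pipeline, voxels whose own spectra peak close to the returned frequency would then form the respiration mask; that step is omitted here.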
Signals from both sources were used for elastic motion-corrected PET reconstructions by first reconstructing the "optimal gate" comprising coincidence data from the narrowest signal amplitude interval covering 35% of the total data, giving a good compromise between motion resolution and data statistics, and then using mass-preserving optical flow techniques to determine a motion vector field between the gated and a static reconstruction. This vector field was then finally used in an effective deblurring step within a motion-corrected image reconstruction [18,19], resulting in BG-MC and DDG-MC datasets.
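As a rough illustration of the "optimal gate" selection, the following sketch finds the narrowest amplitude interval that contains 35% of the samples of a respiratory signal. It assumes evenly sampled amplitudes, so that a fraction of samples corresponds to the same fraction of acquired data; the function name and this simplification are assumptions and do not reflect the vendor implementation.

```python
import math


def optimal_gate(amplitudes, fraction=0.35):
    """Return (low, high) bounds of the narrowest amplitude interval
    containing at least `fraction` of the samples.

    Assumes evenly sampled amplitudes, so each sample represents the
    same share of the acquired data.
    """
    ordered = sorted(amplitudes)
    n = len(ordered)
    window = max(1, math.ceil(fraction * n))  # samples the gate must hold

    # Slide a window of fixed sample count over the sorted amplitudes
    # and keep the position with the smallest amplitude spread.
    best_start = min(
        range(n - window + 1),
        key=lambda i: ordered[i + window - 1] - ordered[i],
    )
    return ordered[best_start], ordered[best_start + window - 1]
```

Because breathing signals tend to dwell near end-expiration, the selected interval usually falls in that amplitude range, although the sketch itself makes no such assumption.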
All reconstructions were based on an ordinary Poisson ordered subset expectation maximization algorithm (2 iterations, 21 subsets, 2 mm full width at half maximum Gaussian post-reconstruction filter, 400 × 400 image matrix, 2.04 × 2.04 × 2.03 mm³ voxel volume; e7 toolbox, Siemens Healthcare GmbH, Erlangen, Germany) with point-spread-function and time-of-flight data, normalization, and random correction; attenuation and scatter correction were based on the measured CT data. Overall, three PET and one CT image dataset per patient were thus subsequently analyzed.
Fig. 1 Reconstruction workflow used for the three PET images ("static," "BG-MC" and "DDG-MC") performed within this study

Image Assessment
All PET and CT images were anonymized and sent to a syngo.via workstation (Oncology tool, Siemens Healthcare GmbH, Erlangen, Germany) where they were presented independently to two nuclear medicine specialists (BN, WR) with more than five years of experience in PET/CT imaging. For each reading, one of the three PET reconstructions, the CT image and a fused PET-CT image were made available to the reader. The three different PET reconstructions (static, BG-MC, DDG-MC) for any given scan were presented in random order and in different sessions at an interval of at least 2 weeks to reduce bias. The readers were blinded to the actual type of reconstruction.
The lymph node (N) and distant metastasis (M) status was assessed, with the N rating further divided into the three different lymph node regions N1 (ipsilateral peribronchial and/or hilar lymph nodes), N2 (ipsilateral mediastinal and/or subcarinal lymph nodes), and N3 (contralateral mediastinal and/or hilar, as well as any supraclavicular lymph nodes), following the TNM staging system for lung cancer of the American Joint Committee on Cancer (AJCC) and the Union Internationale Contre le Cancer (UICC) [26]. For every reconstruction, these three N regions and the M status were independently rated on an ordinal scale s comprising 1 ("certainly negative"), 2 ("probably negative"), 3 ("doubtfully negative"), 4 ("doubtfully positive"), 5 ("probably positive"), and 6 ("certainly positive"). Derived from this score, a simplified dichotomous score d was defined as 0 for negative findings (scale values of 1, 2, 3) and 1 for positive findings (scale values of 4, 5, 6).
Finally, to quantify the subjective certainty of the readers, an ordinal certainty score c was calculated as c = |3.5 − s| + 0.5, with 1 denoting least certainty and 3 denoting highest certainty.
Additionally, the primary tumor and the most prominent lymph nodes visible in each region N1, N2 and N3 were characterized by their standardized uptake values SUVmax and SUVmean and the metabolic tumor volume (MTV) in each reconstruction.
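The two derived scores follow directly from the definitions above and can be sketched as below. The function names are hypothetical; the mappings themselves are those stated in the text (d = 0 for s of 1 to 3, d = 1 for s of 4 to 6, and c = |3.5 − s| + 0.5).

```python
def dichotomous_score(s: int) -> int:
    """Map the six-point score s to d: 0 for s in {1, 2, 3}, 1 for s in {4, 5, 6}."""
    if s not in range(1, 7):
        raise ValueError("s must be an integer from 1 to 6")
    return 1 if s >= 4 else 0


def certainty_score(s: int) -> int:
    """Certainty c = |3.5 - s| + 0.5: 1 (least certain) to 3 (most certain)."""
    if s not in range(1, 7):
        raise ValueError("s must be an integer from 1 to 6")
    return int(abs(3.5 - s) + 0.5)


# Print s, d and c for every possible rating.
for s in range(1, 7):
    print(s, dichotomous_score(s), certainty_score(s))
```

The certainty score is symmetric around the midpoint of the scale, so "certainly negative" (1) and "certainly positive" (6) both yield c = 3, while the two "doubtful" ratings (3 and 4) yield c = 1.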

Statistical analysis
Analyses were performed using R statistical software version 3.6.1 (The R Foundation, r-project.org). All reported p values are two-sided. Normally distributed data were described using mean and standard deviation. Non-normally distributed data were described using median and interquartile range. Normality was assessed by analysis of histograms and skewness statistics.
Interobserver agreement for TNM staging using the ordinal scale s was assessed using Cohen's weighted kappa statistics. In the primary statistical analysis differences in the ordinal score values s between reconstruction methods were assessed for each region by nonparametric analysis of longitudinal data in factorial experiments using the R package nparLD [27], as were differences in the certainty score c . The method accounts for dependencies between measurements on the same patient (i.e., for a given region each patient provides a measurement per reconstruction method and reader, resulting in six observations per patient). A multiple comparison procedure based on the closed testing principle [28] was applied to each region using a (multiple) significance level of 0.05 per region. Following this principle, a single pairwise comparison was considered significant, if both the overall comparison and the pairwise comparison resulted in a p value ≤ 0.05.
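For readers unfamiliar with the agreement statistic, a minimal sketch of Cohen's weighted kappa for two readers on the six-point scale is given here. The weighting scheme is not specified in the text; linear disagreement weights |i − j| are assumed purely for illustration, and the function name is hypothetical. The analysis itself was performed in R, not with this code.

```python
def weighted_kappa(reader1, reader2, categories=6):
    """Cohen's weighted kappa with linear disagreement weights |i - j|.

    reader1, reader2: equally long sequences of integer ratings 1..categories.
    The linear weighting is an assumption made for this sketch.
    """
    if len(reader1) != len(reader2) or not reader1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(reader1)
    k = categories

    # Observed contingency table of rating pairs.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(reader1, reader2):
        observed[a - 1][b - 1] += 1

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement.
    obs_dis = sum(abs(i - j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(
        abs(i - j) * row[i] * col[j] / n for i in range(k) for j in range(k)
    )
    if exp_dis == 0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1.0 - obs_dis / exp_dis
```

With weighted kappa, a disagreement between adjacent scale values (for example 5 versus 6) is penalized less than a disagreement between distant values (for example 1 versus 6), which suits an ordinal score such as s.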
SUV and MTV showed a non-normal distribution in histogram analysis. Differences in SUV and volumes between methods were assessed in an exploratory analysis using Friedman's test. Wilcoxon signed-rank tests were applied as a post hoc procedure. p values ≤ 0.05 were considered significant.

Patient characteristics
Forty-three patients with a median age of 70 years (15 women, 28 men) were included in this retrospective analysis. For further patients' characteristics, see Table 1.

Interreader agreement
Interreader agreement for score s was excellent for all locations and image reconstructions, according to the magnitude guidelines suggested by Landis and Koch [29], with weighted kappa values ranging from 0.88 to 0.96 (Table 2).

Influence of motion correction on assessment of lymph nodes and distant metastases
The mean scores s for reader 1 and reader 2 are given in Table 3. Differences in scoring between image reconstruction methods are visualized in Fig. 2.
Analyzing the data of both readers revealed statistically notable differences in score s between the reconstruction methods for lymph node regions N1 and N2 (p = 0.004 and p = 0.036, Table 3). For N1, BG-MC images showed a significantly higher score compared to static and DDG-MC images (p = 0.001 and 0.026), whereas no notable difference was evident between static and DDG-MC images (p = 0.122). For N2, DDG-MC images showed a significantly higher score compared to static and BG-MC (p = 0.016 and 0.042), whereas no notable difference was evident between static and BG-MC images (p = 0.676) (Table 3).
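The decision rule of the closed testing procedure, as described in the statistical methods, can be retraced with the p values reported above. The sketch below is only an illustration of that rule (a pairwise comparison counts as significant if both the overall and the pairwise p value are ≤ 0.05); the function name and the dictionary layout are hypothetical.

```python
def closed_testing(overall_p, pairwise_p, alpha=0.05):
    """Flag each pairwise comparison as significant only if both the
    overall comparison and the pairwise comparison reach p <= alpha."""
    return {
        pair: (overall_p <= alpha and p <= alpha)
        for pair, p in pairwise_p.items()
    }


# p values for region N1 as reported in the text.
n1 = closed_testing(
    0.004,
    {
        "BG-MC vs static": 0.001,
        "BG-MC vs DDG-MC": 0.026,
        "static vs DDG-MC": 0.122,
    },
)

# p values for region N2 as reported in the text.
n2 = closed_testing(
    0.036,
    {
        "DDG-MC vs static": 0.016,
        "DDG-MC vs BG-MC": 0.042,
        "static vs BG-MC": 0.676,
    },
)

print(n1)
print(n2)
```

Under this rule, two of the three pairwise comparisons are significant in each of the two regions, matching the statements in the text.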
For the dichotomized score d, there were several cases where motion correction with either BG or DDG resulted in uprating consensually for both readers. However, there was no case in which both readers rated down any station in motion-corrected images compared to static images. For BG-MC, there were three cases where both readers rated up N1 from tumor-free to metastatic compared to static images (Table 4). For DDG-MC, there were two cases where both readers rated up N1 and one case where both readers rated up station N2 (Table 4).
Correlative histopathological results from multisegmental EBUS-TBNA were available for one patient in whom both BG- and DDG-based motion correction resulted in uprating of N1 from tumor-free to metastatic and DDG-based motion correction resulted in uprating of N2 from tumor-free to metastatic. EBUS-TBNA results confirmed metastasis in ipsilateral and contralateral lymph nodes (Fig. 3).

Influence of motion correction on certainty scores
No notable differences in the certainty scores c were found between the reconstruction methods (Table 5 and Fig. 4).

Influence of motion correction on SUV and metabolic tumor volume
Histogram analysis revealed non-normal distributions for SUV and MTV values (p < 0.05 in Shapiro-Wilk tests). Differences were evident between image reconstruction methods for SUVmax, SUVmean and MTV for all lymph node regions and for the primary tumor (all p values for Friedman's test < 0.001, Table 6). Post hoc testing demonstrated significantly higher SUVmax and SUVmean and smaller MTV for BG-MC and DDG-MC images compared to static images for all locations (all p values < 0.003). No significant differences for SUV or MTV were found between BG-MC and DDG-MC.

Discussion
State-of-the-art staging of lung cancer patients often includes initial staging with F-18-FDG PET/CT, especially for the assessment of lymph nodes and distant metastases following the updated 8th edition of the TNM classification [26]. Clinically available hardware-based gating (in our case, belt-based gating) and DDG are promising methods to overcome PET-inherent disadvantages in the assessment of lesions affected by respiratory motion [10,17]. Beyond the well-known advantages of motion-corrected PET, i.e., higher, more accurate tracer uptake values and subjectively "sharper" images, studies on the impact of motion-corrected PET on staging and its value in clinical decision-making are still sparse [10,21,24,30,31]. This study therefore sought to evaluate the impact of two different methods of fully motion-corrected PET reconstructions compared to standard static (non-motion-corrected) PET on lung cancer staging scans. In line with previous studies, semi-quantitative PET uptake values SUVmax and SUVmean were significantly higher in primary tumor and metastatic lesions in our study (Table 6) when applying motion correction [21,24,31]. SUV was not significantly different for BG-MC and DDG-MC in the presented study, in line with previously published results based on the same methodology [21]. Contrary to these results, Walker et al. reported slightly but significantly higher SUV for DDG compared to external device-based gating in 144 patients; however, both of their gating methods are different from the ones employed by us [32]. More specifically, their applied hardware-based gating method relies on camera tracking of body surface markers, and their DDG algorithm uses principal component analysis rather than spectral Fourier analysis as in our case. Furthermore, a different patient collective was analyzed, making a direct comparison between their results and ours difficult. 
However, they mention that their camera-based gating approach relied on a prospective trigger insertion algorithm into the list mode stream rather than the retrospective one they used for DDG. This might explain a perceived superiority of their DDG, while in our case both gating approaches relied on a retrospective analysis of the acquired waveforms, thus explaining very similar SUV for both motion-corrected PET images.
In line with previously published results, MTV was significantly smaller when applying gating methods compared to static PET [30,33,34]. This is of utmost importance for target volume delineation in radiotherapy planning, not only limited to lung cancer treatment, although the clinical impact of these changes still warrants further investigation [35]. We theorized that the effect of PET motion correction, i.e., increasing SUV while decreasing lesion volumes at the same time, could result in human readers perceiving lesions as showing focal tracer uptake compatible with malignant lesions which would have been rated as benign or even overlooked on static images (Fig. 3). Going beyond most previous studies, our study could indeed demonstrate that motion-corrected PET does not only result in higher SUV and smaller MTV but may also impact staging decisions by human readers, even if only in a limited number of cases. On average, motion correction with BG-MC and DDG-MC made readers assign significantly higher scores compared to static images for lymph nodes in N1 and N2 but not in N3. Therefore, the readers were more likely to classify lymph nodes in N1 (for BG-MC) and N2 (for DDG-MC) as metastatic compared to static images. The reason why classification of N1 and N2 but not N3 and M1 is affected by PET motion correction might be related to the fact that lymph nodes in N1 and N2 are more affected by respiratory motion than those in N3 which can have a larger distance to the diaphragm, e.g., in the case of cervical lymph node metastases. Moreover, M1 does not only include patients with a single metastasis potentially affected by respiratory motion as in the adrenal gland or the liver, but also patients with (additional) multiple bone metastases not or barely affected by respiratory motion.
On average, the certainty score c of the readers was not different between the reconstruction types. We believe this is connected to the observed shift in s to higher values over the whole range of possible outcomes; thus, cases that were ambiguous without motion correction had the tendency to be perceived as metastases with motion correction, while motion correction may also lead to lymph nodes being classified as potential metastases that were deemed unsuspicious without motion correction.
Following the application of motion correction, uprating from disease-free to metastatic on the dichotomous score occurred, consensually for both readers, in 3/43 (7%) patients in N1 using BG-MC and in 2/43 (5%) patients using DDG-MC. For N2, consensual upstaging occurred in 1/43 (2%) patients with DDG-MC. Correlative histopathological results from multisegmental EBUS-TBNA were available for one patient, confirming uprating of both N1 and N2 with DDG-MC as true positive. This underlines the clinical impact of our findings.
Migration of lymph node disease status, seen with PET motion correction in this study, could thus have potentially resulted in a change in clinical patient management. Uprating of lymph nodes in N2 in one case could have shifted primary treatment from surgery to definitive chemoradiotherapy. Migration of disease status of N1 in three cases could have affected further workup, as new ESMO guidelines recommend EBUS-TBNA for mediastinal lymph nodes only with additional risk factors such as cN1 [36].
Our results corroborate the findings of previous studies investigating the impact of motion correction on lesion detectability and clinical management: In a study by Sigfridsson et al., comprising 7 patients with liver metastases, DDG resulted in the detection of 41 liver lesions compared to 36 lesions with static image reconstruction [31]. In a mixed cohort of 149 patients with different tracers (i.e., FDG, PSMA and DOTATATE) and underlying pathologies, Messerli et al. detected a higher number of metastases with DDG in organs affected by respiratory motion in up to 27% of patients included [24,31]. A higher number of lesions does not automatically result in a change in clinical stage or management [36]. Nevertheless, Messerli et al. demonstrated a change in clinical management in 8% of patients in their cohort, corroborating our result that gating or motion correction can result in a change in clinical management [24]. In the only other dedicated study on n = 55 lung cancer patients, relying on the 7th edition of the TNM classification, T and M staging remained unchanged when applying hardware-based respiratory gating and changes in N stage occurred in 7% or 13% depending on the reader [37]. These results are in line with the results of our study for BG-MC and DDG-MC. Besides relying on the 7th edition of the TNM classification, the gating approach used in the study by Grootjans et al. is significantly different from ours, since only belt-driven gated and not fully motion-corrected PET was investigated.
One of the main limitations of our study is inherent to clinical reader assessment, as readers cannot be completely blinded to the image appearance of different reconstruction images. However, by using two different methods of gating, this disadvantage might be less applicable in this study than in others with only one method of gating [24]. By applying an interval of at least two weeks between reading the different datasets and by mixing different patients and reconstruction methods in one session, bias was reduced. Consecutive patients were retrospectively included, and we thus had no influence on the clinical stage of the patients at initial diagnosis. As previously reported, gating has only a limited impact in advanced tumor stages [37]. Histopathological correlation was established for uprating N1 and N2 in one patient where dedicated EBUS-TBNA biopsy of different lymph node stations was available. In this study, we included the most commonly used methods of gathering respiratory data from patients, hardware/belt-based assessment of motion and DDG, and used them as a basis for full elastic motion correction; thus, a direct comparison of our results to studies using less complex gating methods alone is challenging.
To conclude, this pilot study offers first insights into the clinical impact of motion correction for F-18-FDG PET on staging scans of lung cancer patients following the 8th edition of TNM classification. Full motion correction using hardware-based and data-driven gating both seem to have a similar clinical impact on uprating in few patients with limited disease while significantly influencing quantitative PET uptake parameters.