An investigation of the relation between tumor-to-liver ratio (TLR) and tumor-to-blood standard uptake ratio (SUR) in oncological FDG PET

Background The standardized uptake value (SUV) is the nearly exclusive means for quantitative evaluation of clinical [18F]fluorodeoxyglucose (18F-FDG) positron emission tomography (PET) whole-body investigations. However, the SUV methodology has well-known shortcomings. In this context, it has been recognized that at least part of the problems can be eliminated if tumor SUV is normalized to the SUV of a reference region in the liver (tumor-to-liver ratio [TLR]). In recent publications, we have systematically investigated the tumor-to-blood SUV ratio (SUR) for normalization of tumor SUVs, which in our view offers principal advantages in comparison to TLR. The aim of this study was a comprehensive comparison of TLR and SUR in terms of quantification of tumor lesions.
Methods 18F-FDG PET/CT was performed in 424 patients (557 scans) with different tumor entities prior to radio(chemo)therapy. In the PET images, SUVmax of the primary tumor was determined. SUVliver was calculated in the inferior right lobe of the liver. SUVblood was determined by manually delineating the aorta in the low-dose CT. TLR and SUR were computed and scan time corrected to 60 min p.i. (TLRtc and SURtc). Correlation analysis was performed for SUVliver vs. SUVblood, TLR vs. SURtc, SUVliver/SUVblood vs. SUVblood, SURtc/TLR vs. SURtc, and SURtc/TLRtc vs. SURtc. Variability of the respective ratios was assessed via histogram analysis. The prognostic value of TLR and TLRtc for distant metastases-free survival (DM) was investigated with univariate Cox regression in a homogeneous subgroup (N = 130) and compared to previously published results for SUV and SURtc.
Results Correlation analysis revealed a linear correlation of SUVliver vs. SUVblood (R² = 0.83) and of TLR vs. SURtc (R² = 0.92). The SUVliver/SUVblood ratio (mean ± s.d.) was 1.47 ± 0.18. For the SURtc/TLR ratio, we obtained 1.14 ± 0.21 and for the SURtc/TLRtc ratio 1.38 ± 0.17. Survival analysis revealed TLR and TLRtc as significant prognostic factors for DM (hazard ratio [HR] = 3.3 and HR = 3, respectively). Both hazard ratios are lower than that of SURtc (HR = 4.1), although this reduction does not reach statistical significance for the given limited group size. The HRs of TLR and SURtc are both significantly higher than the HR of SUV (HR = 2.2).
Conclusions Suitability of the liver as a surrogate of arterial tracer supply for SUV normalization via TLR computation is limited. Further studies in sufficiently large patient groups are required to better characterize the relative performance of SUV, TLR, and SUR in different settings.


Background
The standardized uptake value (SUV) currently is the nearly exclusive means for quantitative evaluation of clinical [18F]fluorodeoxyglucose (18F-FDG) positron emission tomography (PET) whole-body investigations. However, the SUV methodology has well-known shortcomings such as uptake time dependence of the SUV, unsatisfactory test/retest stability, susceptibility to errors in scanner calibration etc. [1][2][3][4][5][6] all of which adversely affect the reliability of the SUV as a surrogate of the metabolic rate of FDG (and ultimately of glucose consumption).
In this context, it has been recognized repeatedly that at least part of the mentioned problems can be reduced or eliminated if tumor SUV is normalized to the SUV of a suitable reference region [7]. Especially, the liver has drawn considerable attention as a useful reference region since the liver does not irreversibly trap the FDG and maintains a roughly constant SUV level during the time window relevant for whole-body FDG PET (about 60-120 min p.i.) [8][9][10][11][12][13]. In fact, the liver is the only reference region which so far has been studied and used extensively.
Using the tumor-to-liver-ratio (TLR) obviously removes some of the SUV limitations, i.e. possible inaccuracies regarding actually injected dose, scanner calibration, and patient weight index (either actual body weight, lean body mass [14], or body surface area [15]).
However, TLR exhibits an uptake time dependence comparable to that of tissue SUV itself (without a generally accepted means of quantitatively correcting for this effect in either case). Possibly more important, liver SUV (SUV liver) will exhibit an inter-individually (and possibly also intra-individually in case of aggressive treatment such as chemotherapy) variable relation to the given arterial tracer supply. On the other hand, it is the latter, expressed in SUV units (SUV blood), which determines a given lesion's observed SUV. Usefulness of the liver as a reference might be further compromised in the presence of liver disease or pharmacological intervention [16,17]. Last but not least, depending on the investigation, the liver might simply not be routinely included in the field of view (FOV) of the PET scan (e.g. at the participating sites in head and neck investigations, the liver is not always included in the FOV while a sufficiently large part of the aorta still is). For all these reasons, the liver cannot be considered an ideal reference region.
In recent publications, we have systematically investigated the tumor-to-blood SUV ratio (SUR) for normalization of tissue SUVs which in our view offers principal advantages in comparison to TLR. For one, the SUR approach by definition eliminates the influence of the persisting residual variability of SUV blood on lesion SUV and ensures that SUR is superior to lesion SUV itself as a surrogate parameter of the metabolic rate of FDG [18].
Additionally, we were able to show that it is possible to reliably correct SUR for variations of the 18F-FDG uptake period under rather general and empirically well-fulfilled assumptions regarding the shape of the arterial input function (AIF) [19]. These advantageous properties of the SUR can ultimately be traced back to the empirical fact that the AIF after FDG bolus injection exhibits an essentially invariant shape, following a simple inverse power law starting immediately after the bolus phase. Finally, we found strong evidence in a survival analysis of 130 patients with esophageal carcinoma that the superior properties of SUR also translate into a higher prognostic value [20]. While there is thus rather strong theoretical and empirical evidence for the superiority of SUR over SUV, it is so far an open question how the performance of SUR compares to that of TLR.
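The link between the power-law shape of the AIF and the scan-time dependence of SUR can be sketched as follows. This is only an illustrative outline of the argument given in [19], not a reproduction of it: the symbols c_a (AIF), K_m (metabolic rate of FDG) and the Patlak-type form are notation assumed here, the power law is idealized as holding from t = 0 (i.e. the bolus phase is neglected), and V_r and b denote the apparent volume of distribution and the power-law exponent.

```latex
\begin{align*}
c_a(t) &\propto t^{-b}, \qquad 0 < b < 1 \\
\mathrm{SUR}(T) &= K_m \, \frac{\int_0^T c_a(t)\, dt}{c_a(T)} + V_r
                 = \frac{K_m}{1-b}\, T + V_r \\
\Rightarrow \quad \mathrm{SUR}(T_0) - V_r &= \frac{T_0}{T} \left( \mathrm{SUR}(T) - V_r \right)
\end{align*}
```

Under these assumptions, the part of SUR exceeding V_r grows linearly with the uptake time T, which is what makes a simple rescaling to a standard time T_0 possible.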
The primary aim of the present investigation, therefore, was accurate determination of the degree of correlation between TLR and SUR. A secondary goal was to perform a first direct comparison of the performance of TLR and SUR as predictor of therapy outcome. For this purpose, we have utilized the patient group previously investigated in [20].

Patient group
In this retrospective study, 424 patients (358 men, 66 women) with mean age (range) 63 (37-85) years and different tumor entities (head and neck cancer [HNC], N = 36; non-small cell lung cancer [NSCLC], N = 178; esophageal carcinoma [EC], N = 210) were included. This patient group incorporates 130 patients with esophageal carcinoma treated with definitive radio(chemo)therapy previously investigated in the already mentioned study by Bütof et al. [20]. This subgroup is utilized in the present study for comparison of the prognostic value of TLR and SUR. In 84 out of 424 patients, two PET scans were performed on different days, where the first scan was before radio(chemo)therapy and the second scan afterwards. Time between first and second scan was on average 39.1 days (range 10-76). These data were included to study the intra-subject variability. In 49 out of 424 patients, dual time-point measurements were performed, and the respective late scans were included to extend the range of covered uptake times (up to 120 min). Altogether, 557 18F-FDG PET/CT scans were performed at University Hospital, Technische Universität Dresden (Site A) and at the University Hospital, Otto-von-Guericke University Magdeburg (Site B). Only scans where the liver as well as the aorta was in the FOV were included. Scan characteristics are summarized in Table 1. All scans other than those mentioned above were performed before radio(chemo)therapy and/or surgery. All patients had fasted for at least 6 h prior to 18F-FDG injection. The serum glucose concentration

Image analysis
ROI definition and ROI analyses were performed using the ROVER software, version 2.1.20 (ABX, Radeberg, Germany). Here and in the following, "ROI" is used synonymously with "VOI" for denoting a three-dimensional volume of interest. The metabolically active part of the primary tumor was delineated by an automatic algorithm based on adaptive thresholding taking the local background into account [21]. The result of the automatic delineation was inspected visually by an experienced observer (one observer at each site) and corrected manually in case of obvious segmentation failure. For the resulting ROIs, SUV max was computed. In the following, the index "max" is omitted, since only the maximum of lesion SUV and derived quantities (TLR, SUR) was considered in the evaluation.
The arterial blood SUV was determined by defining a roughly cylindrical aorta ROI in the attenuation CT data, which then was transferred to the PET data. To exclude partial volume effects, a concentric safety margin was used in the transaxial planes, centering the ROI in the aorta. Planes showing high tracer uptake close to the aorta (pathological or otherwise) were excluded. The aorta ROI was positioned in the descending aorta, and the minimum volume was 5 ml. For the determination of SUV liver, a spherical 3D ROI with a diameter of approximately 3 cm (14 ml) was placed on the normal inferior right lobe of the liver. TLR (SUR) was computed as the ratio of maximum lesion SUV and mean SUV of the liver (aorta) ROI. In the following, we omit the index "mean" for liver (aorta) SUV. Scan time corrected SUR values were computed as described in [19]:
$$\mathrm{SUR}_{tc} = \frac{T_0}{T} \times \left( \mathrm{SUR} - V_r \right) + V_r$$
where $T$ is the actual scan time p.i. and $T_0$ is the chosen standard scan time to which the SURs are normalized (60 min in the present work). $V_r$ = 0.53 ml/ml is an estimate of the apparent volume of distribution, corresponding to the y-axis intercept of a Patlak plot, previously derived in dynamic investigations [22]. Note that for not too small SUR values, the influence of $V_r$ is small and might be neglected, simplifying the correction formula to $\mathrm{SUR}_{tc} = \frac{T_0}{T} \times \mathrm{SUR}$. As our previous work [20] demonstrates, the scan-time correction distinctly improves the prognostic value of the SUR, and it is thus the scan-time corrected value SUR tc which should be compared against TLR. Of course, TLR is scan-time dependent as well, but usually no attempt is made to correct for this effect, so the primarily relevant comparison is that between SUR tc and this (scan-time uncorrected) TLR. For completeness' sake, however, we also compared SUR tc with a scan-time corrected TLR as follows. For scan time correction of TLR, we note that SUV liver is nearly time-independent in the relevant time window (≈ 60-120 min p.i.) so that the fractional change of TLR over time is essentially identical to the corresponding change of lesion SUV. In [19], we have demonstrated that scan time correction of lesion SUV is possible, although somewhat less accurate than for SUR, but in principle requires knowledge of SUV blood. However, an approximate correction is possible without this knowledge. When using the TLR approach instead of SUR (i.e. in the absence of SUV blood determination), this approximation is the only feasible approach, and we have therefore used it in the present investigation. The resulting correction formula is
$$\mathrm{TLR}_{tc} = \left( \frac{T_0}{T} \right)^{1-b} \times \mathrm{TLR}$$
where $b$ = 0.313 is a parameter describing the shape and decrease of the arterial input function over time (see [19] for details).
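The two scan-time corrections described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `sur_tc` and `tlr_tc` are hypothetical, the SUR correction rescales the part of SUR above V r by T0/T as in [19], and the TLR correction assumes the approximate power-law form (T0/T)^(1-b).

```python
V_R = 0.53  # apparent volume of distribution (ml/ml), value quoted in the text
B = 0.313   # AIF shape parameter, value quoted in the text
T0 = 60.0   # standard scan time (min p.i.)


def sur_tc(sur, t, t0=T0, v_r=V_R):
    """Scan-time corrected SUR: rescale the part of SUR above v_r by t0/t."""
    return (t0 / t) * (sur - v_r) + v_r


def tlr_tc(tlr, t, t0=T0, b=B):
    """Approximate scan-time corrected TLR.

    Assumes liver SUV is constant in time and lesion SUV grows like t**(1 - b).
    """
    return (t0 / t) ** (1.0 - b) * tlr
```

For example, `sur_tc(6.0, 90.0)` maps a hypothetical SUR of 6.0 measured at 90 min p.i. to about 4.18 at 60 min p.i., whereas a scan acquired exactly at 60 min p.i. is left unchanged by both functions.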

Statistical analysis
Inter-subject variability of SUV blood and SUV liver was analyzed in the whole patient group where for patients with two PET scans only the first scan was used (N = 424).
Inter-subject variability was assessed as standard deviation (SD) of the distribution of the respective SUV. Intra-subject variability of SUV blood and SUV liver was analyzed in the subgroup of 84 patients that received two scans on separate days. It was assessed as SD of the distribution of ΔSUV (= paired difference of the respective SUV in the second and first scan). Inter- and intra-subject variabilities were compared using a two-sided F test of the corresponding variances (squared SDs), testing the null hypothesis that they are equal.
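The quantities entering this comparison can be sketched as follows. The helper name `variability_f_statistic` and the data layout are hypothetical, and the sketch stops at the variance ratio: turning it into a two-sided P value additionally requires the F distribution with the appropriate degrees of freedom (for example via `scipy.stats.f`), which is left out here to keep the example dependency-free.

```python
from statistics import stdev


def variability_f_statistic(first_scan_suv, paired_suv):
    """Return (inter-subject SD, intra-subject SD, variance ratio F).

    first_scan_suv: one SUV per patient (first scan only).
    paired_suv: (first, second) SUV pairs for patients scanned twice.
    """
    sd_inter = stdev(first_scan_suv)
    # Paired differences: second scan minus first scan
    deltas = [second - first for first, second in paired_suv]
    sd_intra = stdev(deltas)
    f_stat = sd_inter ** 2 / sd_intra ** 2
    return sd_inter, sd_intra, f_stat
```

With N = 424 first scans and 84 scan pairs, the variance ratio would be referred to an F distribution with 423 and 83 degrees of freedom.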
Linear correlation analysis of liver vs. blood SUV and of TLR vs. SUR tc (N = 557), respectively, was performed and visualized through scatterplots. Linear correlation analysis was also performed for the liver-to-blood SUV ratio (LBR = SUV liver /SUV blood) vs. SUV blood as well as for SUR tc /TLR and SUR tc /TLR tc, respectively, vs. SUR tc. Variability of the respective ratios was assessed via histogram analysis and quantified by mean ± SD and 90 % confidence interval (CI).
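A minimal sketch of these two steps (straight-line fit with R², and summary of a ratio distribution) is given below. The function names are hypothetical, and the 90 % interval is taken here as the 5th to 95th percentile of the observed ratios, which is an assumption about how the reported interval was obtained.

```python
from statistics import mean, quantiles, stdev


def linear_fit(x, y):
    """Least-squares slope, intercept and R^2 of y against x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


def ratio_summary(numerator, denominator):
    """Mean, SD and central 90 % range (5th to 95th percentile) of a ratio."""
    ratios = [n / d for n, d in zip(numerator, denominator)]
    cuts = quantiles(ratios, n=20)  # 19 cut points at 5 %, 10 %, ..., 95 %
    return mean(ratios), stdev(ratios), (cuts[0], cuts[-1])
```

Calling `ratio_summary(suv_liver, suv_blood)` on two equally long lists of per-scan values would, for instance, give the LBR summary in the form mean, SD, and interval limits.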
Survival analysis was performed in the patient group already analyzed in [20] where the prognostic value of several PET parameters and of clinically relevant parameters for overall survival, locoregional tumor control, and distant metastases-free survival (DM) was investigated. In the present study, we investigate the prognostic value of TLR and TLR tc for DM (for which the largest effect size was found in our previous study) using univariate Cox regression. For comparison, we also show the already published results for SUV and SUR tc. Hazard ratios were compared using the bootstrap method (random re-sampling with replacement; 10^5 samples) to determine the statistical distribution of (HR 1 − HR 2), from which the relevant P value then was derived. Statistical significance was assumed if P < 0.05. Statistical analysis was performed with the R language and environment for statistical computing [23] version 3.1.2.
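The bootstrap comparison of two hazard ratios can be outlined generically as below. This is a sketch under explicit assumptions rather than the analysis actually run in R: `bootstrap_difference` is a hypothetical helper, the two statistics are passed in as callables (in the study they would be Cox hazard ratios fitted on the resampled patients), and the percentile-style two-sided P value is one common convention, since the text does not state how P was derived from the bootstrap distribution.

```python
import random


def bootstrap_difference(records, stat_1, stat_2, n_boot=1000, seed=0):
    """Bootstrap distribution of stat_1 - stat_2 and a two-sided P value.

    records: per-patient records, resampled jointly with replacement.
    stat_1, stat_2: callables mapping a list of records to a number
        (placeholders for, e.g., two fitted hazard ratios).
    """
    rng = random.Random(seed)
    n = len(records)
    diffs = []
    for _ in range(n_boot):
        sample = [records[rng.randrange(n)] for _ in range(n)]
        diffs.append(stat_1(sample) - stat_2(sample))
    frac_le_zero = sum(d <= 0 for d in diffs) / n_boot
    frac_ge_zero = sum(d >= 0 for d in diffs) / n_boot
    # Assumed convention: twice the smaller tail fraction, capped at 1
    p_value = min(1.0, 2.0 * min(frac_le_zero, frac_ge_zero))
    return diffs, p_value
```

Resampling whole patient records, rather than the two parameters separately, keeps both statistics evaluated on the same bootstrap sample, so that their correlation is reflected in the distribution of the difference.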

Compliance with ethical standards
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all individual participants included in the study.

Results
The mean values of SUV blood and SUV liver across all 424 investigated patients were 1.79 ± 0.36 and 2.56 ± 0.55, respectively. The mean intra-individual paired differences, ΔSUV blood and ΔSUV liver, in 84 patients receiving two PET scans on different days were 0.05 ± 0.32 and 0.24 ± 0.42, respectively. This demonstrates that the inter- and intra-subject variability (i.e. the respective standard deviations) of both SUVs are of very similar magnitude (although the small positive difference between the inter- and intra-subject SUV liver variability actually reaches statistical significance [P = 0.003]).
Correlation analysis revealed a pronounced linear correlation of SUV liver and SUV blood (R² = 0.83) and of TLR and SUR tc (R² = 0.92). Corresponding scatterplots are shown in Fig. 1. There were no notable differences between investigating sites, tumor entities, or tumor size (Table 2). For LBR, we obtained 1.47 ± 0.18 (90 % CI 1.2-1.78). Corresponding scatterplot and histogram are shown in Fig. 2. For the SUR tc /TLR ratio, we obtained 1.14 ± 0.21 (90 % CI 0.82-1.48) and for the SUR tc /TLR tc ratio 1.38 ± 0.17 (90 % CI 1.12-1.65). Corresponding scatterplots and histograms are shown in Fig. 3. Obviously, time correction of TLR reduces the fractional variability of the ratio (from about 18 to 12 %). For all ratios, there was no notable difference between investigating sites, tumor entities, or tumor size (Table 3).
Survival analysis (N = 130) revealed TLR and TLR tc as significant prognostic factors for DM without being significantly different from each other (HR = 3.3 and HR = 3, respectively). These hazard ratios are to be compared with the previously reported results from this patient group [20] for SUV (HR = 2.2) and SUR tc (HR = 4.1). Further details can be found in Table 4. Corresponding Kaplan-Meier curves are shown in Fig. 4.
According to bootstrap resampling, the HRs of TLR and SUR tc were both significantly higher than the HR of SUV (P = 0.019 and P = 0.048, respectively), while the HR difference between TLR tc and SUV was not significant (P = 0.17). The HR difference between SUR tc and TLR or TLR tc was also not significant (P = 0.31 and P = 0.16, respectively).

Discussion
Figure 1a demonstrates a pronounced but far from perfect linear correlation between SUV liver and SUV blood. Indeed, a stronger correlation of both quantities might be expected since in any single investigation, tracer uptake at a given time point in any given target region (the liver included) is proportional to the overall scale of the AIF and, consequently, to its value at the chosen time point. Thus, in view of the fact that the AIF exhibits an essentially invariant shape across different investigations [18,19], and presuming the metabolic state of the liver could be considered sufficiently similar with respect to uptake and release of FDG across different investigations/patients, a near-perfect linear correlation (actually, a proportionality) would result in Fig. 1a, at least for sufficiently standardized uptake time. However, this is not the case.
Considering the possible explanations, it is easily verified that the deviations from a perfect straight line are not a consequence of statistical errors due to the given signal-to-noise ratio of the corresponding ROI averages [24]. Systematic errors due to regionally variable accuracy of attenuation or scatter correction, too, would not be able to disturb the linear correlation to such an extent. Excluding measurement-related effects, two obvious possible explanations remain for the sizable deviations from perfect linear correlation. First, the correlation might be adversely affected by differences in uptake time (color-coded in the scatter plots) since the time activity curves in liver and blood have different shapes and the LBR, thus, is time-dependent (slowly increasing over time). It is obvious from the color-coding of the data points according to uptake time in Figs. 1a and 2a that this effect is at most responsible for a minor part of the scatter, driving LBR to somewhat higher values at late times (which on average correspond to lower SUV blood values, explaining the small but significant negative correlation of LBR and SUV blood in Fig. 2a).
Fig. 1 a Correlation between SUV liver and SUV blood. b Correlation between TLR and SUR tc. Black lines represent the least squares straight line fits to the data. Red lines depict the 95 % CI
The only remaining plausible explanation in our view is to attribute the scatter to non-negligible inter- and intra-individual quantitative differences of FDG kinetics in the liver between different patients or scans. Regarding the degree of intra-subject variability of SUV liver and SUV blood separately, our results are in good quantitative agreement with [4]. Our data furthermore demonstrate that inter-subject variability of both quantities is very similar to the respective intra-subject variability (although the difference reaches statistical significance in case of the liver, where inter-subject variability is slightly larger than the intra-subject variability). We believe this to be an important observation in itself: intra- and inter-individual fluctuations of SUV liver and SUV blood do have very similar magnitude.
While our data thus essentially confirm and augment existing data regarding inter-scan variability of SUV liver and SUV blood they, furthermore, provide to our knowledge the first comprehensive investigation of the degree of correlation between both quantities. Regarding utilization of the liver as reference region for lesion SUV normalization, our data demonstrate that the liver in fact cannot be considered a highly accurate substitute for actual arterial tracer supply; from the data shown in Fig. 2, we derive an LBR of 1.47 ± 0.18 with a 90 % confidence interval of 1.2-1.78 whose limits differ by 48 %. These fluctuations directly translate into spurious fluctuations of the derived TLR values which would erroneously be interpreted as being due to changes in lesion metabolism.
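The percentages quoted here and in the Results can be reproduced from the reported values. The short calculation below assumes that "fractional variability" denotes SD divided by mean and that the 48 % figure refers to the width of the 90 % interval relative to its lower limit; both readings are interpretations, not definitions given in the text.

```latex
\begin{align*}
\frac{1.78 - 1.20}{1.20} &\approx 0.48
  && \text{(LBR: width of the 90\,\% interval relative to its lower limit)} \\
\frac{0.18}{1.47} &\approx 0.12
  && \text{(LBR: SD / mean)} \\
\frac{0.21}{1.14} &\approx 0.18
  && \text{(SUR}_{tc}\text{/TLR: SD / mean)} \\
\frac{0.17}{1.38} &\approx 0.12
  && \text{(SUR}_{tc}\text{/TLR}_{tc}\text{: SD / mean)}
\end{align*}
```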
The magnitude of this effect is demonstrated in Fig. 1b where TLR is compared to SUR tc. While the correlation coefficient is larger than that in Fig. 1a (ultimately a consequence of the much higher dynamic range of SUR and TLR in comparison to SUV blood and SUV liver), the SUR tc /TLR ratio in fact exhibits a fractional variability that is distinctly higher than that of LBR (about 18 vs. 12 %), which is also apparent from a comparison of Fig. 2 and Fig. 3a, b. This increased variability is caused by the fact that we use SUR tc here, rather than the scan-time uncorrected SUR, for the reasons explained in the introduction. Since uptake time correction of TLR is currently not applied in clinical routine, one thus actually faces a variability of TLR in comparison to SUR tc of 1.14 ± 0.21 (90 % CI 0.82-1.48) if actual scan times are as variable as in our study group. Also performing uptake time correction for TLR approximates the situation where uptake times would be strictly standardized (to 60 min in the present case). This leads to the results shown in Fig. 3, where the SUR tc /TLR tc ratio exhibits a variability very similar to that of the LBR data in Fig. 2. This should be expected if the scan-time correction performs well since the time dependence of LBR itself is rather weak, as already discussed above. The bottom line here is that the SUR tc /TLR ratio exhibits a variability which is at least as large as that of LBR but will be substantially higher under typical clinical conditions where uptake times can vary considerably [25,26]. Accepting our point of view that SUR tc for principal reasons should be considered to represent the best available surrogate of lesion glucose consumption (since it uses the "correct" way of normalizing directly to the actual arterial tracer supply and accounts for the time dependence of both lesion uptake and AIF), the stated variability of SUR tc /TLR represents a principal limitation of the TLR as a surrogate of lesion glycolysis.
Of course, even if this conjecture is correct, the real question is how TLR performs in comparison to SUV and SUR regarding its prognostic value. In comparison to SUV, it has been repeatedly shown [12,13,27] that TLR is capable of improving the prognostic value of the PET investigation. It thus is unquestionably a valuable concept. On the other hand, the much more recently proposed SUR has not yet seen widespread evaluation, and a comparison to TLR has been completely missing so far. We therefore consider the results presented in Fig. 4 and Table 4 of special interest. They clearly demonstrate that TLR as well as SUR tc are superior to SUV as predictors of DM in the investigated patient group. Uptake time correction of TLR (TLR tc) did not improve the prognostic value as described by the hazard ratios in Table 4 in comparison to TLR. This was an initially somewhat unexpected result since uptake time correction reduces the deviations from a constant SUR tc /TLR tc ratio. This finding might indicate that in our patient group the improved prognostic value of SUR tc is caused mainly by the beneficial influence of normalization to SUV blood rather than by scan-time correction. However, further investigations will be necessary to clarify this question.
The already previously reported HR of SUR tc is distinctly higher than the HR of TLR (HR[SUR tc] = 4.1, HR[TLR] = 3.3). In the given study group with its limited group size, though, the increase of HR is not large enough to reach statistical significance in the bootstrap resampling analysis. This indicates that the principal advantages of SUR tc over TLR (consideration of actual arterial tracer supply and accurate uptake time correction) are not decisive at the given level of statistical accuracy available in our study group. Nevertheless, we believe that the observed very weak indication of superiority of SUR tc over TLR is a sufficient incentive to further investigate the relative performance of TLR and SUR in other patient groups. Personally, we believe it very likely that ultimate superiority of SUR over TLR will be demonstrated since the latter parameter does not allow one to fully account for the inter- and intra-individual variability of arterial tracer supply (and thus remains subject to spurious changes which are unrelated to differences in lesion glycolysis). In any case, both parameters are clearly superior to SUV and in practical terms might be viewed to some extent as complementary concepts (rather than competing ones) since the blood pool (aorta) will frequently be covered in the FOV even when the liver is not (or when the presence of liver disease precludes use of the TLR approach). Overall, it seems worthwhile and promising to further investigate the relative performance of SUV, TLR, and SUR in other patient groups with the ultimate goal of deciding whether SUR can be considered as generally superior to SUV and TLR. If this turns out to be true, it would constitute a strong incentive to use SUR as a drop-in replacement for the current SUV and TLR methodology (or at least as an attractive alternative to the latter one) in clinical whole-body FDG PET.
Fig. 4 Kaplan-Meier curves with respect to DM (N = 130 patients with esophageal carcinoma). Results for SUV and SUR tc have been taken from our paper [20]

Conclusions
Suitability of the liver as a surrogate of arterial tracer supply for SUV normalization via TLR computation is limited due to the less-than-perfect correlation between blood and liver SUV, and the SUR approach remains attractive for principal as well as practical reasons. Regarding their respective prognostic value, both TLR and SUR significantly outperformed SUV. Further studies in sufficiently large patient groups are required to better characterize the relative performance of SUV, TLR, and SUR in different settings.