Volume of interest delineation techniques for 18F-FDG PET-CT scans during neoadjuvant extremity soft tissue sarcoma treatment in adults: a feasibility study

Background This study explores various volume of interest (VOI) delineation techniques for fluorine-18-fluorodeoxyglucose positron emission tomography with computed tomography (18F-FDG PET-CT) scans during neoadjuvant extremity soft tissue sarcoma (ESTS) treatment. Results During neoadjuvant treatment, hyperthermic isolated limb perfusion (HILP) and preoperative external beam radiotherapy (EBRT), 11 patients underwent three 18F-FDG PET-CT scans. The first scan was made prior to the HILP, the second after the HILP but prior to the start of the EBRT, and the third prior to surgical resection. An automatically drawn VOIauto, a manually drawn VOIman, and two gradient-based semi-automatically drawn VOIs (VOIgrad and VOIgrad+) were obtained. Maximum standardized uptake value (SUVmax), SUVpeak, SUVmean, metabolically active tumor volume (MATV), and total lesion glycolysis (TLG) were calculated from each VOI. The correlation and level of agreement between VOI delineation techniques was explored. Lastly, the changes in metabolic tumor activity were related to the histopathologic response. The strongest correlation and an acceptable level of agreement was found between the VOIman and the VOIgrad+ delineation techniques. A decline (VOIman) in SUVmax, SUVpeak, SUVmean, TLG, and MATV (all p < 0.05) was found between the three scans. A > 75% decline in TLG between scan 1 and scan 3 possibly identifies histopathologic response. Conclusions The VOIgrad+ delineation technique was identified as most reliable considering reproducibility when compared with the other VOI delineation techniques during the multimodality neoadjuvant treatment of locally advanced ESTS. A significant decline in metabolic tumor activity during the treatment was found. TLG deserves further exploration as predictor for histopathologic response after multimodality ESTS treatment. 
Electronic supplementary material The online version of this article (10.1186/s13550-018-0397-1) contains supplementary material, which is available to authorized users.


Background
Soft tissue sarcomas (STS) are relatively rare malignancies, accounting for less than 1% of all cancers in adults. The number of patients presenting with STS each year is 600-700 in the Netherlands, leading to approximately 300 STS related deaths annually [1,2].
Roughly 50-60% of STS arise in the extremities [3,4]. At presentation, some of these extremity soft tissue sarcomas (ESTS) are considered non-resectable or "locally advanced." Since the 1990s, neoadjuvant hyperthermic isolated limb perfusion (HILP) has been used in Europe to prevent limb amputation in these patients [5], and it currently achieves a limb salvage rate of 80-90% in locally advanced ESTS [6][7][8][9]. HILP is used in all types of adult locally advanced ESTS. Because the affected limb is isolated from the systemic circulation during the procedure, regional chemotherapy can be administered in high doses. Neoadjuvant systemic chemotherapy in ESTS is still under investigation, as the available data on patients' oncological outcome are inconsistent [10][11][12].
Fluorine-18-fluorodeoxyglucose positron emission tomography with computed tomography (18F-FDG PET-CT) scans have been used to evaluate tumor changes following HILP in locally advanced ESTS since the mid-1990s [13]. Pretreatment maximum standardized uptake value (SUVmax), metabolically active tumor volume (MATV), and total lesion glycolysis (TLG) were identified as significant predictors of overall survival in STS in a recent meta-analysis [14]. Furthermore, post-treatment SUVmax was shown to be promising in monitoring treatment response. However, the identification of this latter parameter was based solely on two articles included in this meta-analysis. The first only included rhabdomyosarcomas, which are chemosensitive sarcomas, and the second only included chest wall sarcomas [14][15][16].
The SUVmax of a lesion depends solely on the highest measured 18F-FDG uptake in one voxel, which makes the measured SUVmax susceptible to noise [17]. Furthermore, the question remains whether this single measurement is representative of large, heterogeneous tumors such as STS. On the other hand, the SUVmax is the most robust parameter when comparing various delineation software programs, delineation methods, and observers [18]. The MATV and TLG are much more dependent on the method of tumor delineation and the software program used for these analyses. We hypothesized that the use of peak standardized uptake value (SUVpeak) and mean standardized uptake value (SUVmean) in addition to SUVmax, TLG, and MATV might result in a more reliable prediction of tumor changes induced by neoadjuvant treatment.
To the best of our knowledge, the use of various volume of interest (VOI) delineation techniques has not yet been explored during the neoadjuvant treatment of STS. Furthermore, no sequential analysis of multiple 18F-FDG PET-CT scans has previously been performed in this patient population. In this feasibility study, consecutive 18F-FDG PET-CT scans per patient were used to investigate four VOI delineation techniques, because variations in the VOI directly affect the measured SUVmean, MATV, and TLG and could thus affect the performance of the PET assessments. Furthermore, we explored the changes in metabolic tumor activity (SUVmax, SUVpeak, SUVmean, MATV, and TLG) in response to neoadjuvant HILP and preoperative external beam radiotherapy (EBRT) during the treatment course of locally advanced ESTS. Lastly, the relationship between changes in metabolic tumor activity and histopathologic response was explored.

Methods
This study was approved by the Institutional Review Board (IRB), and the need for written informed consent was waived (IRB case number 2016.984). From 2011 to 2017, 11 patients with a median age of 64 (IQR 44-74; range 32-74) years were treated according to a novel treatment regimen consisting of neoadjuvant HILP and preoperative hypofractionated EBRT, followed by surgical resection of the tumor. All patients were diagnosed with a locally advanced, non-metastatic, high-grade ESTS (Table 1). Patients eligible for HILP treatment were included in this novel treatment regimen based on a tumor board decision. Inclusion and exclusion criteria, as well as treatment details, have been described in more detail elsewhere [19]. Patients were scheduled for three 18F-FDG PET-CT scans. The first scan was made prior to the start of neoadjuvant treatment (baseline). The second scan was made after the HILP but prior to the start of the preoperative EBRT and was additionally used for EBRT delineation. The third scan was made after completion of the neoadjuvant treatment (HILP and EBRT), but prior to surgical resection. Figure 1 illustrates the change in 18F-FDG uptake during the treatment course for one of the patients.

18F-FDG PET-CT
The 18F-FDG PET-CT scans were performed using a hybrid PET-CT scanner (Siemens Biograph mCT). Patients fasted for at least 6 h prior to scanning, and fasting glucose levels were checked at the time of injection; none of the patients suffered from diabetes mellitus. 18F-FDG (3 MBq/kg) was injected, and the PET-CT scan was started 1 h afterwards. Patients were scanned in supine position, and images of the affected limb were acquired in 3D mode, in two to five bed positions, 1-3 min/bed position based on the patient's body weight. A preceding low-dose CT scan was performed and used for attenuation and scatter correction. All images were reconstructed using an EARL-compliant protocol. From 2011 to 2014, the reconstruction parameters were 3i_24s, image size 400, filter Gaussian, and FWHM 5.0 mm; from 2014 to 2017, they were 3i_21s, image size 256, filter Gaussian, FWHM 6.5 mm, and quality ref. mAs 30. All scans were acquired according to European Association of Nuclear Medicine guidelines (version 1.0/2.0) [20,21].

Image analyses
Scans were imported into Accurate (in-house developed analysis software, previously used by Frings and Kramer et al. [22,23] and recently described by Boellaard [24]). Scans were reviewed and analyzed by one researcher. To explore the effect of various delineation techniques on the measurement of the metabolic parameters, the volume of interest (VOI) of each tumor was drawn in four different ways. The first three were (1) an automatically drawn VOIauto (using 50% of the SUVpeak contour, corrected for local background [22]), (2) a manually drawn VOIman (visually following tumor contours), and (3) a semi-automatically drawn VOIgrad (a contour located at the maximum PET image intensity gradient near the boundary of the tumor). Because of tumor heterogeneity, necrotic tumor parts (mostly tumor centers) were not included in this third VOI. Therefore, a fourth VOI was derived from the VOIgrad, in which all necrotic tumor parts were manually filled and included, resulting in the VOIgrad+ (Fig. 2). Five metabolic parameters, SUVmax (the voxel with the highest SUV), SUVpeak (using a 1 mL sphere), SUVmean, TLG (SUVmean × MATV), and MATV, all based on lean body mass, as recommended by Boellaard et al. [21], were derived for the four VOI delineation techniques.
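The dependence of these parameters on the delineated VOI can be illustrated with a minimal sketch. It assumes that a VOI is available as a flat list of voxel SUVs and that all voxels share one volume. The function name `voi_metrics`, the voxel volume, and all SUVs below are hypothetical and are not taken from the Accurate software or from the study data. SUVpeak is left out because it requires the spatial position of each voxel.

```python
def voi_metrics(suv_values, voxel_volume_ml):
    """Return SUVmax, SUVmean, MATV (mL), and TLG for the voxels inside one VOI."""
    if not suv_values:
        raise ValueError("VOI contains no voxels")
    n_voxels = len(suv_values)
    suv_max = max(suv_values)                # single hottest voxel
    suv_mean = sum(suv_values) / n_voxels    # average over the whole VOI
    matv_ml = n_voxels * voxel_volume_ml     # volume of the VOI
    tlg = suv_mean * matv_ml                 # TLG = SUVmean x MATV
    return {"SUVmax": suv_max, "SUVmean": suv_mean, "MATV": matv_ml, "TLG": tlg}


# Hypothetical voxels: a viable tumor rim and a low-uptake necrotic core.
viable_rim = [8.0, 6.0, 4.0, 6.0]
necrotic_core = [1.0, 1.0]
voxel_volume_ml = 0.5  # assumed value, for illustration only

# A VOIgrad-like contour excludes the necrotic core; a VOIgrad+-like contour includes it.
grad = voi_metrics(viable_rim, voxel_volume_ml)
grad_plus = voi_metrics(viable_rim + necrotic_core, voxel_volume_ml)

for name, metrics in (("VOIgrad", grad), ("VOIgrad+", grad_plus)):
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

In this toy case, including the necrotic core leaves SUVmax unchanged, raises MATV from 2.0 to 3.0 mL, and lowers SUVmean, while TLG changes only from 12 to about 13, because low-uptake voxels add little to the summed uptake.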
Due to tumor necrosis in most tumors, either treatment-induced or due to tumor heterogeneity, only the VOIman comprised the entire tumor (including necrosis). Therefore, the VOIman was chosen as the reference measurement, and the other VOI techniques were compared with it. This choice was pragmatic (the VOIman encompasses the entire tumor) and does not imply that this approach is best.
Correlation analyses, Bland-Altman analyses, and patient ranking were performed to compare correlation and level of agreement between the VOI delineation techniques. Bland-Altman analyses [25] and patient ranking are described in more detail in Additional file 1. Changes in metabolic tumor activity during neoadjuvant treatment were measured using the five metabolic parameters obtained from the reference VOI man and were related to histopathologic responses. Histopathologic tumor responses were established in accordance with the European Organization for Research and Treatment of Cancer-Soft Tissue and Bone Sarcoma Group (EORTC-STBSG) STS response score [19]. Grade A represents no stainable tumor cells, grade B single stainable tumor cells or small clusters (overall below 1% of the whole specimen), grade C ≥ 1 to < 10% stainable tumor cells, grade D ≥ 10 to < 50% stainable tumor cells, and grade E ≥ 50% stainable tumor cells [26].
Histopathologic responders had tumor remnants which showed < 10% stainable cells, combining response grades A, B, and C. Non-responders had ≥ 10% stainable cells in their tumor remnant, grade D or E. Lastly, the relationship between changes in metabolic tumor activity and histopathologic responses was explored.
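The grading and the responder definition above can be written down as a small lookup. This is a sketch only: the function names are hypothetical, and it assumes the pathology report supplies the proportion of stainable tumor cells as a single percentage.

```python
def stbsg_grade(stainable_pct):
    """Map the percentage of stainable tumor cells (0-100) to response grade A-E."""
    if not 0 <= stainable_pct <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if stainable_pct == 0:
        return "A"   # no stainable tumor cells
    if stainable_pct < 1:
        return "B"   # single cells or small clusters, below 1% overall
    if stainable_pct < 10:
        return "C"   # >= 1% to < 10%
    if stainable_pct < 50:
        return "D"   # >= 10% to < 50%
    return "E"       # >= 50%


def is_histopathologic_responder(grade):
    """Responders are grades A, B, and C (< 10% stainable tumor cells)."""
    return grade in {"A", "B", "C"}


# Hypothetical percentages, chosen to hit every grade boundary.
for pct in (0, 0.5, 1, 9.9, 10, 49.9, 50):
    grade = stbsg_grade(pct)
    print(pct, grade, is_histopathologic_responder(grade))
```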

Statistical analysis
Discrete variables were summarized with frequencies and percentages and continuous variables with medians and interquartile ranges (IQRs); none of the variables were normally distributed. Fisher's exact and Mann-Whitney U test were used to compare variables. Wilcoxon signed rank and Friedman's test were used to compare the measurements between the three scans. Correlation coefficients were calculated and tested using Spearman's test. The level of agreement between VOI techniques was determined by Bland-Altman analyses [25]. A p value < 0.05 indicated statistical significance. Microsoft Excel (2010) was used to create the Bland-Altman plots. SPSS version 23.0 (IBM SPSS Statistics for Windows, Version 23.0 Armonk, NY: IBM Corp) and GraphPad Prism version 5.04 (GraphPad Software for Windows, San Diego California USA) were used for statistical analyses.
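For readers unfamiliar with Bland-Altman analysis, the sketch below shows its conventional form: the mean difference between two methods (bias) with limits of agreement at ±1.96 standard deviations of the differences. The procedure actually used in this study is described in Additional file 1 and may differ in detail (for example, in the use of relative differences). The function name and all values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(reference, comparison):
    """Return (bias, lower limit, upper limit) of agreement for paired measurements."""
    if len(reference) != len(comparison) or len(reference) < 2:
        raise ValueError("need two paired lists with at least two values each")
    differences = [c - r for r, c in zip(reference, comparison)]
    bias = mean(differences)       # systematic offset between the two methods
    spread = stdev(differences)    # sample standard deviation of the differences
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


# Hypothetical SUVmean values for four scans, measured with two VOI techniques.
suvmean_voi_man = [10.0, 12.0, 14.0, 16.0]
suvmean_voi_grad_plus = [11.0, 11.0, 15.0, 15.0]

bias, lower, upper = bland_altman(suvmean_voi_man, suvmean_voi_grad_plus)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A bias near zero with narrow limits indicates that the two techniques can be used interchangeably for that parameter; whether a given width is acceptable remains a clinical judgment.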

Results
Thirty-two 18F-FDG PET-CT scans were acquired. The third PET-CT scan of patient 10 could not be performed due to scheduling difficulties. For patient 1, it was not possible to draw a VOIauto in scan 3, since the tumor showed an almost complete metabolic response at this treatment stage and did not meet the margin thresholds needed to complete the VOIauto. Since it was possible to define the other three types of VOIs, this scan was included in the analyses and a value of zero was assigned to the metabolic parameters for the VOIauto. The median time between the HILP and scan 2 was 21 (18-21) days, whereas the time between the end of EBRT and scan 3 was 3 (1-3) days.

Correlation, level of agreement, and ranking of patients between VOIs
The correlation between VOIs for all scans and all metabolic parameters was strongest between the VOIman and the VOIgrad+, as indicated in gray in Table 2. The Bland-Altman plots showed an acceptable level of agreement between the VOIman and the VOIgrad+ (Additional file 2: Figure S1).
When the VOIman and the VOIgrad+ delineation techniques were compared across the serial 18F-FDG PET-CT scans, patient rankings differed by no more than 1 place for SUVmean and TLG and by no more than 2 places for MATV. Relatively large differences of 4 or more places in ranking between VOI delineation techniques are indicated in gray in Additional file 3: Table S1. One example is the MATV at scan 1 of patient 7, whose tumor had considerable necrotic parts. This patient's MATV ranked highest with the VOIman, VOIgrad, and VOIgrad+ techniques. With the VOIauto technique, however, it ranked only 9th due to exclusion of tumor necrosis.

Fig. 2 Differences in tumor delineation between the four VOI delineation techniques. An example illustrating the differences in tumor delineation between the four VOI delineation techniques, for patient 4 scan 2. a VOIauto. b VOIman. c VOIgrad. d VOIgrad+
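The patient ranking comparison above can be sketched as follows. The ranking procedure used in the study is described in Additional file 1; this version assumes that rank 1 goes to the highest value and that there are no ties (ties would simply keep their input order here). Function names, patient labels, and MATV values are hypothetical.

```python
def rank_descending(values_by_patient):
    """Rank patients by value, with rank 1 for the highest value."""
    ordered = sorted(values_by_patient, key=values_by_patient.get, reverse=True)
    return {patient: rank for rank, patient in enumerate(ordered, start=1)}


def rank_differences(values_a, values_b):
    """Absolute difference in rank per patient between two VOI techniques."""
    if set(values_a) != set(values_b):
        raise ValueError("both techniques must cover the same patients")
    ranks_a = rank_descending(values_a)
    ranks_b = rank_descending(values_b)
    return {patient: abs(ranks_a[patient] - ranks_b[patient]) for patient in ranks_a}


# Hypothetical MATV values (mL). P1 has a large necrotic core that the
# automatic contour misses, so its volume shrinks sharply with that technique.
matv_voi_man = {"P1": 300.0, "P2": 150.0, "P3": 90.0, "P4": 40.0}
matv_voi_auto = {"P1": 60.0, "P2": 140.0, "P3": 85.0, "P4": 35.0}

differences = rank_differences(matv_voi_man, matv_voi_auto)
print(differences)
```

A large rank shift for a single patient, as for P1 in this toy data, points to a lesion that one technique delineates very differently from the other.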

Metabolic tumor activity
During neoadjuvant treatment, all five metabolic parameters for the reference VOIman declined between scans 1 and 3 (all p < 0.05, Fig. 3, Table 3).
This decline was further explored by calculating the absolute and the percentage difference between the three serial scans. The percentage difference was obtained by dividing the difference between scans by the measured value of the first scan. A significant decline in SUVmax, SUVpeak, and SUVmean was found between scan 1 vs. scan 2, as well as between scan 1 vs. scan 3. However, no significant decline in SUVmax, SUVpeak, and SUVmean was found between scan 2 vs. scan 3. The decline in TLG was significant between all serial scans. A significant decline in MATV was found between scan 2 vs. scan 3. The decline in metabolic tumor activity for all parameters except MATV was largest between scan 1 vs. scan 2, whereas the decline in MATV was largest between scan 2 vs. scan 3 (Fig. 4, Table 4).
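The difference calculation described above is sketched below for one parameter across the three scans. One point is an assumption: "the first scan" is read here as the earlier scan of each pair, so scan 2 is the denominator for the scan 2 vs. scan 3 comparison. The function name and the TLG values are hypothetical.

```python
def scan_differences(earlier, later):
    """Return (absolute difference, percentage difference) between two scans.

    The percentage is relative to the earlier scan of the pair (assumed reading).
    Negative values indicate a decline.
    """
    if earlier == 0:
        raise ValueError("earlier scan value must be non-zero")
    absolute = later - earlier
    percent = absolute / earlier * 100.0
    return absolute, percent


# Hypothetical TLG values for scans 1, 2, and 3 of one patient.
tlg = {1: 400.0, 2: 100.0, 3: 50.0}

for first, second in ((1, 2), (1, 3), (2, 3)):
    absolute, percent = scan_differences(tlg[first], tlg[second])
    print(f"scan {first} vs. scan {second}: {absolute:+.1f} ({percent:+.1f}%)")
```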
To further explore the identification of the histopathologic responders, the difference and percentage difference in TLG between scans 1 and 3 were calculated for the four VOI delineation techniques (Additional file 4: Table S2). A calculated decline in TLG of > 75% using the VOIgrad or the VOIgrad+ identified the same histopathologic responders as the VOIman. The VOIauto, however, failed to identify patient 5 as a histopathologic responder. Furthermore, a > 75% decline in TLG was also found with the VOIauto and VOIgrad in patients 3 and 4 and with the VOIgrad+ in patient 4.

Table 2 notes. Spearman's test for correlations was used to calculate significance. The strongest correlation for the three PET scans was found between the VOIman and the VOIgrad+, as indicated in gray. Abbreviations: VOI, volume of interest; VOIman, manually drawn VOI; VOIauto, automatically drawn VOI; VOIgrad, VOI based on the gradient between voxels; VOIgrad+, VOIgrad plus necrosis; 18F-FDG PET-CT, fluorine-18-fluorodeoxyglucose positron emission tomography with computed tomography; SUVmax, maximum standardized uptake value; SUVpeak, peak standardized uptake value; SUVmean, mean standardized uptake value; TLG, total lesion glycolysis; MATV, metabolically active tumor volume; IQR, interquartile range; NA, not applicable
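How the > 75% TLG criterion can classify the same patient differently depending on the VOI technique is illustrated by the sketch below. The cutoff comes from the text; the function names and all TLG values are hypothetical and do not reproduce Additional file 4: Table S2.

```python
TLG_DECLINE_CUTOFF_PCT = 75.0  # empirical cutoff reported in this study


def tlg_decline_pct(tlg_scan1, tlg_scan3):
    """Percentage decline in TLG from scan 1 to scan 3 (positive = decline)."""
    if tlg_scan1 <= 0:
        raise ValueError("baseline TLG must be positive")
    return (tlg_scan1 - tlg_scan3) / tlg_scan1 * 100.0


def meets_tlg_criterion(tlg_scan1, tlg_scan3, cutoff_pct=TLG_DECLINE_CUTOFF_PCT):
    """True when the decline is strictly greater than the cutoff."""
    return tlg_decline_pct(tlg_scan1, tlg_scan3) > cutoff_pct


# Hypothetical (scan 1, scan 3) TLG pairs for one patient, per VOI technique.
tlg_by_technique = {
    "VOIman": (500.0, 100.0),
    "VOIgrad+": (480.0, 110.0),
    "VOIauto": (300.0, 90.0),
}

flags = {
    technique: meets_tlg_criterion(scan1, scan3)
    for technique, (scan1, scan3) in tlg_by_technique.items()
}
print(flags)
```

In this toy example, the contours that include necrosis flag the patient, whereas the automatic contour, with its smaller baseline TLG, falls short of the cutoff.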

Discussion
This study of four VOI delineation techniques in three consecutive 18F-FDG PET-CT scans per patient demonstrates a significant decline in metabolic tumor activity (VOIman) during the neoadjuvant treatment of locally advanced ESTS, which consisted of HILP and preoperative EBRT. The decline in SUVmax, SUVpeak, SUVmean, and TLG between scan 1 vs. scan 2 implies that the HILP accounts for the largest effect on metabolic tumor activity. The MATV seems to be affected most by the EBRT, given the significant decline found between scan 2 vs. scan 3.
In search of a uniform and reproducible way to calculate changes in metabolic tumor activity in these upfront highly heterogeneous tumors, the use of four different VOI delineation techniques was studied. The VOIman (defined as reference VOI) is the only delineation technique in which the entire tumor is encompassed independently of the amount of necrosis present in the tumor. Therefore, the VOIman delineation technique seems to be most reliable when used for calculating the metabolic tumor activity. However, the VOIman delineation technique is time-consuming, making it unfit for implementation into daily practice. A high correlation, an acceptable level of agreement, and a comparable ranking were found between the VOIman and the VOIgrad+ delineation techniques. The differences in ranking between the four VOI delineation techniques are best explained by the high amount of necrosis present in these tumors, as tumor necrosis did not meet the margin thresholds of the VOIauto and VOIgrad. To obtain the VOIgrad+, the necrosis was manually included, and therefore, the ranking of patients was comparable to the ranking according to the VOIman.
Thus, the VOIgrad+ delineation technique seems to be a reliable and reproducible technique for the delineation of heterogeneous tumors such as ESTS. Further studies including larger patient cohorts in various solid tumor types are necessary to validate the various VOI delineation techniques and to establish their reproducibility. This study, however, demonstrates that the applied VOI delineation technique is important to consider, because the assessment of response based on metabolic parameters derived from different VOIs may differ across subjects.
The metabolic tumor changes during neoadjuvant treatment between scan 1 vs. scan 3 were analyzed and compared with the corresponding histopathologic tumor response. Out of the five metabolic parameters tested, TLG seemed to identify the histopathologic responders most reliably (> 75% decrease in TLG between scan 1 and scan 3) when using the VOIman delineation technique. The 75% decrease in TLG used as a cutoff value was derived empirically from the data; it serves as an example and provides pilot data for using and comparing these techniques. When compared with the VOIman delineation technique, the VOIgrad+ technique identified the same histopathologic responders plus one additional patient. It seems that these two delineation techniques most reliably identify histopathologic responders because they include tumor necrosis. The difference in performance of the VOIman and VOIgrad+ delineation techniques in identifying histopathologic responders is very subtle. However, the VOIgrad+ delineation technique was found to be easier to use and is considerably less time-consuming than the VOIman technique, making it more suitable for implementation into daily practice. The VOI delineation techniques and the TLG cutoff value need confirmation in larger patient cohorts.
In recent years, the predictive value of 18F-FDG PET-CT scans in staging and monitoring treatment response during neoadjuvant treatment has been established for various solid tumors (including metastatic colorectal cancer and non-small cell lung cancer) [23][27][28][29]. Therefore, further ESTS studies are warranted in which metabolic tumor activity, e.g., a > 75% decrease in TLG with VOIman and/or VOIgrad+, is explored as a predictor for monitoring therapy response, for histopathologic findings, and for oncological outcome. The identification of reproducible and reliable VOI delineation techniques, as well as the identification of robust PET parameters for the interpretation of changes in metabolic tumor activity, is relevant because this will enable clinicians to shorten delineation time and to compare results between observers, patients, and centers for ESTS and for other solid tumor types.
This study has some limitations, such as its retrospective character and small patient population. Only 11 patients were included; however, all patients but one underwent all three 18F-FDG PET-CT scans, and therefore, it was possible to establish the changes in metabolic tumor activity during the neoadjuvant treatment in all patients. Possibly, the interpretation of the third PET scan is biased by local inflammatory changes following the EBRT. These inflammatory changes might partly explain the significantly more pronounced decrease in metabolic tumor activity following the HILP than following the EBRT, as found in the current series.
Despite this potential bias due to radiation-induced local inflammatory changes, a decrease in metabolic tumor activity between scans 1 and 3 was found, which theoretically might have been larger without these changes. For the purpose of this study, all data concerning the metabolic tumor activity were obtained from additional analyses of the 18F-FDG PET-CT scans, since these data are not used in routine patient care. Interestingly, the EORTC-STBSG response score [26] could be used to explore the relationship between changes in metabolic tumor activity and histopathologic response. However, the prognostic value of the STS response score according to the proportion of stainable tumor cells needs further validation [30].

Conclusions
This study identified the VOIgrad+ delineation technique as most reliable considering reproducibility when compared with the other delineation techniques during the multimodality neoadjuvant treatment of locally advanced ESTS. Moreover, the VOIgrad+ delineation technique was considerably less time-consuming to perform when compared to the VOIman technique, potentially resulting in easier implementation in clinical practice. A significant decline in metabolic tumor activity during the treatment was found. The decrease in metabolic tumor