Development and validation of a prognostic model incorporating [18F]FDG PET/CT radiomics for patients with minor salivary gland carcinoma

Objectives The aim of this study was to develop and validate a prognostic model incorporating [18F]FDG PET/CT radiomics for patients of minor salivary gland carcinoma (MSGC). Methods We retrospectively reviewed the pretreatment [18F]FDG PET/CT images of 75 MSGC patients treated with curative intent. Using a 1.5:1 ratio, the patients were randomly divided into a training and validation group. The main outcome measurements were overall survival (OS) and relapse-free survival (RFS). All of the patients were followed up for at least 30 months or until death. Following segmentation of tumors and lymph nodes on PET images, radiomic features were extracted. The prognostic significance of PET radiomics and clinical parameters in the training group was examined using receiver operating characteristic curve analysis. Variables showing a significant impact on OS and RFS were entered into multivariable Cox regression models. Recursive partitioning analysis was subsequently implemented to devise a prognostic index, whose performance was examined in the validation group. Finally, the performance of the index was compared with clinical variables in the entire cohort and nomograms for surgically treated cases. Results The training and validation groups consisted of 45 and 30 patients, respectively. The median follow-up time in the entire cohort was 59.5 months. Eighteen relapse, 19 dead, and thirteen relapse, eight dead events were found in the training and validation cohorts, respectively. In the training group, two factors were identified as independently associated with poor OS, i.e., (1) tumors with both high maximum standardized uptake value (SUVmax) and discretized intensity entropy and (2) poor performance status or N2c-N3 stage. A prognostic model based on the above factors was devised and showed significant higher concordance index (C-index) for OS than those of AJCC stage and high-risk histology (C-index: 0.83 vs. 0.65, P = 0.005; 0.83 vs. 0.54, P < 0.001, respectively). This index also demonstrated superior performance than nomogram for OS (C-index: 0.88 vs. 0.70, P = 0.017) and that for RFS (C-index: 0.87 vs. 0.72, P = 0.004). Conclusions We devised a novel prognostic model that incorporates [18F]FDG PET/CT radiomics and may help refine outcome prediction in patients with MSGC.


Introduction
Minor salivary gland carcinoma (MSGC)-with an annual incidence of 0.16-0.4 new cases per 100,000 population [1]-is a rare malignancy that accounts for only 0.3-1.8% of all head and neck tumors [2,3]. The clinical characteristics and histology of MSGC can vary greatly, and the disease course is still largely unpredictable [4,5]. Several efforts have been made to improve the prognostic stratification of patients with MSGC [4,[6][7][8][9][10][11]. The American Joint Committee on Cancer (AJCC) tumor stage and the histology risk group according to the 2005 World Health Organization (WHO) classification system [12] are currently considered as the main prognostic factors for both patient survival and disease relapse [11]. Ali S et al. had proposed postoperative nomograms for prediction of major salivary gland malignancy. Five variables were used for predictive for OS: age, clinical T4 stage, histological grade, perineural invasion, and tumor dimension [13]. Lu CH et al. also demonstrated a nomogram including clinicopathologic variables of smoking, tumor grade, perineural invasion, lymphatic invasion, and pathologic T-and Nstage to predict the recurrent probability of major gland cancer [14]. However, surgical and pathological results were required for those nomograms and their utility in MSGC remains unclear.
Growing evidence indicates that [ 18 F]FDG parameters can predict outcomes in salivary gland cancer (SGC) [15][16][17][18]. We had found in high-risk histology SGC, including 49 MSGC, maximum standardized uptake value (SUV max ) and discretized intensity entropy (an index of image heterogeneity) having prognostic value [17]. However, their roles in MSGC needed further investigation. In this extended study, we added 26 new cases of MSGC including all kinds of histology, ranging from low to high risk. Owing to major differences in image acquisition, analysis protocols, and different resolution of varied PET systems, PET radiomics studies should meet rigorous methodological and reporting criteria [19][20][21]-which unfortunately remains a frequently overlooked issue in the available literature (especially in terms of validation and reproducibility).
Starting from these premises, we designed the extended study to (1) develop and validate a prognostic model incorporating [ 18 F]FDG PET/CT radiomics for patients with MSGC, (2) compare this model with clinical variables, including staging systems, WHO risk histology types, and previously published nomograms.

Study patients
The inclusion criteria of this study were (1) MSGC patients who complete curative intent therapy between January 2007 and December 2016 at our institutes. (2) All patients must have staging [ 18 F]FDG PET/CT before treatment and must follow up of at least 30 months after diagnosis or until death. The exclusion criteria were (1) patients who presented with distant metastases at diagnosis (M1), and (2) patients had a positive history of other malignancies. Demographic and survival data were collected from all participants. Disease staging was performed according to the AJCC Staging Manual, seventh edition [22], based on the results of PET/CT and MRI (or contrast-enhanced CT). In the presence of discrepant findings between the two imaging modalities, lesions were subjected to biopsy. PET images were independently interpreted by two expert nuclear medicine physicians (N.M.C. and T.C.Y.). In accordance with the 2005 WHO classification, the following tumors were considered as high-risk histology types of MSGC: adenoid cystic carcinoma (ACC), mucinous adenocarcinoma, squamous cell carcinoma, small or large cell carcinoma, lymphoepithelial carcinoma, metastasizing pleomorphic adenoma, and high-grade (HG) carcinomas [i.e., HG mucoepidermoid carcinoma (MEC), HG salivary duct carcinoma, HG carcinoma ex-pleomorphic adenoma, and HG cystadenocarcinoma]. Performance status (PS) was assessed using the Eastern Cooperative Oncology Group (ECOG) criteria. Radical surgery was the mainstay of treatment, whereas postoperative radiotherapy (RT) or chemoradiotherapy (CCRT) was performed in selected cases [16]. Patients were treated with definitive RT or CCRT in the presence of at least one of the following conditions: (1) the presence of a non-resectable malignancy; 2) medical intolerance to anesthesia, or 3) unwillingness to undergo surgery. Follow-up imaging consisted of [ 18 F]FDG PET/CT and CT or MRI scans performed every 3 − 6 months in the first 2 years, every 6 − 12 months between the third and the fifth year, and every 12 − 24 months thereafter. Using a 1.5:1 ratio, the study sample was randomly divided into a training (n = 45) and validation (n = 30) group. Randomization was based on the ranking of hospital identification numbers, which were randomly assigned to each patient during their first hospital visit. The optimal cutoff values for PET parameters were determined in the training group and subsequently tested in the validation group. The study followed the tenets of the Helsinki Declaration and was approved by the Institutional Review Board of the Chang Gung Memorial Hospital. Owing to the retrospective nature of the study, the need for informed consent was waived. respectively. An ordered-subset expectation maximization iterative reconstruction algorithm (4 iterations and 10 subsets for the Discovery ST16, 2 iterations, and 21 subsets for the Biograph mCT) based on CT-based attenuation map were applied for PET image reconstruction. Time-on-flight (TOF) technique to improve image quality was used in the Biograph mCT scanner. The values of axial spatial resolution for the Discovery ST16 and Biograph mCT scanners were 4.80 mm and 2.16 mm, respectively.

[ 18 F]FDG PET radiomics
A fixed 40% threshold (T40) of SUV max was applied for the segmentation of tumors and lymph nodes. This approach has been previously utilized for SGC by other research groups [15] and repeatedly applied in recent studies [23,24]. Image features were extracted in the volume of interest (VOI) using intensity histogram, gray level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), and gray-level size zone matrix (GLSZM). Intensity histogram is generated using the three-dimensional (3D) tumor volume by discretizing the original intensity into intensity bins. GLCM was used to assess the relationship between two neighboring voxels within the original image in an orientation invariant manner by averaging 13 direction vectors within the neighborhood of one Chebyshev distance. GLRLM determines the size of the uniform run (length) for each gray level. Similar to GLCM, we used the 3D matrix and the values of each texture index are averaged over the 13 directions. GLSZM counts the number of linked voxels which defined as two or more of the neighboring voxels have the same gray level within one Chebyshev distance. To reduce noise stemming from image processing, we applied a fixed bin number method [25] with 16 bins according to our previous paper [17].
SUV max and metabolically active tumor volume (MATV) have prognostic significance in SGC [15,16] and were therefore included in the analysis. According to PET studies based on double baseline images [25,26], the following parameters are characterized by high reproducibility and repeatability: asphericity (from shape analysis); discretized intensity entropy (from intensity histogram); angular second moment (ASM) and sum entropy (from GLCM); run-length nonuniformity (RLNU) and high gray level run emphasis (HGLRE) (from GLRLM); zone-size nonuniformity (ZSNU), and high gray level zone emphasis (HGLZE) (from GLSZM). The discretized intensity entropy represents the sum of fixed bins probabilities multiplied by the natural logarithm of the probability values. GLCM features rely on the probability distribution for the elements of the GLCM. ASM measures the feature of textural uniformity of an image and sum entropy quantifies the randomness of intensity distribution. ASM and sum entropy often inversely correlated with each other. RLNU quantifies the distribution of runs over the run lengths, and low RLNU is noted in an image with equally distributed runs along run lengths. HGLRE weights the runs with high gray level voxel intensity. HGLRE is high in an image with greater runs of high intensity. ZSNU measures the distribution of zone counts over the different zone sizes and is expected to be small if the zones are similar throughout the entire image. HGLZE weights the zones with high gray level voxel intensity. HGLZE is high in an image with numerous zones of high intensity.
Because nodal lesions were generally small-sized, we did not apply texture analyses to lymph nodes and only nodal SUV max , MATV, and asphericity were taken into account. For patients who had multiple lymph nodes metastases, lymph node MATV was the sum of the volume of all involved nodes while lymph node SUV max and asphericity were determined using the highest value among lesions.
The Chang-Gung Image Texture Analysis toolbox (CGITA) was used for calculation PET parameters. The terms and equations of PET texture parameters and the calculation process are consistent with the Imaging Biomarker Standardization Initiative (IBSI) framework [21,27]. The compliance of IBSI framework was evaluated using the IBSI digital phantom, and the resulting data were shown in the Supplementary file.

Statistical analysis
Overall survival (OS) and relapse-free survival (RFS) served as the main outcomes of interest. OS was the time elapsed from the histological diagnosis to the date of death from any cause (or censored on the date of the last followup). RFS was defined as the time from the date of primary treatment to the first disease recurrence (or censored on the date of the last follow-up). For patients who did not achieve a complete response after therapy, the date of recurrence was set at the end of primary therapy [28]. Correlations among the study variables were tested using Spearman's correlation coefficient (ρ). Categorical variables were analyzed using the chi-square test. The difference of PET parameters between training/validation cohorts and scanners was accessed using Mann-Whitney U test. To account for a potential selection bias caused by lack of randomization, propensity scores for OS in the training, and validation groups were calculated for different confounding factors. Data analyses were conducted using the SPSS statistical package, version 21 (IBM, Armonk, NY, USA). Statistical testing was two-sided, and Bonferroni's correction was applied to adjust for multiple comparisons.
The selection of PET parameters for survival analyses was based on three steps. Firstly, the impacts of PET radiomics on OS were assessed in the training group using receiver operating characteristic (ROC) curve analysis. All variables that produced an area under the ROC curve (C-index) significantly different from 0.5 were selected and their optimal cutoff values for OS were determined by method of maximizing Youden index using MedCalc version 19.1 (Mariakerke, Belgium). We set P values below 0.10 as statistically significant for the selection process. That was a P value of less than 0.01 (0.10/ 10) was required to declare significance after Bonferroni's correction. Secondly, subgroup analysis was performed to investigate the interaction and impacts of those parameters on survivals. Finally, clinical factors and potential PET parameters were subsequently entered into multivariate Cox proportional hazards models. The following variables served as covariates in the training group: age, tumor histology, treatment modality, performance status, and clinical AJCC stages. The proportional hazards assumption for each variable was examined by Schoenfeld residuals test. To minimize overfitting of predictor effects during model development thematic series: advanced image analysis and follow the rule of thumb of multivariate analysis [29], multivariate Cox regression model was analyzed using a bootstrap resampling procedure (1,000 samples).
Recursive partitioning analysis (RPA) was applied to devise a prognostic model based on significant clinical parameters and PET radiomics. To this aim, a classification and regression tree (CART) algorithm was applied, and patients were divided according to independent dichotomous variables. To overcome the potential issue of overfitting, fivefold cross-validations (following a random exclusion of 20% of patients in the training group) were repeatedly used for assessing model accuracy. Finally, the resulting survival model was subsequently applied in the validation group using Kaplan-Meier estimates (log-rank test). The predictive ability of different prognostic variables was then compared using the concordance index (C-index) with AJCC staging, WHO high-risk histology for the entire cohort and clinical nomograms for surgical-treated patients. Comparison of Cindex was conducted by using a nonparametric approach [30] implanted in MedCalc version 19.1 (Mariakerke, Belgium).
The median time to disease progression after treatment was 15.1 months (range: 2.4-69.1 months). Kaplan-Meier analysis revealed that patients with advanced Tstages, AJCC stages, or who were smokers had worse OS and RFS. Surgery and WHO high-risk histology were found to have an adverse impact on RFS, but not on OS. Patients with ACC tended to have a better OS (P = 0.077) but a similar RFS (P = 0.957) compared with other histology types. Among patients who were treated with surgery, positive margins, perineural invasion, and lymphatic invasion were identified in 24 (53.3%), 17 (37.8%), and 1 (2.2%) cases, respectively. The median nomogram score for OS [13] was 132 (range: 0-254). The median nomogram score for RFS [14] was 2.76 (range: 1.0-4.0). Notably, patients with advanced N-stage (N2c/N3, n = 2/2) or poor PS (ECOG 2, n = 3) had 5year OS and RFS rates of 0%.

[ 18 F]FDG PET radiomics and prognosis
There were significant correlations of MATV with almost every radiomic index, including the following: asphericity, discretized intensity entropy, ASM, sum entropy, RLNU, HGLRE, ZSNU, and HGLZE. Supplementary Table 1 depicts the distribution of all PET parameters. The tumor voxels ranged from 22 to 11412 voxels (median: 245, interquartile range: 129 to 1676 voxels). Three cases (4%) had tumor voxels fewer than 64. Tumor SUV max , MATV, asphericity, discretized intensity entropy, ASM, sum entropy, and HGLRE were unaffected by the PET systems. Interestingly, we found that tumor volumes in patients with adenoid cystic  Fig. 1 and Supplementary Table 2, the results of ROC curve analysis in the training group revealed a significant association between tumor SUV max and OS (Cindex: 0.74, P = 0.007 with cutoff value of 6.67). Neither other PET tumor nor lymph node parameter was found to be related to RFS. Subgroup analysis revealed that in the training group, tumor discretized intensity entropy was significantly associated with OS in the subgroup of patients with SUV max > 6.67 (n = 21; C-index: 0.81, P = 0.025; cutoff: 2.50). In contrast, none of the other tumor nor lymph node parameters was related to OS in the subgroup of patients with low SUV max (n = 24).
Based on these findings, we devised the following PET prognostic system for predicting OS: patients in subgroup 1 (n = 24) had a low SUV max ; patients in subgroup 2 (n = 8) had high SUV max but low discretized intensity entropy; and patients in subgroup 3 (n = 13) had both high SUV max , and discretized intensity entropy. Patients in subgroup 3 showed the least favorable survival figures in terms of both OS and RFS (Fig. 1) and were therefore considered as having a high-risk PET pattern. The selection process of PET radiomics was demonstrated in Supplementary Figure 1. We also found a statistically significant inverse association between the presence of ACC and the high-risk PET pattern (ρ = − 0.41, P = 0.005).

Development and validation of the prognostic model
All variables satisfied the proportional hazards assumption in Schoenfeld residuals tests. Bootstrap multivariate analysis based on 1,000 replications confirmed the significance of high-risk PET pattern and ECOG 2 or N2c-N3 status for both OS and RFS, Table 2. There was no significant correlation between different parameters in multivariate analyses of the training cohort. Therefore, these two variables were used to devise a prognostic model-which was developed by means of RPA and fivefold cross-validations. Interestingly, patients with the high-risk PET pattern had adverse outcomes, whereas those in subgroups 1 and 2 could be further stratified according to the ECOG2 or N2c-N3 status (Fig. 2). Finally, a total of 15 patients were classified as having a poor prognosis according to the prognostic model. We attempted to replicate these results in the validation group (n = 30). Figure 3 shows that patients with the high-risk PET pattern (i.e., high SUV max and high discretized intensity entropy) and a positive ECOG 2 or N2c-N3 status had less favorable survival outcomes (both in terms of OS and RFS). The prognostic model effectively predicted both OS and RFS. This prognostic model worked effectively in predicting OS and RFS regardless of the histology type (all P < 0.001 in both the ACC and non-ACC subgroups). Because of the limited sample size (n = 30) and the presence of collinearity between the high-risk PET pattern and both AJCC stage and WHO high-risk histology (ρ = 0.480 and 0.498, P = 0.007 and 0.005, respectively), multivariable Cox proportional hazards regression analysis was not performed in the validation group.
Results of performance of clinical parameters, highrisk PET pattern and the prognostic system in the entire study cohort (n = 75) were demonstrated in Supplementary Table 3. PET radiomics-based model (i.e., subgroup 3 PET pattern) had sensitivity of 63.0%, accuracy of 81.3% with C-index of 0.77 for OS. After integrated with ECOG 2 or N2c-N3 status, sensitivity, accuracy, and Cindex could be improved to 74.1%, 85.3%, and 0.83, respectively. Similar findings were noted for RFS. Although those improvements in sensitivity and accuracy did not achieve significant level (McNemar test, P = 0.250), there was marginally significant in C-index improvement (P = 0.085 and 0.0918 for OS and RFS, respectively).
The application of the prognostic system (n = 75) led to the identification of 24 (32.0%) patients with a poor prognosis. Importantly, the ability of the prognostic index to predict OS (C-index: 0.83) was significantly higher than those of the AJCC stage (C-index: 0.65, P = 0.005) and high-risk histology (C-index: 0.54, P < 0.001). Similar results were observed for RFS, with a higher Cindex being evident for our prognostic model (0.78) compared with other variables (AJCC stage and highrisk histology, C-index: 0.68 and 0.59, P = 0.099 and 0.004, respectively; Fig. 4). Patients with a poor prognosis were more likely to have disease recurrences (odds ratio: 18.18; 95% confidence interval: 5.14 − 64.36, P < 0.001). Three cases with tumor volumes < 64 voxels had lower SUV max (3.89, 4.55, 5.47) and were not classified as high-risk PET patterns. All had survived during the study period without relapse. Remove those three lesions had no impacts on subsequent analyses.
Our prognostic model was further compared with two previously published nomograms for predicting OS [13] and RFS [14]. Although the two nomograms effectively

Discussion
This extended study enrolled MSGC of all kinds of histology. More radiomic data were investigated using the testing and validation method. Nevertheless, the PET parameters we found were identical to our previous paper. The results indicate that tumor SUV max , discretized intensity entropy, and ECOG 2 or N2c-N3 status are independently associated with survival endpoints in patients with MSGC. These variables were used to devise and validate a prognostic model that was clinically useful for improving outcome stratification in this patient group. Notably, our prognostic model was found to have a higher predictive power as compared with AJCC staging, WHO classification, and currently available nomograms. Patients identified as having a poor prognosis according to our model are unlikely to benefit from current treatment modalities and are ideal candidates for novel treatment approaches in clinical trials.
To our knowledge, this study is the first to specifically focus on the prognostic significance of PET radiomics in MSGC. Owing to the robustness and high reliability of SUV max and discretized intensity entropy obtained from different PET imaging systems [25,26], these parameters have the potential to serve as image biomarkers in multicenter clinical trials. In our study, tumor SUV max was identified as an independent prognostic factor in patients with MSGC. This finding may be explained by the positive correlation between SUV max values and the extent of glucose transporter type-1 (GLUT-1) expression in tumor tissue [31]. An increased expression of GLUT-1 reflects a high biological aggressiveness of salivary gland malignancies and has been associated with poor clinical outcomes [32,33]. Conversely, ACC-which generally shows a better OS compared with other histology types-is characterized by a reduced GLUT-1 expression [34] paralleled by lower SUV max values. Genomic and phenotypic tumor heterogeneity may influence both response to therapy and clinical outcomes [35,36]. In this scenario, discretized intensity entropy-which can reflect the spatial heterogeneity of GLUT-1 expression-has been shown to predict survival in patients with head and neck cancer [37] and pancreatic cancer [38]. Discretized intensity entropy can also be a reflection of the tumor microenvironment (e.g., vascularization, cell density, biological aggressiveness, and hypoxia) [39,40], which may in turn influence disease recurrences in MSGC. The extent to which PET heterogeneity does actually reflect biological heterogeneity at cellular, molecular, and genetic levels has not been completely understood. For example, PET heterogeneity indices do not show significant associations with the mutation burden and genetic heterogeneity in lung cancer [41]. It is therefore possible that such PET parameters may actually have an independent prognostic significance.
Recent studies have shown that the number of positive lymph nodes-but not lymph node size, extranodal extension, and lower neck involvement-can predict survival outcomes in SGC [42,43]. Because an N2c-N3 status likely reflects a higher number of nodal metastases, the adverse prognostic impact of this variable as observed in our study is not surprising. The unfavorable prognostic significance of poor PS is also in line with a previous study conducted in SGC [44]. Interestingly, we were also able to confirm the utility of two previously described nomograms (originally developed for major salivary gland tumors) in the prediction of clinical outcomes in MSGC.
Although radiomics-based clinical-decision-support systems (CDDS) are evolving rapidly, significant challenges still remain (including an unambiguous data collection and an accurate integration of different features) [19]. In keeping with the principles summarized in the radiomics quality score, the present study was designed to (1) validate the findings originally obtained in a testing group, (2) include multivariable analysis in the statistical methodology, and (3) compare the predictive ability of our model with that of currently recognized prognostic factors. Moreover, all of the PET images were acquired using standard imaging protocols and the cutoff values for SUV max and discretized intensity entropy were identical to those reported in a previous study [17]. Notably, the application of our model led to the identification of a subgroup of MSGC patients that is unlikely to benefit from current therapies. Despite these strengths, our findings should also be interpreted in the context of some limitations. First, the retrospective, single-center nature of our extended study is prone to bias and limits the generalizability of our conclusions. Consequently, our findings need to be confirmed and expanded in larger independent, multicenter investigations. However, the rarity of MSGC still severely hampered large scale radiomics investigations. Second, we enrolled three cases (4%) with smaller tumors (< 64 voxels). Although there was no influence on subsequent analysis, the texture indices may provide more valuable complementary information for tumors with larger volumes [45]. Third, two PET systems were involved in this study. Although SUV max and discretized intensity entropy were insensitive to device in our study, the variance of voxel size may considerably influence the agreement of measurements [20,46]. Therefore, harmonizing voxel sizes may have the potential to remove the device-effect although more studies are needed to state its effects with certainty.

Conclusions
We devised a novel prognostic model that incorporates [ 18 F]FDG PET/CT radiomics and may help refine outcome prediction in patients with MSGC. Fig. 4 Receiver operating characteristic (ROC) curves showing that the C-index of the prognostic model was higher than those of AJCC stage III − IV and WHO high-risk histology for both OS (a) and RFS (b). For surgically treated patients (n = 45), the C-index of the prognostic model was higher than the nomogram for OS (c) and that for RFS (d)