Quantitative 18F-FDG PET-CT scan characteristics correlate with tuberculosis treatment response

Background There is a growing interest in the use of F-18 FDG PET-CT to monitor tuberculosis (TB) treatment response. Tuberculosis lung lesions are often complex and diffuse, with dynamic changes during treatment and persisting metabolic activity after apparent clinical cure. This poses a challenge in quantifying scan-based markers of burden of disease and disease activity. We used semi-automated, whole lung quantification of lung lesions to analyse serial FDG PET-CT scans from the Catalysis TB Treatment Response Cohort to identify characteristics that best correlated with clinical and microbiological outcomes. Results Quantified scan metrics were already associated with clinical outcomes at diagnosis and 1 month after treatment, with further improved accuracy to differentiate clinical outcomes after standard treatment duration (month 6). A high cavity volume showed the strongest association with a risk of treatment failure (AUC 0.81 to predict failure at diagnosis), while a suboptimal reduction of the total glycolytic activity in lung lesions during treatment had the strongest association with recurrent disease (AUC 0.8 to predict pooled unfavourable outcomes). During the first year after TB treatment lesion burden reduced; but for many patients, there were continued dynamic changes of individual lesions. Conclusions Quantification of FDG PET-CT images better characterised TB treatment outcomes than qualitative scan patterns and robustly measured the burden of disease. In future, validated metrics may be used to stratify patients and help evaluate the effectiveness of TB treatment modalities.

on health resources and likelihood of non-adherence. In the literature, the reported rate of unfavourable outcomes varies considerably; however, it usually ranges from < 5 to 19% in trials [10][11][12][13][14][15] to over 20% in national health program conditions [1,16]. Unfavourable treatment outcomes include failure to convert to sputum culture negativity, treatment default, and disease recurrence, which could be due to either endogenous relapse or exogenous reinfection.
There is considerable effort to improve treatment outcomes, shorten treatment duration, and reduce disability. Nevertheless, testing new antibiotic regimens or immunotherapy options is hampered by the long duration of treatment and follow-up, required because there is no gold standard to determine sterilising cure. Factors contributing to the uncertainty of defining sterilising cure include the persistence of radiological lung lesions [17,18], clinical symptoms, and Mycobacterium tuberculosis (MTB) DNA in sputum [19,20] after clinical cure. These attributes may persist in spite of sputum culture negativity. Clinical treatment programs, researchers, and investors in new therapies urgently require improved methods to better define TB treatment response. This need has triggered an increasing interest in using 18-F fluorodeoxyglucose positron emission tomography-computed tomography (18-F FDG PET-CT) as a research tool in tuberculosis. Due to its high sensitivity for metabolic activity in infectious lesions, it has shown the potential to be a powerful and possibly cost-effective tool in TB trials, despite the reported lack of specificity in diagnosing active TB in high-incidence areas and its dependence on expensive resources [21,22].
FDG PET-CT is relatively non-specific for TB, since malignancies and other inflammatory pathology demonstrate similar FDG uptake. Although this limits its use as a TB diagnostic tool in high burden settings [23], its high sensitivity for TB lesions makes it an attractive option to monitor treatment response, once a diagnosis is already established. Firstly, CT is more accurate than traditional chest X-ray for correctly identifying most lesion types associated with TB, especially small nodules, cavities, bronchial thickening, and tree-in-bud lesions [24][25][26]. Secondly, the addition of PET to CT is reported to further improve sensitivity by identifying small lesions, affected lymph nodes, and helping to distinguish active from inactive lesions [23,[27][28][29].
Several animal infection models (mice, rabbits, nonhuman primates) have effectively used FDG PET-CT to shed light on TB progression to disease and response to treatment [29][30][31][32][33][34]. FDG avidity decreases in lung lesions of MTB-infected animals receiving anti-TB treatment. The reduction in FDG avidity is initially slow (first week of treatment), followed by a sharp decrease in avidity (week 4) after which it stabilises. In untreated animals, FDG intensity shows a variable correlation with MTB load (CFU) and a strong correlation with the lesion size. FDG avidity reduction often precedes reduction of lesion volume and density on CT [29,31]. The overall reduction in FDG uptake over treatment time correlates with the effectiveness of the bactericidal activity of different treatment options [33].
In mouse models, FDG PET-CT is also able to detect the development of relapse prior to microbiological evidence [30]. Monitoring of the spatial evolution of PTB lesions preceding relapse indicates that there is both progression of existing pre-treatment lesions and the formation of new lesions [35].
Human studies have also shown FDG PET-CT to be promising in monitoring the effect of treatment in pulmonary and extra-pulmonary TB [22,28,[36][37][38][39][40][41]. While most of the studies used simple descriptive techniques, two small trials aimed at the treatment of drug-resistant TB, implemented whole lung quantification of PET (using fixed thresholds) and semiquantified CT reader scores. These studies concluded that quantified PET images were more robust than reader-based CT scores, and both seemed to accurately measure changes in disease burden over time [42,43].
We recently reported imaging findings in TB patients who underwent FDG PET-CT scans at baseline, during, and after treatment (Catalysis treatment response cohort) [44]. We documented strikingly complex and heterogeneous lesion responses. During treatment, a decrease in size and FDG avidity was noted in most lesions. Unexpectedly, we did however find lesions that appeared metabolically active, with morphology in keeping with active disease in a substantial proportion of PTB patients after standard treatment, including patients with a durable cure and others who later developed recurrent disease.
In this report, we apply quantitative scan assessment by semi-automated whole-lung analysis. We show that these metrics are strongly associated with clinical outcomes, patient factors, and microbiological outcomes. Further, we discuss which identified scan characteristics appear most meaningful for both the interpretation of treatment response and the separation of favourable from unfavourable treatment outcomes. This information is drawn from over 338 scans from 96 patients and points towards the most meaningful metrics in the complex scan profiles of TB treatment response. Some metrics already show prognostic potential at diagnosis, while others that track changes over time become more meaningful at the end of treatment.

Recruitment and study procedures
Participants considered for this report were 99 newly diagnosed, culture-confirmed, pulmonary TB patients who successfully completed follow-up as part of the previously published Catalysis treatment response cohort [44][45][46]. They were HIV-uninfected adults, recruited at primary health care clinics in the northern regions of Cape Town, South Africa. Patients underwent FDG PET-CT scans at diagnosis (Dx) and at month 1 (M1) and month 6 (M6) of treatment. Fifty patients also had FDG PET-CT scans 1 year after the end of treatment (EOT + 1y). PET images were corrected for attenuation and reconstructed to 4 × 4 × 4 mm voxels using an iterative algorithm. The CT scan parameters were set at 120 kV, 100 mAs, without dose modulation with 1.17 × 1.17 mm pixels, and a 3mm slice thickness, reconstructed with I31 filter and B31 s con kernel. Figure 1 shows a flow diagram of the study design and scan settings as described in Additional file 1: Supplementary note 1.
Clinical samples and information were collected at day 0, week 1, 4, 8, 12, and 24 (month 6) for all participants. Samples included liquid culture with speciation and GeneXpert® MTB/Rif (Xpert) assays on sputum and the analysis of multiple biomarkers in blood and urine.

Qualitative scan assessment
We previously conducted and reported qualitative scan assessments by comparing each lesion's intensity at M6 to the intensity at Dx. 46 Three different response patterns were described: (1) A 'resolved' scan response pattern showed no lesion with more than minimally increased FDG intensity when compared to surrounding lung tissue M6. (2) An 'improved' pattern indicates that all lesions improved during treatment, but one or more lesion showed residual FDG avidity at M6. (3) A 'mixed' Fig. 1 Flow diagram of study design and participants included in analysis response indicated that while some lesions improved, at least one intensified, or a new lesion was present at M6. EOT + 1y scans were compared to M6 scans for qualitative classification.

Quantitative scan assessment
Based on a previously described methodology [47], we quantified the extent and severity of lung lesions concurrently on PET and CT for all scans (Dx, M1, M6, EOT + 1y). After co-registration of scans across time points using the SPM toolbox [48] in MATLAB (Mathworks Inc.), we created volumes of interest (VOIs) of lungs on the CT component with MRICro [49]. The lung VOIs were adapted by excluding areas affected by misregistration and created to fit all time points. In some cases, we had to create a separate lung VOI for the EOT + 1y scan, due to substantial lung volume changes related to fibrosis or poor inspiratory effort. In addition, we created VOIs which appeared lesion-free on both PET and CT on all time points to represent references for background FDG uptake in the lung.
We segmented the PET component by using a lesionto-background comparison. To reduce intra-and interscan variability, we standardised uptake using patientspecific reference volumes. We assigned a Z-score to each voxel based on: in which μ NL and σ NL are the mean and standard deviation of PET counts within the lesion-free lung VOIs for each scan. All voxels exceeding a Z-score of 8 were segmented as FDG-avid [47].
We used a previously reported method of density thresholding to segment lesions on CT [42][43][44][45][46][47]. These thresholds were as follows: (1) normal density, between − 950 Hounsfield units (HU) and − 500 HU; (2) soft lesions (V soft ), from − 500 to − 300 HU, usually tree-in-bud lesions or nodules, but may also include regular, medium, to large vasculature; (3) medium density lesions (V medium ) from − 300 to − 100 HU, which usually consists of nodular infiltrates, but may also include established lesions in early progression or partial resolution; and (4) hard lesions (V hard ), above − 100HU, are usually due to consolidation, cavity walls, bronchial thickening, or calcified fibrosis. We delineated cavity air volume using a gradient-based region-grow technique, and on the M6 scan measured the thickness of cavity walls at the level of the widest diameter on the transverse view. We measured the wall thickness of enclosed cavities at the level of widest cavity diameter, and the area of maximum wall thickness where there was no confluence with other lesions and structures, or fibrotic changes (examples shown in Additional file 1: Figure S1).
In addition, the program also measured the volumes of each abnormal density category on CT, i.e. V soft , V medium , V hard , and total volume with abnormal density > − 500 HU (V total ). We also measured a combined FDG PET-CT metric: MLV abN = the intersection of MLV and area with increased density on CT (≥ − 500 HU).
To create a variable to combine all major contributing factors on PET and CT, we assigned the Z mean score to the cavity volume (with no perfusion, thus no FDG uptake). We then added this to the TGAI value to obtain a composite measure of both metabolically active lesions and cavities. The resulting formula was: We previously published further detail regarding quantification and evaluation of the described technique [47].

Statistical testing
We considered a P value smaller than 0.05 as significant. We tested the association between categorical and continuous variables with a two-way T test for independent samples (Tibco© Staticstica™ V13). For analysis of variance tests with multiple grouping variables (TTN grouping variables), we applied the more conservative Kruskal-Wallis non-parametric test (R version 3.2.2). We calculated P values for receiver operating curves in order to test the null hypothesis that the area under the curve equals 0.50 (GraphPad© Prism V8). We performed post hoc analysis to distinguish favourable and pooled unfavourable outcomes by applying thresholds suggested by receiver operating curves. We used the Fisher exact test to determine significant associations between categorical variables (GraphPad© Prism V8). In this descriptive study, we did not correct for multiple testing.
We use the standard terms prognostic and predictive when comparing the association between FDG PET-CT parameters and outcomes. While the ability of a M6 marker to identify failed cases is strictly speaking diagnostic since it applies to the same time point, in practice, culture results are delayed and failed and relapse cases are grouped together when assessing treatment efficacy.

Patient demographics and treatment outcome
We recruited 99 PTB patients, of which 95 had drugsensitive (DS) strains, 2 had isoniazid mono-resistant strains, and 2 multi-drug resistant strains. More details regarding the treatment regimens are provided in Additional file 1: Supplementary note 2.
We based patient clinical outcome classifications on WHO definitions, except we used the more sensitive sputum culture, instead of direct smear microscopy. The outcomes for 3 patients were classified as un-evaluable (UE) due to sputum culture contamination, and they were excluded from the analysis. Of the remaining 96 participants, favourable treatment outcomes include 76 cured cases (achieved and maintained culture conversion). Unfavourable outcomes included 8 failed treatment cases (sputum culture positive at M6), and 12 with recurrent PTB (initially culture converted, but rediagnosed with active TB within 2 years after treatment completion). One of the 8 failed treatment cases was asymptomatic in spite of a positive sputum culture at M6 and declined to restart treatment and remained symptom-free when assessed a year later. Of the patients with recurrent PTB, 2 were culture confirmed; 5 were confirmed by both Xpert and smear positivity by direct microscopy (acid-fast bacillus positive); 3 were Xpert negative at month 6 but converted back to positive; and 3 remained Xpert positive for more than 6 months and deteriorated clinically. The absence of post-treatment culture complicated the recurrence diagnosis and prevented the distinction between relapse and reinfection.
Outcome was further stratified based on time to culture conversion and treatment adherence. Eighteen patients converted to sputum culture negative within 4 weeks, an additional 39 by week 8, another 22 by week 12, and a further 9 by week 24. Time to culture negativity (TTN) was un-evaluable (UE) for 3 patients due to contaminated cultures. Fourteen patients took fewer than approximately 80% of their treatment dosages during the 6-month period, which is regarded as poor adherence in most clinical trial designs. The failed treatment group included 4 patients with poor treatment adherence and 1 with MDR disease [12,13]. Further clinical information and demographics of the cohort may be found in our previously published online methods [44].
We performed a fourth scan 1 year after the end of treatment (EOT + 1y) for 50 patients that cultureconverted at M6. Eight of these 50 patients were diagnosed with recurrent disease by healthcare providers within 2 years of treatment completion (five before the EOT + 1y scan and three after). The other 42 maintained favourable treatment outcome status.

Qualitative FDG PET-CT results summary
The scans showed ongoing inflammation at the end of treatment in the majority of the patients [44]. For 51 (52%), there was an improved response on the M6 scan ( Fig. 2a, Fig. 3a). A mixed response was seen in 34 (34%) patients (Fig. 2b, Fig. 3b), of which 14 had both new and more intense lesion(s), 16 demonstrated only an increase in the intensity of lesion(s), and 4 had only new FDGavid lesion(s). Only 14 (14%) patients had a resolved pattern on their M6 scan (Fig. 2c).
The morphology associated with the most intense lesion of each mixed and improved M6 scan included CT features suggestive of active PTB, such as cavities (in 26 cases), patchy consolidation (in 22), complex lesions involving consolidation with cavitation (in 16), nodular infiltrates (in 17), enlarged hilar lymph nodes (in 3), and pleural-based infiltrates (in 1). Smaller nodules and treein-bud-lesions without calcification tended to resolve during treatment, especially when present in the lower lobes, and even if they were diffuse. Results for each patient are included in Additional file 2: Dataset 1.
Quantitative FDG PET-CT characteristics in relation to sputum time to culture negativity Lesion burden was significantly associated with TTN for the three main independent FDG PET-CT parameters (total cavity volume, TGAI, V total ) at Dx, M1, and M6 (Fig. 4a, c, e). The differentiation between the TTN groups became more pronounced during treatment. Cavity volume showed the largest difference between TTN groups at single time points (P < 0.001 at Dx, M1, and M6 - Fig. 4c). Proportional TGAI changes (P = 0.031 at M1; P = 0.002 at M6 - Fig. 4b) from baseline (delta), were also significantly associated with TTN. Similar trends were noted for cavity volume and V total (Fig. 4d, f), but did not meet the threshold for significance. Delayed sputum converters (between months 5 and 6) and failed treatment cases thus showed both a larger burden of disease and a slower rate of reduction in scan metrics. The recurrence group also showed a slower rate of reduction in scan metrics, but did not have a large baseline burden of disease (Fig. 4d).
We also evaluated the individual components of the TGAI (Z mean , the SUV max , MLV -Additional file 1: Figure S3), and high-density lesions on CT (V hard , V medium , V soft -Additional file 1: Figure S4), as well as the intersection of high-density lesions on CT and FDG-avid lesions on PET (MLV abn -Additional file 1: Figure S5). We found a significant correlation between TTN groups and single time-point values for indicators of lesion volume (MLV, V hard , V medium , V soft , MLV abn ), but not for indicators of PET intensity (Z mean , SUVmax). The proportional change from Dx to M6 in these variables was significantly associated with TTN groups for all these variables (Z mean , MLV, V hard , V medium , V soft , MLV abn ) except SUV max . None of these variables, however, showed a clear advantage over the main independent FDG PET-CT variables (cavity volume, TGAI, V total ).  Figure S6). Apart from cavity volume, other Fig. 3 FDG PET-CTs for three representative cases that received 6 months of standard treatment. Three-dimensional anterior and transverse slices at the level of horizontal blue line. a Bilateral upper lobe cavitation at Dx, which demonstrate increased intensity at M1. At M6, the left cavity has changed to fibrotic tissue with mild uptake, but the right cavity still has a thick wall and high uptake. b Failed treatment case with bilateral upper lobe cavities that retain very high intensity at M6. c Case diagnosed with recurrent disease subsequent to EOT + 1y scan. All lesions improved to moderate intensity uptake at M6, but a large new area of patchy consolidation is seen at EOT + 1y metrics reflecting lesion extent (V total , MLV, MLV abn ) also showed promise to differentiate failed cases from cured at Dx, M1, and M6, while parameters reflecting the intensity of FDG uptake (SUVmax, Z mean ) did not show prognostic value at baseline, but only at M6. A summary of AUC's for various scan parameters to differentiate failed treatment cases is in Additional file 1: Table S1. Of note, the one asymptomatic, failed treatment case (participant identification number 43) had quantified values in keeping with a good response to treatment.
In addition to cavity volume, M6 cavity wall thickness was also associated with treatment failure. In most cured cases, M6 cavity wall thickness ranged from 0 (no cavity) to 3 mm. M6 cavity wall thickness in failed cases was significantly greater than cured cases' (Student's T test for independent samples; P < 0.001) and ranged from 2.5 to 8 mm. At end of treatment, recurrent cases were not significantly different from other cured cases; their M6 cavity wall thickness ranged from 0 to 4 mm.
Treatment outcome was also associated with the qualitative scan response pattern (Fisher's exact test; P < 0.01) and showed high sensitivity, in that a mixed response was found in all failed patients at M6. However, neither a mixed response nor a high maximum lesion intensity was specific for an unfavourable outcome, and 21 (28%) of cured patients had a mixed response, while 55 (72%) still had M6 lesions with moderate to  very high intensity. This was similar to the intensity range seen in some untreated cases at diagnosis.

Scan characteristics of recurrence
The 12 patients diagnosed with recurrent disease within 2 years after treatment had a similar sputum culture conversion rate to cured cases (median TTN 8 weeks) and did not show a comparatively large lesion burden at Dx (Fig. 4). Nevertheless, irrespective of TTN, during treatment they exhibited a relatively slow rate of reduction in TGAI, cavity volume, and to a lesser extent V total (Fig. 4b, d, f).
At M1, there was a trend for the recurrent disease group to have a smaller reduction in TGAI and TGAIcom burden. At M6, the difference between the groups was significant (P = 0.003). No other parameters were significantly different between cured and recurrent disease groups.
Patients who reported previous PTB episode(s) tended to have a higher TGAI burden at Dx and showed significantly less TGAI reduction on treatment (P = 0.003, Additional file 1: Figure S7). They also showed less reduction in cavity volume, but no clear difference in abnormal CT density (V total ). See Additional file 1: Table  S2 for additional summary statistics on previous TB.

Scan characteristics of pooled unfavourable outcomes
We pooled patients with unfavourable outcomes (failed and recurrent treatment) and analysed the most promising scan parameters' distribution per groups, combined with receiver operating curve analysis to determine the most informed thresholds. A failure to reduce TGAI by less than 80% from Dx to M6 was the scan characteristic most associated with unfavourable outcomes and carried an almost sevenfold risk. Table 1 shows indicators of M6 scan parameter associations with unfavourable outcomes and the suggested cut-offs, and Fig. 5 compares the distribution of scan metrics for outcome groups.
A total M6 cavity volume greater than 7 ml and a M6 TGAI of greater than 600 (equivalent to a SUV-based total glycolytic activity of roughly 200 if calculated using SUV) also carried a fourfold increased risk of an unfavourable outcome. Cavity volume was slightly more sensitive and TGAI more specific in predicting unfavourable outcomes. Combining the variables did not improve prognostic accuracy, either when merged into a single variable or when used in Boolean selection. TGAIcom performed very similarly to TGAI. We also tested whether either M6 cavity volume > 7 ml or TGAI > 600 put a patient in the high-risk group, but this generated the same sensitivity but lower specificity than single variables.
Quantitative parameters at M6 out-performed lesionbased qualitative measurements. A mixed response pattern (either new or intensified lesions) at M6 was associated with a 2.86 times increased risk of unfavourable outcome, which was comparable to the prognostic potential of the quantitative parameters at M1. At M1, both a cavity volume greater than 20 ml and a less than 33% cavity volume reduction from Dx were significantly associated with unfavourable outcome, showing a 2.5and 2.8-times increased risk of unfavourable outcome respectively (P = 0.04 and P = 0.01 respectively). Total TGAI values at Dx and M1 did not perform well as a predictor of pooled unfavourable outcomes, but a trend (P = 0.07) suggests an increased risk of unfavourable outcome when there is less than 5% reduction in TGAI (from Dx to M1).

Scan results: EOT + 1y
Most residual lesions were smaller and less intense 1 year after the end of treatment. Metabolic lesion volume decreased to an average of only 2.03% of lung volume at EOT + 1y, compared to 4.21% at M6 in the same patients. Mean total cavity volume also decreased from 7.6 ml at M6 to 2 ml at EOT + 1y. Abnormal CT density showed less reduction during treatment than other parameters after treatment, and the mean V total at EOT + 1y was 4.53%, compared to 5.7% at M6. Figure 6 shows the distribution of TGAI and cavity volume across time points.
Remarkably, only 32% of EOT + 1y scans were completely resolved. The remaining 68% had FDG-avid residual lesions, of which half had improvement of all lesions compared to M6 (Fig. 2a), and the other half had a mixed lesion response compared to the M6 scan (Fig. 2b, c, Fig. 3c). Morphology of new FDG-avid lesions at EOT + 1y included nodular infiltrates (found in 4 cases), hilar lymph nodes (in 1), cavitation (in 2), consolidation (in 2), or lesions with combined morphology (in 3). There was no association between the development of new lesions during Dx-M6 and during M6-EOT + 1y. Morphology of residual M6 lesions showing similar or more intense FDG uptake at EOT + 1y included consolidation (2), cavitation (4), and nodules (2). All three patients who developed recurrent PTB after EOT + 1y had mixed scan outcomes at this time point (Fig. 2c), while none with resolved EOT + 1y scans were diagnosed with recurrence. Figure 2 and Fig. 3 show examples of dynamic lesion progression and resolution during and after treatment.
We found no significant association between M6 and EOT + 1y for TGAI or cavity volume (Additional file 1: Figure S8a and S8b). However, we found a moderate correlation between the time points for V total and SUVmax (Additional file 1: Figure S8c and S8d).

Summary of main findings
Quantification of the FDG PET-CT images provides metrics that show stronger association with clinical outcomes compared to qualitative scan patterns. Qualitative scan response patterns are more challenging to interpret, due to varying responses of individual lesions and incomplete resolution of inflammation during treatment. The most promising quantitative marker (TGAI not reducing by > 80% from Dx to M6) carried a 6.97 relative Various scan metrics measured in this study showed prognostic potential at Dx and M1 and stronger associations with unfavourable outcomes by M6. A high cavity volume showed the strongest association with a risk of treatment failure, while a suboptimal reduction of the total glycolytic activity throughout the lung had the strongest association with recurrent disease. Both of these variables also correlated with time to culture negativity. This suggests a correlation between the quantified lesion burden and the MTB load that is clearer when using quantitative rather than qualitative analysis.
The volume of high-density lesions on CT (V total ) also shows a strong association with TTN and failed treatment, even at early time points (Dx and M1). Unlike TGAI, however, V total shows no association with recurrent disease and subsequently, pooled unfavourable outcomes. This is likely due to residual scarring and fibrotic changes and the related residual abnormal density lesions on CT after treatment. Values indicating FDG uptake intensity alone (SUVmax and Z mean ) do not show association early in treatment. However, the proportional intensity changes are associated with TTN and outcome, though not as strongly as TGAI, which combines information from intensity and volume. Combined PET and CT parameters perform similarly to their underlying components, but they do not appear to be clearly superior to individual variables.
Quantification of EOT + 1y scans confirms our previous observations that there is a tendency for all parameters to decrease after treatment, but that a lack of complete resolution is still common and new or intensifying lesions are often seen. Dynamic changes after treatment are common for PET parameters resulting in a poor correlation between M6 and EOT + 1y measurements, compared to CT lesions which appear to be more persistent after treatment.
We found notable differences between failed and recurrent treatment cases. Failed treatment is associated with extensive lung lesions at baseline and large cavities with thick walls at M6, as well as poor adherence. On treatment, a reduction in the FDG avidity and thickness of cavity walls is usually also associated with a reduction in cavity volume. Interestingly, it is relatively common for cavities to show a reduction in both FDG avidity and wall thickness (thus appearing inactive), but to show an increase in size after M1. This may reflect a loss of structural wall strength and progress towards the formation of bullae (Additional file 1: Figure S1b). Recurrent cases display a comparatively low lesion burden at baseline, average adherence, and time to sputum culture negativity, but insufficient reduction in lesion burden during treatment. We also found insufficient reduction in lesion burden in patients with a history of previous PTB episodes.

Comparison with previous literature
The catalysis treatment response cohort is the largest prospective study conducted on the use of FDG PET-CT in human patients with PTB and the first report on the fate of residual FDG PET-CT lesions after PTB treatment [45]. In this report, we found that a quantitative analysis of scan characteristics shows a stronger association with outcomes than a qualitative analysis of these same characteristics. In related publications, these quantitative metrics also correlate well with host biomarkers, namely gene expression signatures [46], and urinary concentration of the recently discovered metabolite, serylleucine core 1 O-glycosylated peptide [50]. The potential of quantitative FDG PET-CT variables to identify patients with a low risk of treatment failure was also analysed in combination with other patient variables and is currently being tested in the PredictTB trial [51].
Our quantitative findings correspond well with previous reports on animal models [29][30][31]52] and validate findings from two studies in drug-resistant TB cases, which also show that the quantified inflammation burden, as measured by FDG, corresponds with the effectiveness of treatment [41,43]. Our results are consistent with those of previous reports in humans in which cavitary disease is associated with an unfavourable outcome [11,53,54]. The persistence of density changes in the lungs is also in keeping with reports of the high incidence of post-tuberculosis lung impairment [6,8]. We found no published reports in human tuberculosis that compare as many quantified parameters in either PET or CT scans for a sample size this large or suggest cutoff values that may be used for direct comparison in future studies.

Study limitations
Although this is the largest prospective cohort of FDG PET-CT in TB treatment response, it is still a limited sample size with a small number of unfavourable outcomes, and we did not have sufficient data to differentiate relapse from recurrence. On account of these limitations, we did not perform multivariable logistic regression analysis, due to both the risk of false-positive findings from overfitting a model and false-negative findings due to confounding unfavourable outcomes by reinfection cases. In the absence of a ground truth, we also cannot exclude the possibility that other analysis methods could improve on the prognostic ability of scan characteristics or that different PET-CT scanner models, acquisition, and reconstruction protocols will affect the results. With regard to EOT + 1y scans, the variation in timing between the scan and the recurrent disease diagnosis limited the conclusions we can draw from the data. The study design excluded HIV-infected participants to ensure a more homogenous group for biomarker discovery. As such, the suitability of the variables and cut-offs will have to be re-evaluated in this important subset of TB patients. We did not perform pharmacokinetic studies.

Implications
At month 6, the best indicator of unfavourable outcomes (treatment failure and recurrences) shows 80% sensitivity and 75% specificity, which is modest for a diagnostic test but far superior to currently used predictive biomarkers for poor treatment outcomes, such as month 2 sputum culture conversion or AFB smear conversion. The thresholds defined in this report require further validation before direct application to clinical practice, but the associations with quantified patient and microbiological data suggest that most prominent FDG PET-CT scan characteristics can be used as part of risk stratification and treatment response monitoring in therapeutic trials. In combination with clinical evidence, it may also assist treatment decisions in complicated clinical cases, such as treatment of drug-resistant TB and evaluation of adverse reactions to medication. Quantification of central trends in lesion burden provides continuous variables that allow multiple options for statistical analysis. This approach holds promise for improving the accuracy of clinical reporting.

Future research
Quantification methods should be further improved to be less operator-dependent, more user-friendly and widely available to both researchers and clinicians. Suggested thresholds should be validated and tested in shortened TB regimens. Further translational research may implement FDG PET-CT scan characteristics to explore complex interactions between host, MTB, and anti-tuberculous drugs, to help develop improved regimens, host-directed therapy, and diagnostic tests.

Conclusions
Quantification of FDG PET-CT images better characterised TB treatment outcomes than qualitative scan patterns and robustly measured the burden of disease. This approach requires validation in future studies.