Absolute quantification of SPECT images has long been a goal but has become clinically feasible only very recently, thanks to the introduction of commercial systems that combine SPECT-CT technology with fast 3D reconstruction algorithms including attenuation and scatter corrections and resolution recovery. In this study, we considered the General Electric Infinia Hawkeye 4, Philips Brightview XCT, and Siemens Symbia T6 SPECT-CT cameras, which have been on the market for several years, as well as the very recently introduced General Electric Discovery NM/CT 670. The manufacturers’ 3D iterative reconstructions with attenuation, scatter, and resolution corrections were used throughout, with the manufacturers’ default parameters for these corrections, while the impact of the number of iterations was studied. Attenuation and scatter correction accuracy, contrast recovery of hot and cold regions of different sizes, and, finally, quantification using three calibration phantoms of different sizes were analyzed. To the best of our knowledge, this is the first homogeneous comparative study of four state-of-the-art SPECT-CT systems from three major nuclear medicine vendors.
Attenuation and scatter correction accuracy
The first step in the quantification of nuclear medicine images is clearly a correction for the attenuation and scatter of the emitted photons [9, 10]. It was therefore worthwhile to first assess the accuracy of the attenuation and scatter corrections applied in the four systems. For that purpose, the NEMA NU2-1994 methodology was adopted. Although primarily developed for PET, this methodology is perfectly applicable to SPECT. In contrast to PET reconstructions, the SPECT manufacturers’ 3D reconstructions generally did not allow reconstruction with a calculated attenuation correction (the Chang method, for example). Therefore, only the accuracy of the combined attenuation and scatter correction could be evaluated.
The use of a Teflon insert could be questioned. Indeed, it has been demonstrated with PET-CT that the HU conversion laws used for low-density materials and bone do not fully apply to Teflon. This is mainly due to the large differences in the physical effects leading to photon attenuation. The photoelectric effect, together with Compton scattering, contributes to X-ray photon attenuation, whereas 511-keV photon attenuation results almost entirely from Compton scattering. However, attenuation correction using CT data and a bilinear conversion of HU into linear attenuation coefficients has been widely validated for PET-CT, and non-biological materials are increasingly present in scanned patients. In this sense, the use of the Teflon insert was not considered a limitation of the study but rather an add-on. For example, Shcherbinin et al. also used a Teflon insert to mimic the lumbar spine in their investigation of the quantitative potential of Infinia Hawkeye 4.
The air, water, and Teflon inserts of the NEMA NU2-1994 phantom are cold compartments. When scatter correction was applied (Figure 1), the residual fractions decreased with the increase of the number of iterations and reached values below 4% at 30 iterations for all systems. Without scatter correction (Additional file 1), the residual fraction in water and Teflon remained stable after about ten iterations but still continued to decrease in the air insert. Scattering in air is expected to be very low, and therefore, the air insert should approximately correspond to a perfect cold region, whether the scatter correction is being applied or not. On the contrary, the more dense water and Teflon inserts should only behave as a perfect cold region when the scatter is corrected for. Convergence of iterative reconstructions is known to depend on the local contrast and is expected to be the slowest for the coldest regions. This is exactly what is observed in air with or without the scatter correction and in the two more dense media when scatter correction is applied.
Without scatter correction, residual fractions in water and Teflon were system-dependent, with differences of up to 5% between Infinia and Discovery. For Brightview, the residual fraction in air was even higher with scatter correction (RF ≈ 3%) than without it (RF ≈ 1%). Differences in scatter contamination between camera models have recently been reported in a multi-centric study. Nevertheless, the most striking conclusion is that, despite the use of three different scatter correction techniques, all the systems achieved approximately identical and very low (≤4%) residual fractions at 30 iterations in the three cold inserts of the NEMA NU2-1994 phantom.
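The residual fraction discussed above can be illustrated with a minimal sketch. It assumes the usual NEMA NU2-1994 style definition, in which the mean counts in a ROI placed in a cold insert are expressed as a percentage of the mean counts in the uniform hot background. The function name and the count values are hypothetical and are not data from this study.

```python
def residual_fraction(insert_mean_counts, background_mean_counts):
    """Residual fraction (%) in a cold insert relative to the hot background.

    Assumed definition: 100 * mean counts in the insert ROI / mean counts
    in the background ROIs. A perfect correction would give 0%.
    """
    if background_mean_counts <= 0:
        raise ValueError("background mean counts must be positive")
    return 100.0 * insert_mean_counts / background_mean_counts


# Hypothetical ROI means (counts per pixel), for illustration only.
background = 250.0
inserts = {"air": 5.0, "water": 7.5, "Teflon": 10.0}
for name, mean_counts in inserts.items():
    rf = residual_fraction(mean_counts, background)
    print(f"{name}: RF = {rf:.1f}%")
```

With such a definition, the statement that all systems reached RF ≤ 4% means that the counts remaining in each cold insert were at most 4% of the background level.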
The linear attenuation coefficients were very close to the expected value for water but were generally lower than the expected value for Teflon. As already mentioned above, this could result from the HU conversion laws being tailored to biological materials. The under-correction for attenuation in Teflon could explain the lower residual fractions observed in this insert. The values of the linear attenuation coefficient of air are not reported in detail. They ranged from 0.0000001/cm (Brightview) to 0.001/cm (Symbia). Small differences in HU calibration and/or differences between the HU conversion laws used can clearly lead to large relative differences in the measured value of the very low linear attenuation coefficient of air. However, the values are so low that the attenuation correction is hardly affected by their accuracy. For Brightview, the CT protocol (fast or slow, high or low current) did not appear to influence the results, at least for a phantom with the size and composition of the NEMA NU2-1994 attenuation and scatter accuracy phantom.
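The sensitivity of the air coefficient to HU calibration can be seen with a minimal sketch of a bilinear HU conversion. This is not any vendor's conversion law: the water coefficient is an approximate value for 140 keV photons, and the bone reference point (`MU_BONE`, `HU_BONE`) is a hypothetical choice made only to give the second segment a slope.

```python
MU_WATER = 0.153   # 1/cm, approximate value for water at 140 keV (assumption)
MU_BONE = 0.28     # 1/cm, hypothetical value assigned to the bone reference
HU_BONE = 1000.0   # hypothetical CT number of the bone reference point


def hu_to_mu(hu):
    """Bilinear conversion of a CT number (HU) into a linear attenuation coefficient (1/cm)."""
    if hu <= 0.0:
        # Air-water segment: 0 at -1000 HU, MU_WATER at 0 HU.
        return max(0.0, MU_WATER * (1.0 + hu / 1000.0))
    # Water-bone segment: a different (here shallower) slope above 0 HU.
    return MU_WATER + hu * (MU_BONE - MU_WATER) / HU_BONE


for hu in (-1000.0, -990.0, 0.0, 1000.0):
    print(f"HU = {hu:7.1f} -> mu = {hu_to_mu(hu):.5f} /cm")
```

Under these assumptions, a 10 HU offset in the air calibration moves the coefficient from 0 to about 0.0015/cm: a very large relative change, but about 1% of the water value, which is why the attenuation correction is hardly affected.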
The contrast part of this study was conducted to estimate the object size below which quantification would unavoidably be corrupted by the partial volume effect. The use of rods for the assessment of contrast recovery with 3D reconstructions could be questioned. A sphere phantom was considered impractical in the context of the present study, which was performed on six systems belonging to five different departments and with limited camera availability. Indeed, a sphere phantom is much more fragile than a rod phantom, and the filling procedure is clearly longer. Moreover, the experiment would have had to be repeated at least two or three times to keep the noise variability sufficiently low. The rod phantom allowed the results obtained in several slices to be summed, which, combined with a high number of acquired counts, helped to reduce noise variability. A definite advantage of the contrast phantom is its ease and low cost of manufacture. The contrast recovery coefficients obtained with a sphere phantom would depend on the sphere and background activities, the sphere-to-background contrasts, and the total number of acquired counts. The rods offer infinite contrast, which could be seen as a very favorable condition. It is expected that contrast recovery in a clinical context would be different and presumably lower. Therefore, the contrast recovery coefficients obtained in this study represent an upper limit.
The use of a circular trajectory with a 25 cm radius could also be questioned. In the clinical context, the automatic body contour device is generally activated, and this results in non-circular trajectories with a variable distance between the axis of rotation and the camera heads. For some slim patients and some explorations, this distance would be less than 25 cm, especially when the camera heads are in imaging positions close to the anterior-posterior direction. However, in many other cases (trunk explorations and obese patients), this distance would be longer for all head positions. Selecting the smallest radius achievable on all four cameras was found to be an acceptable compromise. Moreover, the circular trajectory with a manually fixed radius makes our experiments very easy to repeat on other existing (for example, the SPECT-CT system from another manufacturer) or future SPECT-CT systems.
Using the manufacturers’ 3D iterative reconstructions, hot and cold contrast recovery improved with the number of iterations (Additional files 3, 5, and 6). However, above 24 iterations, the improvements were only marginal, and 30 iterations was chosen as the end point of this study. This was justified by the fact that the noise level steadily increased with the number of iterations (data not shown), while it is always desirable to keep this level as low as possible. The contrast recovery increased with the rod diameter. Whatever the ROI size used to evaluate the contrast, the hot contrast saturated when the rod diameter reached 16 mm. The cold contrast of the General Electric and Siemens cameras saturated for a rod diameter of 20 mm and above, but no saturation could be clearly observed with Brightview. It should be emphasized that the data for the largest hot or cold rod should be taken with some caution. Indeed, this rod is located on the phantom axis, and the phantom was centered in the field of view. Therefore, this rod is more prone to uniformity artifacts than the six other peripheral rods [30–32]. Moreover, the image resolution was demonstrated not to be isotropic, although resolution recovery is included in the reconstruction algorithm. Maximum contrast recovery was slightly system-dependent. With the half ROI, it was in the range 0.85 to 1.1 for the largest hot rods and in the range 0.78 to 0.86 for the largest cold rods. These values were generally lower when scatter was not corrected for, and the amount of reduction was system-dependent. However, with the exception of Brightview, the contrast recovery of the two smallest hot rods was found to be higher when scatter correction was not applied. For Infinia and the Symbia T2, the contrast recovery of the smallest (4 mm) hot rod dropped to 0 in scatter-corrected images. This agreed with the observation that the scatter contamination and the performance of the scatter correction varied between the four systems. In the clinical context, the use of scatter correction is questionable if it decreases the hot contrast of small structures.
For this contrast phantom, the CT protocol (fast or slow, high or low current) used with Brightview had no influence on the results.
Thanks to their resolution recovery, the three reconstruction algorithms delivered images with improved contrast for the small structures. Nevertheless, for accurate quantification, some strategy for partial volume correction remains necessary. The lower contrast recoveries observed for the full ROI as compared to the half ROI show that the partial volume effect remains present. Moreover, although the contrast recovery for the largest hot rods approached unity with the half ROI, the values were not all equal to 1, and some differed from 1 by as much as 0.15 (Figure 2). This indicates that the partial volume correction technique should be tailored to the particular SPECT system and reconstruction algorithm used. The reconstruction artifacts should also be considered in the framework of accurate quantification.
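The dependence of the recovery on rod diameter and ROI size can be illustrated with a toy model. The sketch below is a one-dimensional simplification in which a rod of unit intensity in a cold background is blurred by a Gaussian; the 8 mm FWHM and the function names are hypothetical, and the model deliberately ignores the edge overshoot produced by resolution recovery, so it can never return values above 1.

```python
import math


def rod_profile(x_mm, rod_diameter_mm, fwhm_mm):
    """Value at position x of a unit-intensity 1D rod blurred by a Gaussian."""
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    r = rod_diameter_mm / 2.0
    s = sigma * math.sqrt(2.0)
    # Analytic convolution of a top-hat of half-width r with a Gaussian.
    return 0.5 * (math.erf((x_mm + r) / s) - math.erf((x_mm - r) / s))


def recovery(rod_diameter_mm, roi_fraction, fwhm_mm=8.0, step_mm=0.25):
    """Mean blurred intensity in a centred ROI spanning roi_fraction of the rod."""
    half_width = roi_fraction * rod_diameter_mm / 2.0
    n = int(half_width / step_mm)
    samples = [rod_profile(i * step_mm, rod_diameter_mm, fwhm_mm)
               for i in range(-n, n + 1)]
    return sum(samples) / len(samples)


for diameter in (4, 8, 16, 32):
    half_roi = recovery(diameter, 0.5)
    full_roi = recovery(diameter, 1.0)
    print(f"{diameter:2d} mm rod: half ROI {half_roi:.2f}, full ROI {full_roi:.2f}")
```

Even in this crude model, the full ROI always yields a lower recovery than the half ROI, and small rods are recovered less well than large ones, which is the behavior attributed to the partial volume effect in the text.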
Edge and noise artifacts in maximum likelihood reconstructions have been observed and studied for a long time [13, 14, 33]. Noise was said to result from maximum likelihood expectation maximization (MLEM) doing too good a job: ‘MLEM is so successful in producing images that are consistent with the acquired data that the noise is also fully reproduced.’ Edge artifacts seem to result from the impossibility of recovering frequencies whose amplitudes are too low. Therefore, the frequency content of the images is incomplete. This becomes critical at edges, whose representation requires a very wide frequency range (an infinite range for a sharp edge), and results in the observed overshoots. The link between the edge and oscillation artifacts seems not to have been clearly established. However, it was observed that techniques tailored to reduce or suppress the edge artifacts also reduced or suppressed the oscillation artifacts.
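The link between an incomplete frequency content and edge overshoot can be illustrated with the classical Gibbs phenomenon. The sketch below is only an analogy and does not model MLEM or any vendor algorithm: it sums a truncated Fourier series of a sharp 0-to-1 edge and reports the peak value reached on the plateau. The function names are hypothetical.

```python
import math


def band_limited_edge(x, max_harmonic):
    """Partial Fourier sum of a unit square wave (0 -> 1 edge at x = 0, period 2*pi)."""
    total = 0.5
    for k in range(1, max_harmonic + 1, 2):  # a square wave has odd harmonics only
        total += (2.0 / math.pi) * math.sin(k * x) / k
    return total


def peak_value(max_harmonic, samples=5000):
    """Maximum of the band-limited edge over the plateau 0 < x < pi."""
    xs = (math.pi * i / samples for i in range(1, samples))
    return max(band_limited_edge(x, max_harmonic) for x in xs)


for max_harmonic in (9, 39, 159):
    print(f"harmonics up to {max_harmonic:3d}: peak = {peak_value(max_harmonic):.3f}")
```

Keeping more frequencies narrows the overshoot and the accompanying ripples but does not remove them: the peak stays roughly 9% above the true plateau, which is consistent with overshoots and oscillations appearing together near sharp edges.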
Edge and oscillation artifacts were observed with all phantoms, whatever their shape, and with all four systems (Figures 4 and 6, and Additional files 8 and 9). Ringing artifacts were already observed by Vija et al. in their early study of Flash3D. The artifact intensities appeared to be system- and phantom-dependent. For the two General Electric cameras, uniformity artifacts were also present, and they could have obscured some other artifacts. It is very interesting to note that the uniformity artifacts were not observed when the images were reconstructed with FBP or 2D OSEM (without resolution compensation). This indicates that the use of reconstruction algorithms with resolution recovery requires a revision of the procedures used to generate the uniformity correction matrix, and particularly of the total number of acquired counts. As an example, Vija et al. mentioned the use of very-high-count (up to 0.8 billion) floods for uniformity correction of data reconstructed with Flash3D. With Symbia T6, a few SPECT acquisitions of the uniform phantoms were conducted with an elliptical orbit in addition to the circular orbit, and the edge ring artifact was elliptically shaped (Figure 5). The Siemens software allows 2D OSEM reconstructions with resolution recovery only in the transverse plane (in this case, no resolution recovery is performed in the axial direction). On these 2D OSEM reconstructed coronal and sagittal slices, the stripes perpendicular to the rotation axis that were clearly visible on the Flash3D reconstructed images were not observed (Figure 7).
In a small structure, the edges come very close to each other, and the edge artifacts merge. This results in an excessively high activity in the central area, and the structure could appear smaller on the nuclear medicine image than on the structural image, as illustrated in Additional file 9. Another issue for iterative reconstruction is that the resolution cannot be reliably measured using a point or line source in air, so that a contrast phantom is preferable for evaluating the performance in distinguishing between objects of different contrasts.
The regularization step included in the reconstruction algorithm should provide some control over the overshoot in small structures. As part of the iterative loop, this step could not be deactivated in Astonish or Evolution. However, Flash3D allowed the post-filter to be bypassed. Without this final smoothing, cold contrast recovery was only modestly increased, but the increase was much larger for the hot contrast recovery, and values well above 1 were observed (Additional file 7). A detailed study with different structures, count statistics, and pixel sizes would probably help to fine-tune the post-filter of Flash3D in order to optimize the compromise between contrast recovery and edge artifacts for various acquisition and reconstruction parameters. Such a study was beyond the scope of this work.
Number of iterations
When ordered subsets are used, the number of subsets has to be considered together with the number of iterations. Generally, one uses the product of both, the so-called equivalent number of MLEM iterations (MLEMit). All results demonstrated the need for a sufficiently high number of MLEMit (24 × 8 = 192) to obtain convergence of the iterative algorithm and efficient scatter correction or maximum contrast recovery. This number greatly exceeds the default settings of all three manufacturers, which range from 20 MLEMit to 48 MLEMit (Table 2). These settings seem to have been chosen with the main aim of generating images with a spatial resolution similar to that of FBP or OSEM but with a lower noise content, allowing a reduction in scan time and/or patient dose. In the framework of quantification, convergence of the iterative algorithm in all regions of the image is mandatory. We therefore decided to select 24 iterations with eight subsets for our study of the quantification. The small cold nodule in the thyroid phantom highlights the usefulness of a high number of iterations in clinical routine.
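The arithmetic behind the equivalent number of MLEM iterations is simple enough to write down. The sketch below only encodes the product rule stated above and compares the value used in this study with the two ends of the default range quoted from Table 2; the function name is hypothetical.

```python
def equivalent_mlem_iterations(iterations, subsets):
    """Equivalent number of MLEM iterations for an ordered-subsets reconstruction."""
    return iterations * subsets


study_mlemit = equivalent_mlem_iterations(24, 8)
print(f"Study setting: {study_mlemit} MLEMit")

# Ends of the manufacturers' default range quoted in the text (Table 2).
for default_mlemit in (20, 48):
    ratio = study_mlemit / default_mlemit
    print(f"{ratio:.1f} times the {default_mlemit}-MLEMit default")
```

The setting retained for quantification is therefore 4 to almost 10 times larger than the manufacturers' defaults.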
Quantification requires the conversion of the recorded counts per pixel into activity per unit volume. This is usually obtained through a calibration step in which a source of known activity is scanned. One study has presented the use of a point source and of planar acquisitions to obtain the conversion factor. However, most of the other studies copied the extensively validated PET procedure in which a large source of known activity and volume is scanned [4, 21]. This last methodology was adopted in this work, but the influence of the calibration phantom size was also investigated. The reason behind this was twofold. First, using a calibration phantom of a size similar to the test phantom makes the whole procedure too favorable and does not correspond to what would be possible with patients. Second, large phantoms are not easy to handle. Therefore, any reduction in the calibration phantom size would ease the calibration procedure. This would be particularly desirable if the procedure has to be repeated frequently. The largest calibration phantom used (XL) had dimensions comparable to those of the NEMA and contrast phantoms. The two other calibration phantoms (L and M) were smaller, while the cylindrical shape was maintained.
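The calibration step can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure implemented by any of the vendors: the conversion factor is taken as the count rate per unit volume in a ROI divided by the decay-corrected activity concentration of the calibration phantom. All names and numbers are hypothetical, and the 6.01 h half-life corresponds to 99mTc, used here only as an example nuclide.

```python
import math


def decay_corrected_activity(activity_mbq, elapsed_h, half_life_h):
    """Activity remaining after elapsed_h hours of physical decay."""
    return activity_mbq * math.exp(-math.log(2.0) * elapsed_h / half_life_h)


def conversion_factor(roi_counts, roi_volume_ml, scan_duration_s,
                      phantom_activity_mbq, phantom_volume_ml):
    """Conversion factor in counts per second per MBq (cps/MBq)."""
    count_rate_per_ml = roi_counts / (scan_duration_s * roi_volume_ml)
    activity_per_ml = phantom_activity_mbq / phantom_volume_ml
    return count_rate_per_ml / activity_per_ml


def activity_concentration(roi_counts, roi_volume_ml, scan_duration_s, cf):
    """Activity concentration (MBq/mL) in a ROI of a test object, given a conversion factor."""
    return roi_counts / (scan_duration_s * roi_volume_ml * cf)


# Hypothetical calibration: 400 MBq measured one half-life before the scan.
activity_at_scan = decay_corrected_activity(400.0, 6.01, 6.01)
cf = conversion_factor(roi_counts=1.2e6, roi_volume_ml=3000.0,
                       scan_duration_s=1000.0,
                       phantom_activity_mbq=activity_at_scan,
                       phantom_volume_ml=5000.0)
print(f"Activity at scan time: {activity_at_scan:.1f} MBq")
print(f"Conversion factor: {cf:.2f} cps/MBq")
```

Because the ROI enters through both its counts and its volume, the conversion factor depends on where the ROI boundary is drawn relative to any edge or oscillation artifacts in the calibration phantom image.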
The accuracy of the activity measurement is a very important parameter in this part of our study. As the various departments were not equipped to measure aliquots, the local radionuclide calibrator was used. It is important to note that the activities of the NEMA, contrast phantom, and S phantom were likewise measured. Therefore, the overall reproducibility of the radionuclide calibrator was of much more concern than its accuracy. The daily quality control procedure of the radionuclide calibrator was expected to reduce the error resulting from fluctuations in time to below 3%. Moreover, the same operator always performed all the measurements.
Due to the presence of the artifacts, ROIs of various diameters were drawn on the calibration phantoms. When the ROI diameter equalled the physical diameter of the phantom, the reconstructed activity in the contrast, NEMA, and S phantoms was systematically the highest (Figure 3). OSEM is a conservative process in terms of the total number of reconstructed counts. Therefore, the edge overshoot results in an underestimation in the body of the phantom, and the sensitivity is found to be lower if the overshoot is not included in the ROI drawn on the calibration phantom. Fluctuations of the reconstructed activity with ROI size were observed for all calibration phantoms (Figure 3). They can easily be related to the oscillation artifacts. The amplitudes of these oscillations increased as the phantom size decreased, as did the fluctuations of the reconstructed activity (Figure 3).
The reconstructed activities clearly depended on the calibration phantom, with differences between the phantoms ranging from 2% to 3% for Brightview (within the expected measurement errors) to as much as 15% to 20% for Infinia. The calibration phantom resulting in the lowest error differed between the systems and depended on the test phantom considered. The reconstructed activity was higher by 0% to 5% in the contrast phantom than in the NEMA phantom. This difference lies within the experimental errors. Therefore, Teflon did not seem to preclude quantification in the NEMA phantom.
In the S phantom, the reconstructed activity was systematically underestimated, whereas both over- and underestimations were observed for the contrast and NEMA phantoms. The use of a 1% threshold for the drawing of the ROI should have ensured that all counts were included in the ROI. We have no definitive explanation for the underestimation of the S phantom activity.
Considering the results with the contrast and NEMA phantoms, quantification within 10% or even 5% error seems to be feasible, and further refinement of the calibration parameters could improve the accuracy further. Previous studies using different systems, isotopes, and phantoms obtained accuracies in the range 0% to 20% ( and references therein, ). With patients, Willowson et al. obtained an average error of 1%, a per-patient error of less than 5% in 11 out of 12 patients, and an error of 7.4% in the 12th patient. These studies used older cameras that, with the exception of Infinia, are no longer on the market; some used separate stand-alone SPECT and CT systems, and the data were reconstructed with locally developed software. With Symbia T series and Flash3D, an overall quantification error better than 7% in phantoms and around 1% in patients was reported. Some per-patient errors were as high as 17%, but the per-patient error was below 10% for 13 out of 16 patients. Finally, it is interesting to recall that Hughes et al. concluded that ‘no significant differences were observed between image resolutions when data acquired from different cameras were reconstructed with an independent algorithm. However, different manufacturers’ reconstruction algorithms produced myocardial wall thickness that differed by up to about 110%.’ In a very recent study, the same authors concluded that there were no differences in the figures of merit when data recorded with different SPECT-CT systems were reconstructed with their own software but that significant differences existed when the manufacturers’ reconstruction software was used.
This study used several imaging systems located in different departments. Under these conditions, it is very difficult to evaluate the experimental error by repeating the measurements several times. The same operators performed all experiments. Nevertheless, the overall reproducibility needs to be assessed in some way. To this end, it was decided to repeat some experiments twice with a short delay, with a longer delay, after changing one parameter, or on a second camera of the same model.
Two successive SPECT acquisitions of the contrast phantom were performed with Symbia T6. The contrast phantom acquisition was also repeated on this Symbia T6 after a 1-month interval. One of these acquisitions included 256 projections instead of 128, and therefore, the total number of counts was almost double. In all cases, the contrast recoveries differed by less than 10% for rods larger than 10 mm and by less than 20% for most of the smallest rods. With the Symbia T6 system, a shorter acquisition of the NEMA phantom resulted in four times fewer acquired counts, but the values for the residues in the inserts differed from those obtained with the high-count acquisition by less than 0.5%. All these repeated experiments indicated that the acquisition parameters, and particularly the number of acquired counts, ensured good short- and long-term reproducibility. Therefore, the results reported in this study are likely to represent real differences in performance between the four investigated systems.
Imaging the NEMA phantom with a second Brightview camera or a Symbia T2 also led to very reproducible results. Differences in residual fractions were less than 1% between the two Brightview systems and less than 2.5% between Symbia T6 and T2. Also, the attenuation coefficients obtained with the two Brightview or the two Symbia systems were identical in water and differed by less than 3% in Teflon. In the preliminary study (Additional file 2), it was observed that contrast recoveries obtained at a 4-year interval with two cameras of the same model differed by less than 10% for all but one of the hot rods and for the largest cold rods. Therefore, the data obtained with a second camera of the same model tended to demonstrate that the results were not particular to the specific camera used for this study.
The CF determination is a crucial step in the quantification procedure. The repeatability and reproducibility of this step were assessed with Symbia T6 and the M phantom. This choice resulted from the easy access to this camera, the fact that decay correction was not performed in the Flash3D reconstruction software but in a separate procedure, and the fact that the artifacts were most intense for the M phantom. The highest variability was therefore expected with the M phantom and Symbia T6. The repeatability was found to be better than 0.5%. The differences between CFs obtained at a short interval were around 3.4%. Such small differences are similar to the reproducibility of the radionuclide calibrators. After 10 months, the differences were 5.0% to 6.6% for all ROIs except the 60% ROI, for which the value was as high as 13.6%. However, the limits of this ROI corresponded to a region of rapid variation in the reconstructed counts resulting from the oscillation artifacts (Figures 4 and 5). This observation stresses again the need for future work devoted to the suppression of these artifacts for more accurate quantification in SPECT-CT.