CERMEP-IDB-MRXFDG: a database of 37 normal adult human brain [18F]FDG PET, T1 and FLAIR MRI, and CT images available for research

We present a database of cerebral PET FDG and anatomical MRI for 37 normal adult human subjects (CERMEP-IDB-MRXFDG). Thirty-nine participants underwent static [18F]FDG PET/CT and MRI, resulting in [18F]FDG PET, T1 MPRAGE MRI, FLAIR MRI, and CT images. Two participants were excluded after visual quality control. We describe the acquisition parameters, the image processing pipeline and provide participants’ individual demographics (mean age 38 ± 11.5 years, range 23–65, 20 women). Volumetric analysis of the 37 T1 MRIs showed results in line with the literature. A leave-one-out assessment of the 37 FDG images using Statistical Parametric Mapping (SPM) yielded a low number of false positives after exclusion of artefacts. The database is stored in three different formats, following the BIDS common specification: (1) DICOM (data not processed), (2) NIFTI (multimodal images coregistered to PET subject space), (3) NIFTI normalized (images normalized to MNI space). Bona fide researchers can request access to the database via a short form. Supplementary Information The online version contains supplementary material available at 10.1186/s13550-021-00830-6.


Introduction
Imaging databases are very useful for re-analysing data in a different context, increasing the number of subjects in a study, and developing new methods. They play a crucial role in the many analysis methods that rely on comparing the data of a group or an individual with a reference group. This includes studies using a normative database for analysis and quantification purposes (such as partial volume correction), machine learning approaches, multi-atlas techniques, and validation of image processing pipelines. Databases with different modalities per participant also allow approaches that derive "missing" modalities, e.g. creating pseudo-CTs for attenuation correction in PET-MR [1][2][3][4].

*Correspondence: merida@cermep.fr
† Inés Mérida, Julien Jung, Alexander Hammers and Nicolas Costes have contributed equally to this work
7 CHU de Lyon HCL - GH Est, 59 Boulevard Pinel, 69677 Bron Cedex, France
Full list of author information is available at the end of the article

Acquisition of imaging data, such as MRI scanning and in particular PET imaging, which requires the injection of a radiotracer, represents an important logistical and monetary cost. In addition, participants have to consent to data acquisition and dissemination, and many countries have restrictions on using ionising radiation in healthy controls, adding to the difficulties in acquiring such databases. Database sharing thus helps to reduce both research costs and the radiation exposure of healthy controls.
To make database sharing more efficient, the scientific community has developed a standard for organising and describing the data (Brain Imaging Data Structure (BIDS), https://bids.neuroimaging.io, [13]), with a specific extension for the PET modality (https://bids-specification.readthedocs.io/en/bep-009/04-modality-specific-files/09-positron-emission-tomography.html, [14]). In this work we introduce a multi-modal database of 37 healthy subjects comprising MRI, CT and [18F]FDG PET images, organised according to the BIDS standard. We have obtained ethical permission to share the data on request.

Recruitment and cohort characteristics
All enrolled subjects provided written informed consent to participate in the study (EudraCT: 2014-000610-56). The subjects were informed that their anonymized images could be used for methodological development and were given the option to oppose this use of their data. The inclusion criteria were healthy adult subjects aged between 20 and 65 years. Exclusion criteria were (1) children and adults older than 65 years, (2) women of childbearing potential without effective contraception, (3) history of neurological disorders, (4) any contraindication for MRI scanning, (5) active infectious disease. Thirty-nine subjects were included in the study. Each subject had a T1-weighted MRI, a T2 fluid-attenuated inversion recovery (FLAIR) MRI and an [18F]FDG PET/CT brain scan. For all participants, the PET/CT scan and the MRI session took place on the same day (between 8 a.m. and 2 p.m.). The subjects' MR and PET images were visually reviewed by two neurologists for conspicuous brain abnormalities. Two subjects showing brain lesions on the MR images (one probable insular cavernoma, one cerebellar lesion with hyperintense signal in the FLAIR sequence suggesting possible inflammatory disease of the central nervous system) were excluded from the database.

MRI acquisition and reconstruction
MRI sequences were obtained on a Siemens Sonata 1.5 T scanner. Three-dimensional anatomical T1-weighted sequences (MPRAGE) were acquired in sagittal orientation (TR 2400 ms, TE

PET and CT acquisition and reconstruction
PET and CT data were acquired on a Siemens Biograph mCT64. During the uptake period, participants were instructed to rest with their eyes closed and without auditory stimulation. A static PET acquisition started 50 min after the injection of 122.30 ± 21.29 MBq of [18F]FDG (individual doses are provided in the demographics table) and lasted 10 min [16]. PET images were reconstructed using 3D ordinary Poisson ordered-subsets expectation maximization (OP-OSEM 3D), incorporating the system point spread function and time of flight, with 12 iterations and 21 subsets (Siemens' "HD reconstruction"). Data correction (normalization, attenuation and scatter correction) was fully integrated within the reconstruction process. Gaussian post-reconstruction 3D filtering (FWHM = 4 mm isotropic) was applied to all PET images [17]. Reconstructions were performed with a zoom of 2, yielding a voxel size of 2.04 × 2.04 × 2.03 mm³ in a matrix of 200 × 200 × 109 voxels (axial field of view 221.27 mm). Low-dose CT images for attenuation correction were acquired with a tube voltage of 100 kV and reconstructed in a 512 × 512 × 233 matrix with a voxel size of 0.6 × 0.6 × 1.5 mm³ (axial field of view 349.5 mm).

Processing pipeline
Data anonymisation and pre-processing
Data anonymisation was performed on the DICOM files using the gdcmanon tool (http://gdcm.sourceforge.net/html/gdcmanon.html). DICOM files were converted to NIFTI format with the dcm2niix software (https://github.com/rordenlab/dcm2nii). The background of the CT images was cleaned to remove the scanner table and other objects, such as the pillow. For this, a binary mask of the subject's head was automatically generated following the procedure described in [18], using tools from the FSL (Version 6.0, https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/) and NiftySeg (http://cmictig.cs.ucl.ac.uk/wiki/index.php/NiftySeg) suites. Finally, the binary mask was applied to the CT image.
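The final masking step can be sketched as follows. This is a minimal illustration on a flat list of voxel values, not the mask-generation procedure of [18]; the function name and the fill value of -1000 HU (air) are assumptions, since the text does not state which value replaces the background.

```python
AIR_HU = -1000.0  # assumed background fill value (air, in Hounsfield units)


def apply_head_mask(ct_values, head_mask, fill_value=AIR_HU):
    """Keep CT values inside the binary head mask and replace everything else."""
    if len(ct_values) != len(head_mask):
        raise ValueError("image and mask must have the same number of voxels")
    return [value if inside else fill_value
            for value, inside in zip(ct_values, head_mask)]


# Toy example: two head voxels, one scanner-table voxel, one air voxel.
ct = [40.0, 900.0, 150.0, -1000.0]
mask = [1, 1, 0, 0]
cleaned = apply_head_mask(ct, mask)
```

In this toy case the table voxel (150 HU) is replaced by the fill value while the two head voxels are left unchanged.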

Coregistration
As a first step, the origin of each NIFTI image was set to the matrix centre. Then, CT, T1 MRI and FLAIR MRI images were coregistered to the [18F]FDG PET image using the Coregister & Estimate function from the SPM 12 toolbox (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/).
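Setting the origin to the matrix centre amounts to choosing the translation of the voxel-to-world affine so that the central voxel maps to 0 mm on each axis. The sketch below illustrates this under simplifying assumptions that are not taken from the text: an axis-aligned affine without rotation, zero-based voxel indices (SPM itself counts voxels from 1), and hypothetical function names. The dimensions and voxel sizes are those reported for the PET images.

```python
def centred_affine(dims, voxel_sizes_mm):
    """Build an axis-aligned 4x4 voxel-to-world affine with its origin at the matrix centre."""
    affine = [[0.0] * 4 for _ in range(4)]
    for axis, (n, size) in enumerate(zip(dims, voxel_sizes_mm)):
        affine[axis][axis] = size
        # Shift so that the central voxel index (n - 1) / 2 maps to 0 mm.
        affine[axis][3] = -size * (n - 1) / 2.0
    affine[3][3] = 1.0
    return affine


def voxel_to_world(affine, ijk):
    """Map a (possibly fractional) voxel index to world coordinates in mm."""
    homogeneous = list(ijk) + [1.0]
    return [sum(a * v for a, v in zip(row, homogeneous)) for row in affine[:3]]


pet_dims = (200, 200, 109)
pet_voxels = (2.04, 2.04, 2.03)
affine = centred_affine(pet_dims, pet_voxels)
centre = [(n - 1) / 2.0 for n in pet_dims]
world = voxel_to_world(affine, centre)  # close to [0.0, 0.0, 0.0]
```

Starting all modalities from a common, centred origin gives the rigid coregistration a reasonable initial alignment.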

Spatial normalisation
All images were normalized to MNI space using deformation fields derived from the tissue classification of the T1 image into grey and white matter probability maps. For this, each subject's deformation field was calculated by the Segment function of SPM 12 [19] from the T1 image previously coregistered to the PET image (but not resliced, to preserve native resolution). The transformations for MR to PET space coregistration and PET to MNI space normalisation were concatenated and applied in a single step to avoid an intermediate resampling of the MRI data. All normalized images were resampled at 1 × 1 × 1 mm using 4th degree B-spline interpolation.

Intensity normalisation
Reconstructed PET images were normalized by the subject's weight and injected dose to obtain Standard Uptake Value (SUV) images (radioactivity concentration [kBq/cm³] / (dose [kBq] / weight [g])). In addition, reconstructed PET images were normalized by each subject's mean activity within the intracranial volume (ICV) mask provided by SPM12 to obtain Standard Uptake Value ratio (SUVr) images.
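A minimal sketch of the two intensity normalisations, operating on a flat list of voxel values. The function names and example numbers are hypothetical. Dose is taken in kBq and weight in grams, an assumption made so that SUV comes out on the customary g/mL scale.

```python
def suv(concentration_kbq_per_ml, dose_kbq, weight_g):
    """Scale activity concentration by injected dose per unit body weight."""
    dose_per_gram = dose_kbq / weight_g
    return [c / dose_per_gram for c in concentration_kbq_per_ml]


def suvr(values, icv_mask):
    """Divide every voxel by the mean value inside the intracranial volume mask."""
    inside = [v for v, m in zip(values, icv_mask) if m]
    if not inside:
        raise ValueError("ICV mask is empty")
    mean_icv = sum(inside) / len(inside)
    return [v / mean_icv for v in values]


# Hypothetical subject: 120 MBq injected, 70 kg body weight.
concentration = [0.0, 6.0, 9.0, 12.0]   # kBq/mL, first voxel is outside the head
icv = [False, True, True, True]
suv_values = suv(concentration, dose_kbq=120000.0, weight_g=70000.0)
suvr_values = suvr(suv_values, icv)
suvr_from_raw = suvr(concentration, icv)
```

Because SUVr divides by a mean computed from the same image, any global scale factor cancels: `suvr_values` and `suvr_from_raw` agree. This is why a subject with globally low SUV is no longer an outlier after ICV normalisation.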

Regional analysis
The T1 MR images were anatomically segmented into 83 regions using the Hammers_mith maximum probability atlas n30r83, which is based on the multi-atlas fusion of 30 manually delineated MRIs of healthy young adults [5,6], available at http://brain-development.org. The atlas was warped to each individual MRI space via the inverse of the deformation field from the subject's space to MNI space computed at the spatial normalisation step. Grey matter and white matter probability maps obtained with the Segment function were thresholded at 0.5 and combined with the 83-ROI anatomical segmentation in order to separate the grey and white matter parts of each region, except for pure white matter regions like the corpus callosum and pure grey matter regions like the basal ganglia. Mean regional SUV and SUVr were extracted in a selection of grey matter anatomical regions of the Hammers_mith segmentation.
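The extraction of mean regional values restricted to grey matter can be sketched as follows. This is an illustration on flat voxel lists, with hypothetical names and toy data. It assumes that label 0 denotes background and that "thresholded at 0.5" means a strict comparison; it does not implement the special handling of pure grey or pure white matter regions.

```python
from collections import defaultdict


def regional_gm_means(values, labels, gm_prob, threshold=0.5):
    """Average voxel values per atlas label, keeping only grey matter voxels."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label, prob in zip(values, labels, gm_prob):
        if label != 0 and prob > threshold:  # skip background and non-GM voxels
            sums[label] += value
            counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


# Toy data: five voxels, two atlas regions (1 and 2) and one background voxel.
suv_values = [5.0, 7.0, 2.0, 6.0, 4.0]
atlas_labels = [1, 1, 1, 2, 0]
gm_probability = [0.9, 0.8, 0.2, 0.7, 0.9]
means = regional_gm_means(suv_values, atlas_labels, gm_probability)
```

The third voxel belongs to region 1 but has a grey matter probability of 0.2, so it does not contribute to that region's mean.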

Leave-one-out SPM analysis on [18F]FDG images
Leave-one-out ANCOVA was performed in SPM12 to compare each subject (healthy control) of the database with all the others. For the statistical analysis, PET images were smoothed with a Gaussian filter of 8 mm FWHM. This additional smoothing is routinely used in voxel-based analysis to accommodate interindividual anatomical variability and improve the sensitivity of the statistical analysis [20]. We used age and the global mean calculated within the intracranial volume mask as covariates. Two contrasts were explored: hypermetabolism, i.e. activity of one subject > activity of the remaining subjects in the database, and hypometabolism, i.e. activity of one subject < activity of the remaining subjects in the database. Significant differences were defined at p < 0.05 FWE at the cluster level.
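The leave-one-out principle can be illustrated with a much simpler statistic than the ANCOVA used here. The sketch below z-scores each subject's value against the mean and standard deviation of the remaining subjects. It is a stand-in for illustration only: it omits the age and global-mean covariates and the FWE correction, and the values are hypothetical.

```python
from statistics import mean, stdev


def leave_one_out_z(values):
    """Z-score each subject against the mean and SD of all the other subjects."""
    scores = []
    for i, value in enumerate(values):
        others = values[:i] + values[i + 1:]  # reference group without subject i
        scores.append((value - mean(others)) / stdev(others))
    return scores


# Hypothetical regional SUVr values for five subjects; the last one is high.
regional_suvr = [1.0, 1.1, 0.9, 1.0, 1.6]
z_scores = leave_one_out_z(regional_suvr)
```

A positive score corresponds to the "hypermetabolism" contrast direction (subject above the others) and a negative score to the "hypometabolism" direction.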
The database outliers were assessed with three criteria, for both hypometabolism and hypermetabolism.
• Subject-level: number of subjects with significant differences / total number of subjects in the database × 100
• Cluster-level: total number of significant clusters across all subjects / average number of resolution elements (resels) in the mask × 100
• Voxel-level: total number of voxels among the significant clusters across all subjects / number of voxels in the SPM mask × 100

Database IDB-MRXFDG
The database is available in three different formats, following the BIDS common specification:
• DICOM (data not processed)
• NIFTI (multimodal images coregistered to PET subject space)
• NIFTI normalized (images normalized to MNI space)

Table 2 lists the regional volumes obtained via the Hammers_mith maximum probability atlas. Coefficients of variation were as expected, without obvious outliers. The structure sizes were also in line with expectations [5,6]. Figures 3 and 4 show boxplots of mean regional SUV and SUVr respectively, extracted in a selection of grey matter anatomical regions, for all subjects in the database. Each region is composed of left and right sub-regions. Mean regional SUV values were 5.36 ± 1.32, range 1.35-8.54 (Fig. 3). Three subjects in the database had lower SUV values (between 1.35 and 3). The distribution of SUVr values (Fig. 4) remains very similar to the distribution of SUV values (mean 1.49 ± 0.26 SD, range 0.85-2.22), except that the dispersion is reduced and the outlier values from the three participants with unusually low SUVs are regularized. Normalizing with the ICV mean value is thus an efficient way of regularizing the SUV distribution while leaving the inter-regional variability intact.

Leave-one-out SPM analysis
Results for the leave-one-out analysis of [18F]FDG PET are reported in Table 3. At the subject level, 5/37 (13.5%) of the participants showed at least one significant increase in [18F]FDG uptake (hypermetabolism) relative to the other 36 participants. At least one significant decrease (hypometabolism) was found in 11/37 (29.7%) of the participants.
At the cluster level, significant changes were found in at most 5.21% of resolution elements, and at the voxel level, in at most 0.32% of voxels. All abnormalities in controls compared with controls are by definition false positives. We examined all 16 cases and present our findings in Additional file 1 (Table S1 and Table S2). Virtually all false positives had an anatomical or artefactual explanation.
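The three outlier criteria are simple percentages. The sketch below writes them as functions with hypothetical names. Only the subject-level counts reported above (5 and 11 out of 37) are used as inputs, because the underlying cluster, resel and voxel counts are not given in the text.

```python
def percentage(count, total):
    """Express count / total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


def subject_level(n_subjects_with_findings, n_subjects):
    return percentage(n_subjects_with_findings, n_subjects)


def cluster_level(n_significant_clusters, mean_resels_in_mask):
    return percentage(n_significant_clusters, mean_resels_in_mask)


def voxel_level(n_significant_voxels, n_mask_voxels):
    return percentage(n_significant_voxels, n_mask_voxels)


N_SUBJECTS = 37
hyper = subject_level(5, N_SUBJECTS)
hypo = subject_level(11, N_SUBJECTS)
print(f"hypermetabolism: {hyper:.1f}%")  # prints "hypermetabolism: 13.5%"
print(f"hypometabolism: {hypo:.1f}%")    # prints "hypometabolism: 29.7%"
```

The subject-level criterion is the most conservative of the three, since a single small cluster is enough to count a subject as an outlier.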

Discussion
A new database of 37 healthy subjects including T1 and FLAIR MRI, CT, and [18F]FDG PET images, called IDB-MRXFDG, has been created.
The age range has been selected to reflect the ages of participants in cognitive and clinical research studies at the CERMEP imaging centre, encompassing amongst others epilepsy, movement disorders, multiple sclerosis and disorders of consciousness and will align with the research priorities of many similar centres.
We performed quality control of all images visually and by screening for volumetric and regional SUV abnormalities. Three subjects had unusually low SUVs; this may be due to imperfect adherence to the fasting requirement ahead of the scan. Blood glucose levels, which could have confirmed this, were not measured; this is a limitation of the study. We show that a simple global normalisation procedure removes the resulting outliers (Fig. 4); depending on the application, more sophisticated intra-scan normalisation procedures are conceivable [21,22]. We also performed SPM leave-one-out studies for [18F]FDG. The relatively high false-positive rates per subject are explained by the existence of significant clusters of small size (from 1 to 95 voxels). Areas of apparent hypermetabolism were either at the edge of the brain or at the bottom of a particularly deep sulcus (see Additional file 1: Table S1); areas of apparent hypometabolism (Additional file 1: Table S2) were clearly linked to the participant's anatomy, typically to a wide sulcus or fissure (7/11 cases). The other 4 cases were extracerebral or at the edge of the brain, probably linked to imperfect normalisation. We believe none would have been considered abnormal had they been seen in an analysis comparing one research subject with a particular condition against a group of controls. When testing the normality of the database at the cluster and voxel levels, the expected rate of abnormality of 5% or lower was found for both hyper- and hypometabolism. The database therefore appears suitable for voxel-based [18F]FDG PET analysis with a ≤ 5% risk of Type 1 error. The IDB-MRXFDG database could be used in many different applications such as the statistical comparison of a patient (or group of patients) to a database of healthy subjects, automatic quantitative analyses, and more generally methodology development in neuroimaging.
The inclusion of [18F]FDG PET in IDB-MRXFDG is particularly important. While there are now many MR databases covering, with varying density, the human lifespan as reviewed in [7], we are aware of very few [18F]FDG PET databases. Wei et al. [10] scanned 78 healthy subjects aged 3-78 years on a PET/CT scanner; it is not clear whether this database is available on request, and there is no mention of MRI. The Marseille database (used e.g. in [12]) contains data from 60 healthy adults aged 21-78; [18F]FDG PET, T1-weighted MRI, and CT data are available by arrangement. A rare paediatric database [11] contains 24 datasets of participants aged 4.5-17.9 years (mean ± SD 10.06 ± 3.1 years) and may be shared on request. These are "pseudo-controls" derived from epilepsy patients, selected from among a total of 71 children as the subgroup with both a normal visual analysis and a normal SPM analysis derived iteratively. They were scanned on a traditional PET scanner with transmission-based attenuation correction, which makes comparison with PET/CT data difficult [23]; no MRI is available. A large  [24]. It should be noted that we used very high-quality reconstructions incorporating both the system point spread function and time-of-flight information, which will not be available on all machines. If lower resolution images are required, the images could simply be filtered with a Gaussian kernel (e.g. [25]).
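When degrading images to a lower resolution, the kernel to apply follows from the fact that successive Gaussian blurs add in quadrature. The sketch below assumes that the current effective resolution is well approximated by a Gaussian of known FWHM. The function names and the example values (6 mm current, 10 mm target, 2 mm voxels) are hypothetical and are not measurements from this database.

```python
import math

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def extra_kernel_fwhm(current_fwhm_mm, target_fwhm_mm):
    """FWHM of the additional Gaussian kernel needed to reach the target resolution."""
    if target_fwhm_mm < current_fwhm_mm:
        raise ValueError("filtering can only lower the resolution")
    return math.sqrt(target_fwhm_mm ** 2 - current_fwhm_mm ** 2)


def fwhm_to_sigma_voxels(fwhm_mm, voxel_size_mm):
    """Convert a FWHM in mm to a Gaussian sigma expressed in voxels."""
    return fwhm_mm * FWHM_TO_SIGMA / voxel_size_mm


kernel_fwhm = extra_kernel_fwhm(current_fwhm_mm=6.0, target_fwhm_mm=10.0)
kernel_sigma = fwhm_to_sigma_voxels(kernel_fwhm, voxel_size_mm=2.0)
```

The sigma in voxels is the form expected by many generic image-filtering routines, whereas neuroimaging packages such as SPM take the FWHM in mm directly.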
Examples of database uses for work in MR include the voxel-wise comparison of a patient with a control group to detect abnormalities from T1 images via voxel-based morphometry [26,27] and its variants that use T1 derivatives like grey-white matter junction images [28,29] for the detection of specific pathologies like Focal Cortical Dysplasia. FLAIR as a sequence highly sensitive to pathology has similarly been used at the single-subject level in comparison to control groups (e.g. [30,31]). Another group of examples is the region-wise comparison of the size of cerebral structures between groups or between individuals and a control group (e.g. [32][33][34][35]). Importantly, such work has been successfully undertaken with control groups scanned on a different scanner (e.g. [36,37]), and IDB-MRXFDG could be used to increase the size of control groups.
The multimodality aspect of IDB-MRXFDG is particularly important.
Since PET-CT scanners rapidly displaced PET-only scanners in the early 2000s, low-dose CT has been coupled to brain [ 18   of commercial PET-MR scanners since 2011, there has been no direct way of measuring electron density in the head, and alternative approaches have had to be found. Synthesis of "pseudo-CTs" via atlas approaches [1,2] is a successful strategy that performs well overall [38] but requires pairs of MR and CT images to achieve the synthesis. IDB-MRXFDG has already been used for such approaches [39]. The latter application of databases (MR-based attenuation correction based on MR-CT pairs) is one domain where Deep Learning methods, notably with Convolutional Neural Networks, have recently become very successful [3,40]. However, they often require substantially larger training datasets or priors than multi-atlas methods, in the case of MR-based attenuation correction recently estimated at 100-400 pairs, with an influence of MR heterogeneity [40]. More widespread availability of databases will further Deep Learning approaches, particularly when multiple modalities are available per subject, allowing e.g. synthesis of missing modalities [41].
Pairs of data are also required for partial volume effect correction methods incorporating structural MRI information (PET-MR pairs; e.g. [42]). The additional availability of FLAIR-T1 pairs can be exploited e.g. for detection of focal cortical dysplasias as the underlying substrate of medically refractory focal epilepsies [30].