This study is the first systematic review of the characteristics and quality of diagnostic accuracy studies of PET conducted in Japan. Although a total of 138 Japanese studies involving PET were identified, half of them were not indexed in MEDLINE. Study subjects may overlap among several studies, but this could not be taken into account because some studies lacked information about their participants. Also, papers with different aims and methods were treated as independent studies. By comparison, the total number of studies reviewed in an HTA report in the UK was 158, which included 6 non-English studies indexed in international databases [4]. Similarly, primary study selection in the Belgian report was limited to several European languages [6,10]. Therefore, non-indexed Japanese studies and studies written in Japanese are likely to be missed by international HTA reports.
Malignant neoplasm was the target disease most frequently covered by Japanese studies (Table 1). This finding is similar to those of previous international studies [4,10]. Fifty-eight percent of Japanese studies had a sample size of less than 50. Accuracy estimates from small studies are often imprecise, and their results have little generalizability to target patients [20]. Also, Bachmann et al. estimated that the median numbers of patients with and without a target condition necessary to calculate valid sensitivity and specificity are 49 and 76, respectively. The sample size for most international PET studies was also less than 100 [4,6,10]. In addition, approximately 90% of Japanese studies did not include information about funding sources. A systematic review of conflicts of interest highlighted systematic biases in favor of products made by the funder [21], which implies that hidden conflicts of interest may be present among the Japanese studies.
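The effect of sample size on the precision of an accuracy estimate can be illustrated with a short calculation. The following is a minimal sketch, not part of the analysis in this study: it assumes a hypothetical observed sensitivity of about 80% and uses the Wilson score interval, one common choice of confidence interval for a proportion. The function name `wilson_interval` and the example counts are illustrative assumptions.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion.

    successes: number of true positives among diseased patients
    n:         number of diseased patients
    z:         normal quantile (1.96 gives an approximate 95% interval)
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts, each close to 80% sensitivity, at three sample sizes.
# The value 49 echoes the median cited from Bachmann et al.; the other
# sizes and all true-positive counts are assumptions for illustration.
examples = [(16, 20), (39, 49), (160, 200)]

for tp, n in examples:
    lo, hi = wilson_interval(tp, n)
    print(f"n={n:3d}  sensitivity={tp / n:.2f}  "
          f"95% CI=({lo:.2f}, {hi:.2f})  width={hi - lo:.2f}")
```

Under these assumptions, the interval narrows as the number of diseased patients grows, which is why a sensitivity reported from a few dozen patients should be read with its confidence interval rather than as a point value.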
Our study showed that the mean quality score was 6.7 (out of a maximum of 14), and 33% of Japanese PET studies were of high quality, as indicated by a quality score of eight or higher. These results were similar to those of several recent systematic reviews [22,23]. Also, a high risk of bias was observed in six items, including adequate spectrum, adequate reference standard, and absence of verification bias (Figure 2). This result indicates that the Japanese studies have numerous biases and are of relatively low quality, which limits their applicability to PET use in clinical settings. For example, PET studies of low quality (i.e., a quality score of less than eight) were excluded from international health technology assessments [18] or critically examined in clinical recommendations [4,6,10]. Therefore, studies of low quality will be neither used nor reflected in clinical guidelines and health policies. Moreover, the quality of test studies is extremely important as a basis for further evaluation for clinical decision making and health outcomes [24,25]. Improvement in the quality of test studies is urgently needed.
Factors related to methodological quality, namely target disease, publication year, and study design, were identified by multiple logistic regression analysis (Table 2). Prior to 2002, when PET examination was first included in the National Health Insurance, the quality of studies was high. This may be because, prior to insurance coverage, PET researchers and related academies and industries proactively and rigorously conducted and published many PET studies to persuade the government to include PET testing in the insurance scheme, as well as to promote the utilization of PET testing after its inclusion. These efforts to promote PET testing seem to have had a positive influence on maintaining the methodological quality of the studies and withstanding critical assessment by the government. As PET studies have been conducted mainly in the area of oncology, particularly in respiratory cancer, the quality of studies for other target diseases will improve over time as research applications of PET expand into other areas. Our results are consistent with previous research highlighting that prospective studies are favorable for reducing biases [26].
Our assessment of the characteristics and quality of Japanese PET studies demonstrates that efforts to educate researchers, provide incentives, and establish systems for conducting diagnostic studies are needed to encourage investigators to comply with existing methodological standards. Low quality of reporting was found to be a significant obstacle in the evaluation of quality, and therefore the risk of bias remained unclear for some items in this study. As many studies have limited applicability in clinical practice and health policy, their inclusion might be misleading in some cases. In this study, a high proportion of ‘unclear’ results were observed in several items of risk of bias (Figure 2), which made it difficult for reviewers to evaluate the actual quality of the studies. This finding has also been reported in several systematic reviews [18,19]. Concerns about the quality of reporting of diagnostic studies led to the endorsement of the STARD statement [27]. Since the publication of this statement, the quality of reporting of diagnostic accuracy studies has slightly improved [28]. To advance the quality of reporting in Japan, efforts are required to raise awareness of the STARD statement and to encourage publishers of Japanese scientific journals to adopt the statement in their instructions to authors.
With regard to indexing, 34.9% of studies not indexed in MEDLINE and 30.4% of the indexed studies were of high quality, with a quality score of eight or higher. There was no significant difference in the total quality score between the two groups, even though significant differences were observed in several individual items (Figure 3). After adjusting for other factors in a logistic regression model, the overall quality was not significantly different between PET studies indexed in MEDLINE and those not indexed (Table 2). This result suggests that systematic reviews should search both international and Japanese databases so that non-indexed, high-quality Japanese PET studies are not excluded. In addition, excluding Japanese studies may introduce a language bias and lead to erroneous conclusions.
Searching for and collecting non-English language papers is important to minimize language bias [29]. In conducting systematic reviews, international collaboration in areas where language bias might occur could be a practical and feasible way to minimize it. On the other hand, non-English-speaking researchers should also be encouraged to publish original studies in English in journals indexed in international databases.
Finally, only 47.8% of Japanese studies employed comparators (i.e., competing diagnostic technologies such as MRI or CT) to evaluate the diagnostic accuracy of PET. Of these, only 23 studies performed statistical analysis. As the diagnostic accuracy reported in non-comparative studies often differs from that in comparative studies [30], the conclusions of the Japanese studies should be interpreted carefully. However, this issue has not been mentioned even in systematic reviews and HTA reports of PET studies [4,22,23,30]. In addition, only 6.9% of non-comparative studies performed simple comparisons with results obtained from literature surveys.
These issues might contribute to the discrepancies in conclusions between Japanese studies and international assessments. For discrepancies found in the coverage of disease areas, Japanese studies could serve as supplementary information for the conclusions or recommendations of international assessments to prevent language bias, since international assessments do not include most Japanese studies. However, further systematic examination would be needed to integrate the information and assess its influence on conclusions and recommendations, since there is no explicit or standardized guideline for integrating these conclusions or recommendations.
On the other hand, in the disease areas where only Japanese studies were available, coverage under the Japanese National Health Insurance was based on a small number of studies with relatively low quality scores. In the case of uterine cancer, there is no positive conclusion. Since Japan has had neither a comprehensive HTA nor guidance based on systematic reviews, further examination is required to inform health and clinical policy.