Introduction

Gastric cancer has a very poor prognosis. The outcomes of patients with gastric cancer are determined by histopathologic factors, such as depth of invasion, nodal status, and distant metastases [1]. Optimal treatment of patients with gastric cancer depends on accurate staging, which is most commonly accomplished through computed tomography (CT). Recently, endoscopic ultrasound (EUS) has been endorsed for the preoperative staging of gastric cancer by several groups, such as the National Comprehensive Cancer Network, the Brazilian Society of Gastrointestinal Endoscopy, and the Scottish Intercollegiate Guidelines Network [2–4].

Endoscopic ultrasound

EUS was introduced into clinical practice in the early 1980s as a way to assess the extent of local tumor infiltration and local lymph node status [5–7]. The main advantage of EUS is the ability to place the transducer close to the lesion without interference from fat, bowel gas, or bone [6]. EUS allows evaluation of the individual layers of the gastric wall, as well as the identification of enlarged regional lymph nodes and metastasis in the liver; thus, it may be used to stage gastric cancer according to the TNM classification [1, 8]. In particular, EUS is used to determine whether patients with early cancers are appropriate candidates for endoscopic mucosal resection [9]. EUS may also be helpful in planning the appropriate treatment strategy in patients with advanced gastric cancer (AGC), such as determining which patients are suitable for neoadjuvant chemotherapy or a multivisceral resection. Currently, EUS imaging can be performed with echoendoscopes or with ultrasound catheters or ‘miniature probes’, which are passed through standard endoscopes. These miniature probes can provide ultra-high-frequency imaging (12–30 MHz), compared to echoendoscopes (5–12 MHz). Higher frequency yields higher resolution of the tumor at the expense of depth of penetration, which limits nodal examination [1, 2]. Consequently, a higher-frequency probe may provide better evaluation of a T1/T2 cancer, while a lower-frequency probe may be more accurate in predicting nodal involvement.

Unfortunately, EUS also presents some disadvantages. It is one of the most demanding endoscopic procedures and thus is highly operator-dependent. Extensive training and experience in the use of the echoendoscope are required to obtain complete and accurate images [10]. It cannot be performed adequately when the endoscope cannot be well positioned because of the tumor location, or when the full extent of the tumor cannot be visualized because of high-grade strictures [6]. Although EUS is well suited for the evaluation of local invasion, it is of limited usefulness in the overall assessment of more distant spread [6]. Furthermore, EUS is an invasive technique requiring sedation and has recognized procedure- and sedation-related complications, including mortality [6]. Lastly, EUS adds incremental costs, and therefore should be used only if it contributes significantly to improved patient management and outcomes [11].

Several studies have compared the preoperative endosonographic assessment of T and N stage with histopathological staging of the resected specimen. However, the results from these studies vary considerably. Therefore, the goals of this meta-analysis were to: (1) comprehensively identify, synthesize, and evaluate findings from articles on the accuracy of EUS in the preoperative staging of gastric cancer; (2) determine EUS accuracy for different T stages (T1, T2, T3, and T4); and (3) verify EUS sensitivity and specificity for N staging.

Methods

Data sources

Electronic literature searches were conducted using Medline and Embase from 1 January 1998 to 1 December 2009 according to the search algorithm presented in Appendix A. Search terms included: [exp Stomach Cancer/or (((gastric or stomach) adj1 cancer$) or ((gastric or stomach) adj1 carcinoma) or ((gastric or stomach) adj1 adenocarcinoma) or ((gastric or stomach) adj1 neoplasm$)).mp.] and [exp gastrointestinal endoscopy/or esophagogastroduodenoscopy/or endoscopy/or digestive tract endoscopy/or ESOPHAGOSCOPY/or cancer staging/or exp endoscopic therapy/or exp endoscopic surgery/or endoscopic mucosal resection/or endoscopic echography/or “endoscopic ultrasound”.mp. or endoscopic echography/] and [human and English language] and [clinical trial/or controlled clinical trial/or exp comparative study/or meta-analysis/or multicenter study/or exp practice guideline/or randomized controlled trial/] not [*gastrointestinal stromal tumor/] or [exp B cell lymphoma/and “marginal zone”.mp.] not [case report/or review]. A separate search of the Cochrane Central Register of Controlled Trials (1998–2009) was performed using the search term “gastric cancer”. No attempt was made to locate unpublished material or contact researchers for unpublished studies.

Study selection and review process

To be eligible, studies had to meet the following criteria: (1) the study investigated the diagnostic/staging accuracy of EUS in patients with histologically proven gastric cancer, (2) all included patients underwent a gastrectomy, (3) patients of any age or gender were included, (4) the study was published in a peer-reviewed journal from 1 January 1998 to 1 December 2009, and (5) the study was published in English. We excluded (1) reviews, meta-analyses, systematic reviews, abstracts, editorials or letters, case reports, and guidelines; (2) studies involving fewer than 30 patients; (3) studies evaluating mixed cancers with combined data analysis; (4) studies that did not provide sufficient information to determine at least one of the preoperative staging performance measures (accuracy, sensitivity, or specificity); (5) animal and ex vivo studies; (6) studies in which patients were presurgically treated with radiotherapy or chemotherapy; and (7) studies that did not use the TNM classification system.

All electronic search titles, selected abstracts, and full-text articles were independently assessed by a minimum of two reviewers (NC, JT, or RC). Reference lists from review papers and relevant articles were also examined for additional studies that met our inclusion criteria. Disagreements on study inclusion/exclusion were resolved with a consensus meeting.

Data extraction

A systematic approach to data extraction was used to produce a descriptive summary of participants, interventions, and study findings (Table 1). The first reviewer (RC) independently extracted the data and a second reviewer (NC or JT) reviewed the data extraction. Only data on patients who underwent a preoperative EUS assessment and subsequent surgery with pathologic examination were extracted. In this review, if a selected article presented or compared EUS performance with the performance of another procedure on gastric cancer staging (e.g., CT, magnetic resonance imaging [MRI]) only the results related to EUS performance were considered for analysis. No attempt was made to contact authors for additional information.

Table 1 Characteristics of included studies

Quality of studies

A number of criteria and tools to assess quality of studies have been developed [12–14]. However, there is a lack of consensus on how to best assess the quality of non-randomized clinical trials [1, 15]. Consequently, for this meta-analysis, studies were selected based on completeness of data and inclusion criteria only [1].

Data analysis

Descriptive characteristics were collected for each included study. A wide range of definitions was found for the calculation of accuracy, sensitivity, and specificity. Therefore, the following performance characteristics were re-calculated from the original numbers provided in each included publication: accuracy, agreement (Kappa statistic), sensitivity, and specificity. Accuracy was defined as the proportion of tumors where staging using EUS agreed with the postoperative staging using histopathology. We constructed 4 × 4 tables for T stage (corresponding to T1, T2, T3, and T4) or 5 × 5 tables when the preoperative imaging technique did not detect the presence of a tumor (T0). Similarly, we created 2 × 2 tables for preoperative N staging (corresponding to N0 and N+). Using these tables, we calculated agreement between EUS technique and pathology for T and N assessment using the Kappa statistic [16]. Also, using the tables for preoperative N staging, we calculated the sensitivity and specificity of lymph node staging.
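As a minimal sketch of how these measures follow from the staging tables, the Python code below computes accuracy, the Kappa statistic, sensitivity, and specificity. It is an illustration only, not the code used in this meta-analysis (which was run in R). It assumes an unweighted Cohen's Kappa, takes rows as the EUS stage and columns as the histopathologic stage, and uses invented counts; the function names are hypothetical.

```python
from typing import List, Tuple


def accuracy(table: List[List[int]]) -> float:
    """Proportion of tumors for which the EUS stage equals the pathology stage."""
    total = sum(sum(row) for row in table)
    agree = sum(table[i][i] for i in range(len(table)))
    return agree / total


def cohen_kappa(table: List[List[int]]) -> float:
    """Unweighted Cohen's Kappa: observed agreement corrected for chance agreement."""
    k = len(table)
    total = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / total
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2
    return (p_obs - p_exp) / (1 - p_exp)


def sens_spec(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Sensitivity and specificity from a 2 x 2 table (N+ is the positive class)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical 4 x 4 table for T stage (rows: EUS T1..T4, columns: pathology T1..T4).
t_table = [
    [20, 5, 0, 0],
    [4, 15, 6, 0],
    [1, 5, 30, 3],
    [0, 0, 4, 7],
]
t_accuracy = accuracy(t_table)
t_kappa = cohen_kappa(t_table)

# Hypothetical 2 x 2 counts for N stage (N+ on EUS and pathology = true positive).
n_sens, n_spec = sens_spec(tp=40, fn=10, fp=8, tn=42)
```

Recomputing every measure from the raw table in this way applies a single definition to all studies, which is the reason the original counts, rather than the reported percentages, were used.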

The meta-analyses were calculated using the inverse variance method; 95% confidence intervals (CIs) were calculated for the pooled estimates of the accuracy, Kappa statistic, sensitivity, and specificity. Non-overlapping 95% CIs were used to determine a significant difference between groups [16]. The following interpretation of Kappa was used: <0 = less than chance agreement, 0.01–0.20 = slight agreement, 0.21–0.40 = fair agreement, 0.41–0.60 = moderate agreement, 0.61–0.80 = substantial agreement, 0.81–0.99 = almost perfect agreement [16].
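The pooling step can be sketched as follows. This Python example is illustrative and is not the analysis code used here; it assumes a fixed-effect inverse variance model, a simple binomial standard error for each study's proportion, and a normal-approximation 95% CI. The function names and the two example studies are hypothetical, and the handling of boundary values in the Kappa scale is an assumption.

```python
import math
from typing import List, Tuple


def proportion_se(p: float, n: int) -> float:
    """Binomial standard error of a proportion (assumed here, not stated in the text).

    A study with p equal to 0 or 1 has zero variance under this formula and would
    need a continuity correction or a transformation, which is not shown.
    """
    return math.sqrt(p * (1 - p) / n)


def pool_inverse_variance(
    estimates: List[float], std_errors: List[float]
) -> Tuple[float, float, float]:
    """Fixed-effect inverse variance pooled estimate with a 95% CI."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled


def interpret_kappa(kappa: float) -> str:
    """Map Kappa to the agreement categories listed in the text."""
    if kappa < 0:
        return "less than chance"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Two hypothetical studies: accuracy 70% and 80%, 100 patients each.
accs = [0.70, 0.80]
ses = [proportion_se(p, 100) for p in accs]
pooled, ci_low, ci_high = pool_inverse_variance(accs, ses)
```

Because each weight is the reciprocal of the study variance, larger and more precise studies pull the pooled estimate toward their own value; in the example the pooled accuracy lies closer to 80% than to 70%.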

I² and Cochran’s Q tests were performed to assess the heterogeneity between studies (for the Cochran Q test, heterogeneity was present if P < 0.05, while I² values of 25, 50, and 75% represented low, moderate, and high heterogeneity, respectively). As significant heterogeneity was identified, the EUS annual volume was investigated as a potential cause. We calculated annual volume by dividing the total number of cases by the number of reported years of study. Studies were grouped by annual volume and according to accuracy; 2 × 2 tables were constructed. For annual volume, we stratified centers into those that performed more than, and those that performed less than, 30 EUS procedures per year. For the pooled accuracy of T and N staging, we divided centers into those with EUS accuracy higher than 70% and those with accuracy lower than 70%. We also aimed to explore the transducer frequency as a source of heterogeneity by investigating the diagnostic accuracy in different stages according to the type of transducer frequency. We attempted to divide studies into those that used a higher-frequency transducer (>12 MHz) and those that used a lower-frequency transducer (≤12 MHz) to compare the EUS accuracy (EUS accuracy higher than 70% and lower than 70%) for all T staging. Six studies used a combination of both low- and high-frequency transducers (as shown in Table 1). Unfortunately, these studies did not clearly report when the low- or high-frequency transducers were used; as a result they were excluded. No studies exclusively used high-frequency transducers. Consequently, it was possible to identify only one group of studies (≤12 MHz). Therefore, it was not feasible to create comparison groups.
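The heterogeneity statistics and the annual volume calculation described above can be sketched as follows. This Python example is illustrative only and is not the analysis code used in this study. It assumes the standard definitions of Cochran's Q (weighted squared deviations from the fixed-effect pooled estimate) and of I² as (Q − df)/Q truncated at zero; the function names and study values are hypothetical.

```python
from typing import List, Tuple


def cochran_q(estimates: List[float], std_errors: List[float]) -> Tuple[float, int]:
    """Cochran's Q and its degrees of freedom (number of studies minus one)."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return q, len(estimates) - 1


def i_squared(q: float, df: int) -> float:
    """I² as a percentage: share of total variation attributed to heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def annual_volume(total_cases: int, years: float) -> float:
    """Total number of cases divided by the number of reported years of study."""
    return total_cases / years


# Three hypothetical studies with equal standard errors.
q, df = cochran_q([0.60, 0.75, 0.90], [0.05, 0.05, 0.05])
i2 = i_squared(q, df)

# A hypothetical center reporting 150 cases over 4 years falls in the
# group performing more than 30 EUS procedures per year.
volume = annual_volume(150, 4)
high_volume = volume > 30
```

The P value for Q comes from a chi-square distribution with df degrees of freedom; with SciPy available it could be obtained as `scipy.stats.chi2.sf(q, df)`, which is omitted here to keep the sketch dependency-free.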

Statistical analyses were performed using the R version 2.10.1 statistical package (The R Foundation for Statistical Computing, ISBN 3-900051-07-0, http://cran.r-project.org/).

Results

A total of 7117 titles were identified from the electronic and hand searches for preliminary review. After removal of duplicates and screening for relevant titles and abstracts, 122 articles were submitted for a full review. A total of 22 were included [7, 17–37] (Fig. 1); the characteristics are presented in Table 1. A total of 2445 patients were staged preoperatively by EUS; the majority were from studies from Asia (1892 patients), followed by Europe (337 patients), and North America (216 patients). The majority of participants presented with T3 disease (n = 873), followed by T2 disease (n = 734), T1 disease (n = 584), and T4 disease (n = 254).

Fig. 1 Article selection flow

T stage

The diagnostic accuracy of EUS for overall T staging varied between 56.9 and 87.7% and the pooled accuracy was 75% (95% CI: 71–80%) with a moderate pooled Kappa (0.52; 95% CI: 0.38–0.67). For T1, individual study accuracy ranged from 14 to 100% and the pooled accuracy was 77% (95% CI: 70–84%) (Fig. 2). T2 staging accuracy ranged from 24 to 90% and the pooled accuracy was 65% (95% CI: 57–73%) (Fig. 3). Accuracy ranged from 50 to 100% for T3 staging and the pooled accuracy was 85% (95% CI: 82–88%) (Fig. 4). EUS accuracy for T4 staging ranged from 25 to 100% and the pooled accuracy was 79% (95% CI: 68–90%) (Fig. 5). The 95% CIs for the pooled accuracies overlap in forest plots for all T stages, indicating that they are not statistically different from each other. The calculated I² value for all pooled accuracy estimates was 89.5% (95% CI: 85–92%). The Cochran Q test confirmed that the included studies were heterogeneous (P < 0.0001).

Fig. 2 EUS accuracy for T1 staging. N number of patients, Acc accuracy, SE standard error, CI confidence interval, CEUS conventional EUS, NCEUS new conventional EUS, UP ultrasound probe, 3D three-dimensional

Fig. 3 EUS accuracy for T2 staging

Fig. 4 EUS accuracy for T3 staging

Fig. 5 EUS accuracy for T4 staging

N stage

EUS diagnostic accuracy for N staging ranged from 30 to 90%; sensitivity ranged from 16.6 to 96.8%; and specificity from 57.1 to 100%. The pooled accuracy for N staging was 64% (95% CI: 43–84%); the pooled sensitivity was 74% (95% CI: 66–81%); and the pooled specificity was 80% (95% CI: 74–87%) (Figs. 6, 7). The calculated I² values for pooled sensitivity and specificity were 89.9% (95% CI: 85.8–92.9%) and 85.6% (95% CI: 78.8–90.2%), respectively. The Cochran Q test revealed that the studies included were heterogeneous (P < 0.0001).

Fig. 6 EUS sensitivity for N staging. Sens sensitivity

Fig. 7 EUS specificity for N staging. Spec specificity

Effect of annual volume

Subgroup analyses did not demonstrate an association between EUS annual volume and EUS performance in T staging (P = 0.836) or N staging (P = 0.99).

EUS examination

Combinations of different transducer frequencies were used in the majority of the studies. Fifteen studies [7, 17, 19, 20, 22–24, 26, 29–32, 34–36] used combinations of frequencies of ≤12 MHz and six studies [18, 25, 27, 28, 33, 37] used combinations of frequencies ranging from 5 to 20 MHz. It was not feasible to construct a 2 × 2 table to investigate transducer frequency as a source of heterogeneity. However, based on the data from 13 studies (≤12 MHz), it was possible to confirm that EUS staging accuracy varied widely in the studies using low-frequency transducers. The accuracy of T1 staging varied from 40 to 100%, T2 staging from 0 to 90%, T3 staging from 54 to 100%, and T4 staging from 0 to 100%.

Discussion

Accurate staging influences management decisions and predicts prognosis for gastric cancer patients. It is used to select patients for endoscopic or laparoscopic treatment, to identify those who may benefit from less invasive diagnostic procedures [38], and to identify those who may benefit from multimodal treatment [39, 40]. However, EUS is operator-dependent, adds incremental costs, and has a risk of complications, including mortality. In the present meta-analysis of 22 studies, the pooled accuracy of EUS for tumor invasion (T stage) was moderate; however, it tended to be higher for advanced disease than for early disease. EUS tended to perform slightly worse for nodal staging, with moderate accuracy, sensitivity, and specificity. There was significant variability across studies, resulting in statistical heterogeneity that was not explained by the annual volume of EUS procedures performed at an institution.

There are few other published systematic reviews and meta-analyses assessing EUS performance for staging gastric cancer. An early systematic review by Kelly et al. [5] evaluated 27 articles, published between 1981 and 1996, of which 13 evaluated gastric cancer, and found that EUS performed better for staging gastric carcinoma compared to carcinomas of the esophagus. As with the study by Kelly et al., we found that EUS performed better when staging tumor invasion than when staging nodal status. In addition, we found that EUS tended to be more accurate for the diagnosis of more advanced T stages (T3 and T4 disease). Our findings are consistent with a meta-analysis by Puli et al. [1], which evaluated 22 studies (1986–2006), and also described better EUS accuracy in higher T-stage disease.

In some of the other meta-analyses [6, 41], EUS staging performance for T and N stage was compared with that of other imaging modalities such as abdominal ultrasound (AUS), conventional MRI, and CT. While these reviews suggest that no modality consistently achieves both high sensitivity and high specificity in staging gastric cancer, our study did not compare EUS to other imaging modalities. Our group has performed a separate meta-analysis of radiologic imaging in the preoperative management of gastric cancer, finding an overall accuracy for T stage of 68, 72, and 83% and an overall accuracy for N stage of 68, 66, and 53% for AUS, CT, and MRI, respectively [42]. All meta-analyses of EUS in gastric cancer published to date have identified significant heterogeneity in the included studies. In our study and that reported by Kwee and Kwee [43] subgroup analyses were performed to try to identify the sources of heterogeneity. Kwee and Kwee [43] included 18 studies from 1988 to 2007 in their study and found that heterogeneity was eliminated if studies were restricted to those evaluating patients with early gastric cancer and those that used transducers with frequencies less than 15 MHz. Our review showed that EUS performance for T staging varied between studies using low-frequency transducers, but a comparison of accuracy for the high-frequency probes versus the low-frequency probes was not possible, as no studies exclusively used high-frequency probes. Kwee and Kwee [43] also examined the total number of patients in each study and the country of origin of the study, both of which factors might be reflective of operator experience, but they found that neither of these factors explained the heterogeneity. Similarly, we thought that operator experience, as measured by annual EUS volume, might explain the heterogeneity. However, we found no association between annual EUS volume and accuracy. Therefore, this factor cannot explain the heterogeneity.

There were a few limitations to our meta-analysis. The majority of included patients were staged preoperatively by EUS in Asia. Consequently, the reported results may not be generalizable to other, lower-volume regions. Also, the way in which individual studies reported their results affected their inclusion in the meta-analysis. For example, some studies reported results for T staging as T1/T2 and T3/T4, which precluded data extraction for T stage, although data on N staging (N0 vs. N+) could be extracted and analyzed. Lastly, our meta-analysis, like the others that have been previously published, demonstrated significant heterogeneity, with no clear explanation for this. As a result, caution must be used in interpreting the findings.

Conclusion

Our review found EUS to have only moderate agreement and accuracy for both T and N staging. EUS may be most useful for staging cancers with greater tumor involvement (T3 and T4). The significant heterogeneity of the included studies should be taken into consideration when interpreting our findings. The decision to use EUS, which has only moderate accuracy in the staging of gastric cancer, must be balanced against the predicted change in management, as other less invasive staging methods exist.