Introduction

Of all patients presenting to the Emergency Department (ED), approximately 10% have complaints of acute abdominal pain. Acute abdominal pain can be caused by a wide variety of conditions. Formerly these patients were thought to have an acute abdomen, and surgery was indicated. Nowadays, not all patients with acute abdominal pain undergo surgery, even if the pain is accompanied by abdominal tenderness and rigidity, while others without abdominal rigidity are operated on [1]. Diagnostic imaging is widely used in the work-up of patients with acute abdominal pain. Ultrasound and computed tomography (CT) are both frequently used in addition to clinical and laboratory evaluation. The American College of Radiology suggests an abdomen/pelvis CT with contrast medium in patients with acute abdominal pain [2]. Others are in favour of ultrasound as the primary imaging technique, mainly because ultrasound is easily accessible and does not expose patients to ionising radiation [3, 4]. Ionising radiation exposure at CT is associated with the risk of radiation-induced cancer. This is a drawback of CT, especially as CT is increasingly being used in the diagnostic work-up of young patients. This may prompt the evaluation of imaging strategies that are alternatives to CT, such as ultrasound and MRI [5]. However, diagnoses should not be missed or delayed and thus the most accurate imaging technique should be used.

A previous evaluation of diagnostic strategies for unselected patients with acute abdominal pain favoured a conditional CT strategy for the detection of urgent conditions, with ultrasound first and CT after a negative or inconclusive ultrasound [6]. For common diagnoses causing acute abdominal pain, such as appendicitis, the literature suggests CT in the diagnostic work-up of patients suspected of having the condition [7]. Primary use of CT in patients suspected of having diverticulitis is not supported by the literature, as the accuracy of ultrasound and CT was comparable in a recently published meta-analysis [8]. The fact that ultrasound is observer-dependent is thought to be a major disadvantage. Its accuracy, as reported in the literature, may be overestimated because in a research environment ultrasound is usually performed by highly experienced observers. Ultrasound accuracy could also be lower in specific patient subgroups, such as obese patients, women, and specific age groups, especially women of reproductive age. CT, on the other hand, has good inter-observer agreement in general, and even excellent inter-observer agreement for frequent diagnoses causing acute abdominal pain (e.g. appendicitis and diverticulitis) [9].

Ultrasound will only be an acceptable alternative to CT if its diagnostic accuracy is comparable, i.e. if it can be reliably used for the detection of frequent causes of abdominal pain in unselected patients presenting at the ED. In this paper we report a head-to-head comparison of the accuracy of ultrasound and CT in detecting common causes of acute abdominal pain, such as appendicitis and diverticulitis, in patients presenting at the ED with acute abdominal pain. We also evaluated to what extent the accuracy of ultrasound was affected by patient characteristics and observer experience.

Materials and methods

Patients

Details of the study protocol have been published elsewhere [6, 10]. We identified consecutive patients presenting with acute abdominal pain for more than 2 h and less than 5 days at the emergency department (ED) of two university and four (large) teaching hospitals. The following patients were not invited: patients discharged from the ED by the treating physician without any diagnostic imaging (ultrasound, CT or plain radiographs), patients under 18 years, pregnant women, patients with a blunt or penetrating trauma, patients with distinctive flank pain who were suspected of having renal colic, and patients in haemorrhagic shock caused by a gastrointestinal bleeding or acute abdominal aneurysm. Two of the teaching hospitals included patients from Monday to Friday between 9 am and 5 pm. In all other hospitals, patients were included 7 days a week from 8 am until 11 pm.

Eligible patients were invited to the study after being informed orally about the study by the treating physician. An information brochure was provided to them. Consenting patients were included in the study. This study had been approved by the Institutional Review Boards of participating hospitals before its initiation.

All included patients were clinically evaluated at the ED by the treating physician, usually a surgical or emergency medicine resident, after which the patients underwent a full diagnostic protocol. The treating physician prospectively recorded patients’ characteristics and the findings of clinical history and examination in a case record form.

Observers

After clinical assessment at the ED, all consenting patients underwent ultrasound and computed tomography (CT) within a few hours of presentation to the ED. Ultrasound and CT were independently evaluated by two different blinded observers. Between 5 pm and 11 pm, when often only one attending radiologist or radiological resident was present, both ultrasound and CT were evaluated by the same observer. The ultrasound examination was performed and evaluated by the observers: the attending radiologist or radiological resident, not by a sonographer. To guarantee a blinded evaluation for study purposes, ultrasound was performed first and documented in the case record form. CT was only evaluated after finalising the ultrasound part of the case record form.

The CT findings with immediate treatment consequences were communicated to the treating physician. In cases presenting after hours, CT examinations were re-evaluated by an abdominal radiologist the next morning and these findings were documented in the case record form. This radiologist was blinded to the ultrasound evaluation and had access to the same details on clinical findings as the person evaluating the ultrasound examination. This second reading was used for this comparative study, so all CT examinations were read or supervised by a radiologist, in contrast to ultrasound examinations, which were performed by radiological residents alone after hours. To evaluate the effects of experience, all observers were asked to record the number of abdominal ultrasounds they had performed (<100, 100–500, 500–1,000, 1,000–5,000, 5,000–10,000 or >10,000 examinations).

Ultrasound

To standardise the ultrasound examination, a general survey of the abdomen was performed and findings were recorded on a digital case record form. In this case record form, the following general image characteristics and specific radiological features were recorded: image quality, visualisation of the painful quadrant (quadrant of interest), infiltration of mesenteric fat (hyperechoic tissue), free fluid, abscess, free intra-peritoneal air and fistulas. Image characteristics were assessed per organ: gallbladder, bile duct, liver, pancreas, appendix, gastrointestinal tract, lymph nodes, vascular system, kidneys, and if appropriate, the female reproductive system. In the case of abnormalities, further specification of the observed abnormality was required. All observers recorded an ultrasound diagnosis. Observers assigned their diagnoses based on the imaging findings in combination with the clinical information provided by the treating physician. No specific set of criteria was provided per diagnosis, reflecting daily practice. Ultrasound examinations in which the quadrant of interest could not be visualised were considered to be of low quality.

Computed tomography

Different types of CT were used in the participating centres, varying from 4- to 16-slice or more CT (Table 1). All patients received intravenous contrast medium; no oral or rectal contrast agents were used. In 16 (1.6%) patients an unenhanced CT was performed because of known renal failure (n = 14) or a known previous reaction to contrast agents (n = 2).

Table 1 Imaging characteristics

The CT was evaluated in the same standardised way as the ultrasound examinations. Approximately the same general image findings and specific radiological features as at ultrasound were assessed for CT and recorded on a digital case record form: image quality, fat infiltration, free fluid, abscess, free intraperitoneal air and fistulas. Images were assessed per organ: gallbladder, bile duct, liver, pancreas, appendix, gastrointestinal tract, lymph nodes, vascular system, kidneys, and if appropriate, female genitalia. If no abnormalities were recorded, no specification was asked, but in the case of abnormalities further specification of the observed abnormality was required. A CT diagnosis was recorded. As with ultrasound, no specific set of criteria was provided per diagnosis to assist observers in assigning their diagnosis.

Reference standard

A final diagnosis was assigned after 6 months by an independent expert panel, consisting of two experienced gastrointestinal surgeons and an experienced abdominal radiologist (Appendix II) [6, 10]. Members of this panel individually evaluated all available data for each patient, including initial clinical, laboratory and imaging findings, as well as additional clinical, laboratory, imaging findings and if applicable, surgical and histopathological findings, and in and out-patient follow-up for at least 6 months. This information was provided to the expert panel in a standardised way. In case of disagreement, consensus was reached in a group discussion.

Analysis

The primary analysis was focused on a comparison of the accuracy of ultrasound and CT in detecting common diagnoses in patients with acute abdominal pain at the ED, using the final diagnosis as the reference standard. The sensitivity, specificity, positive and negative predictive values for ultrasound and CT were calculated. Differences in sensitivity and specificity between ultrasound and CT were evaluated with McNemar’s test statistic. Differences between ultrasound and CT with regard to predictive values were evaluated with the Chi-squared test statistic.
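The accuracy measures and the paired comparison described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study. The class `TwoByTwo`, the function `mcnemar` and all counts are hypothetical and chosen only for illustration, and the McNemar statistic is assumed here to be the chi-squared form with one degree of freedom and no continuity correction.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts of one imaging test against the final (reference) diagnosis."""
    tp: int  # test positive, diagnosis present
    fp: int  # test positive, diagnosis absent
    fn: int  # test negative, diagnosis present
    tn: int  # test negative, diagnosis absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def mcnemar(b: int, c: int) -> tuple[float, float]:
    """McNemar chi-squared statistic and p value for paired test results.

    b and c are the discordant pairs among patients with the diagnosis,
    e.g. b = detected by CT only, c = detected by ultrasound only.
    Assumption: 1 degree of freedom, no continuity correction.
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    # Survival function of the chi-squared distribution with 1 df.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical counts, not taken from the study data.
example = TwoByTwo(tp=80, fp=30, fn=20, tn=870)
print(f"sensitivity={example.sensitivity():.2f}, "
      f"specificity={example.specificity():.2f}, "
      f"PPV={example.ppv():.2f}, NPV={example.npv():.2f}")

stat, p = mcnemar(b=20, c=5)
print(f"McNemar chi-squared={stat:.1f}, p={p:.4f}")
```

Only the discordant pairs enter the McNemar statistic, which is why it suits a design in which every patient underwent both ultrasound and CT.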

The percentage of diagnoses missed at ultrasound in patients in whom image quality was sufficient (patients in whom the quadrant of interest was visualised) was compared with the percentage of missed cases with insufficient image quality. The Chi-squared test statistic for unpaired data was used to test differences for statistical significance. The percentage of diagnoses missed was calculated as the number of false-negatives relative to the number of patients with the corresponding diagnosis as the final diagnosis (1-sensitivity).
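As an illustration of this unpaired comparison, the sketch below computes the percentage of missed diagnoses in two groups and a Pearson chi-squared test on the resulting 2 × 2 table. The function names and all counts are hypothetical. The test is assumed to be the uncorrected Pearson form with one degree of freedom, which may differ from the variant reported by SPSS.

```python
import math


def missed_fraction(false_negatives: int, with_diagnosis: int) -> float:
    """Fraction of missed diagnoses, equal to 1 - sensitivity."""
    return false_negatives / with_diagnosis


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test for the unpaired table [[a, b], [c, d]].

    Assumption: 1 degree of freedom, no continuity correction.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-squared distribution with 1 df.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical counts among patients with a given final diagnosis:
# rows = image quality (sufficient, insufficient), columns = (missed, detected).
missed_ok, detected_ok = 20, 80
missed_poor, detected_poor = 20, 20

frac_ok = missed_fraction(missed_ok, missed_ok + detected_ok)
frac_poor = missed_fraction(missed_poor, missed_poor + detected_poor)
stat, p = chi_squared_2x2(missed_ok, detected_ok, missed_poor, detected_poor)
print(f"missed: {frac_ok:.0%} vs {frac_poor:.0%}, "
      f"chi-squared={stat:.1f}, p={p:.4f}")
```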

As patient characteristics could influence the accuracy of ultrasound, potential differences in sensitivity between patient groups were evaluated. Patient subgroups were defined by sex, age, body mass index and duration of symptoms. In addition, sensitivity and predictive values of ultrasound in attending radiologists including supervised residents were compared with those of unsupervised residents. Unsupervised residents who had performed and evaluated less than 500 ultrasound examinations were compared with unsupervised residents who had performed and evaluated more than 500 ultrasound examinations. Subgroup differences were evaluated with Chi-squared test statistics.

For all comparisons, p values less than 0.05 were taken to indicate statistically significant differences. All analyses were performed in SPSS 15.0.1 (SPSS Inc., Chicago, IL, USA).

Results

Patients

Between March 2005 and November 2006, 1,101 patients were included. Case record forms were incomplete for 80 patients (7.3%); these were excluded from the analysis. The remaining 1,021 patients had a mean age of 47 years (range 19–94); 484 (47%) were younger than 45 years, 258 (25%) were older than 65 years, 565 (55%) were female, 157 (15.4%) had a body mass index over 30, 320 (31%) had prolonged ‘acute’ abdominal pain (for more than 2 days but still less than 5 days), and 705 (69%) had a body temperature exceeding 38°C.

Consensus on the final diagnosis was reached after individual evaluation in 76% of the patients; in 24% (244) the expert panel needed a group discussion to reach consensus. A list of the final diagnoses in the study group is provided in Appendix III. The most frequent final diagnoses were acute appendicitis, acute diverticulitis, bowel obstruction and acute cholecystitis. Urgent gynaecological disorders (n = 27) consisted of pelvic inflammatory disease (13), ovarian torsion (9), and ruptured or bleeding ovarian cyst (5).

Sensitivity

The sensitivity in detecting acute appendicitis and acute diverticulitis differed significantly between ultrasound and CT (both p < 0.01): ultrasound sensitivity in detecting acute appendicitis was 76% versus 94% for CT. Ultrasound sensitivity for acute diverticulitis was 61% versus 81% for CT (Table 2). For urgent gynaecological disorders the sensitivity was also significantly higher for CT than for ultrasound: 67% versus 37% (p = 0.04). Likewise, the sensitivity in detecting inflammatory bowel disorders was higher for CT than for ultrasound (p = 0.05). For acute cholecystitis and bowel obstruction, sensitivity did not differ significantly between ultrasound and CT (p = 1.00 and 0.57, respectively; Table 2).

Table 2 Sensitivity, specificity, positive and negative predictive values for US and CT in patients with acute abdominal pain at the emergency department

Predictive values

Positive predictive values did not differ significantly in detecting acute appendicitis and acute diverticulitis between ultrasound and CT (Table 2). Positive predictive values for a final diagnosis of inflammatory bowel disorder were significantly higher with CT (p = 0.02). The negative predictive values for acute appendicitis and acute diverticulitis were significantly higher for CT (both p < 0.01).

Insufficient ultrasound image quality

Significantly fewer cases of acute appendicitis and of acute diverticulitis were missed in patients in whom the radiologist stated that image quality was sufficient compared with cases in which image quality was insufficient (Table 3). For all other diagnoses, the percentage of diagnoses missed with ultrasound was not significantly lower in patients with sufficient image quality compared with those with insufficient image quality (Table 3).

Table 3 Sensitivity of ultrasound with sufficient image quality versus insufficient image quality

Patient characteristics and missed diagnoses

The percentage of acute appendicitis and acute diverticulitis cases missed by ultrasound did not differ significantly in patient subgroups defined by sex, body mass index, duration of pain, or age (Table 4).

Table 4 Missed diagnoses of appendicitis and diverticulitis at ultrasound

Observers

In the six participating hospitals, ultrasound was evaluated by 107 different observers and CT was evaluated by 88 different observers, ranging from first-year radiology residents to a radiologist with more than 30 years of experience. Residents evaluated 582 (57%) of the ultrasound examinations, of which 282 were read after hours (28%), the latter not being supervised by radiologists. Of these non-supervised ultrasound examinations, 187 were performed by residents who had evaluated and performed more than 500 abdominal ultrasound examinations, and 95 were performed by residents who had evaluated and performed less than 500 abdominal ultrasound examinations. Radiologists evaluated 439 (43%) of the ultrasound examinations. CT were evaluated by supervised residents in 299 patients (29%); in 722 patients (71%) CT were evaluated by radiologists.

The sensitivity of ultrasound for acute appendicitis and acute cholecystitis was somewhat, but not significantly, lower for unsupervised residents compared with attending radiologists including supervised residents: 73% versus 78% (p = 0.33) and 60% versus 62% (p = 0.43), respectively (Fig. 1).

Fig. 1 Comparison of sensitivity and positive predictive value (PPV) for subgroups of observers

Ultrasound sensitivity in detecting acute appendicitis and acute diverticulitis

There were no significant differences between unsupervised residents who had evaluated (and performed) more than 500 ultrasound examinations and those who had evaluated less than 500 ultrasound examinations for these two diagnoses (Table 5). For the diagnosis of diverticulitis with ultrasound, unsupervised residents had a higher sensitivity than attending radiologists, including supervised residents: 83% versus 57% (p = 0.04). Here, the sensitivity was significantly higher for more experienced unsupervised residents (Table 5).

Table 5 Comparison of ultrasound accuracy per diagnosis for observers with different ultrasound experience

Positive predictive values for common diagnoses such as acute appendicitis, acute diverticulitis and acute cholecystitis were comparable for non-supervised residents and attending radiologists, including supervised residents (Fig. 1).

Discussion

In this study we found that the sensitivity of CT was significantly higher than that of ultrasound in detecting appendicitis and diverticulitis. Fewer cases of acute appendicitis and acute diverticulitis were missed by CT, but positive predictive values of ultrasound and CT were comparable. For acute cholecystitis and bowel obstruction there were no significant differences in accuracy between ultrasound and CT. No subgroup differences in ultrasound sensitivity in detecting acute appendicitis and acute diverticulitis were found for any of the evaluated patient characteristics: BMI, age and duration of pain. There were no statistically significant differences between obese women and men. The sensitivity of ultrasound performed by non-supervised radiological residents was not significantly lower than that of ultrasound performed by attending radiologists, including supervised residents. The percentage of missed acute appendicitis and acute diverticulitis cases was lower if the observer was able to visualise the region of interest compared with the percentage of missed cases of acute appendicitis or diverticulitis with insufficient image quality. For all other diagnoses, such a reduction in the number of missed diagnoses was not found.

A number of potential limitations of this analysis should be acknowledged. One could object that the sensitivity of US was underestimated, because ultrasound was partly performed and interpreted by unsupervised radiological residents. Unsupervised residents did not have a significantly lower sensitivity in detecting disease in this study compared with attending radiologists. In a previous study, the overall sensitivity of ultrasound performed by unsupervised residents for detecting urgent diagnoses was significantly lower than that of ultrasound performed by attending radiologists, without a significant difference in positive predictive value [6], indicating that residents more often missed an urgent diagnosis. Whenever an urgent diagnosis was assigned, however, this was most likely correct. In a study by Hertzberg et al. training in ultrasound was evaluated and a significant improvement was found at between 50 and 200 cases [11]. In the present study 23% of the observers had performed fewer than 500 abdominal ultrasound examinations, but only 4% had performed fewer than 100 ultrasound examinations.

Comparisons of CT accuracy between residents and radiologists or between CT reading after hours and during daytime were not considered meaningful, because residents were always supervised by a radiologist during daytime. The diagnosis recorded on the case record form by the supervised resident, for both CT and ultrasound, can be considered as a consensus diagnosis. CT scans of patients evaluated after hours were always re-evaluated the next day by a radiologist. For radiologists inter-observer agreement for abdominal CT is known to be good [9].

This study was aimed at evaluating ultrasound and CT in daily practice in six institutions. A considerable number of observers contributed, with a wide variety of experience. Although one could object that this may have negatively influenced accuracy, our study probably reflects daily practice better than studies where all patients were evaluated by one or two very experienced observers. It is a well known phenomenon that the diagnostic accuracy reported in the literature can be higher than that in an average hospital, not only because tests in research settings are often evaluated by experienced observers, but also because standardised record forms are used in studies to minimise the number of indeterminate findings [12].

In this study, no specific set of criteria was provided to the observers from which a diagnosis was supposed to be made. Instead, the observers assigned their ultrasound or CT diagnoses based on imaging findings in combination with the clinical information provided by the treating physician. This way of evaluating imaging examinations reflects daily practice.

We relied on an expert panel to assign the final diagnoses. This clinical reference standard may imply a form of incorporation bias, as the experts had access to all available information, including imaging findings. In this study population, with a wide variety of possible diagnoses, it is impossible to use a single reference standard, and the use of a panel is an appropriate alternative in a setting with multiple possible underlying diseases [13]. Our experts had access to extensive clinical information, including follow-up. A final diagnosis of acute appendicitis was based on histopathology in 95% of the cases, while the remaining 5% had undergone conservative therapy or percutaneous drainage of peri-appendiceal abscess.

In discordance with previous studies [6, 14], we did not find a significantly lower accuracy for residents compared with radiologists. One of the previous studies also demonstrated a significantly lower sensitivity of ultrasound in female patients compared with males with suspected appendicitis [14]. In our study, we did not see such a difference in sensitivity. Nor did we detect a significant difference between obese and non-obese patients in acute appendicitis cases and acute diverticulitis cases missed with ultrasound, although the number was markedly higher in obese women. It is a known limitation of ultrasound that it has difficulty in penetrating fat. Because ultrasound is a real-time examination not all obese patients are a priori unsuitable for ultrasound examination. In patients with a large proportion of extra-mesenteric fat ultrasound images can more often be interpreted adequately.

All patients underwent the same CT protocol for better evaluation of the accuracy of CT in patients with acute abdominal pain. If CT protocols had been tailored to the clinically suspected diagnosis [6], bias would have been introduced and a valid comparison of CT and ultrasound would not have been possible. Recent research has shown that usage of oral contrast agent does not increase the accuracy of diagnosing appendicitis with CT [15, 16]. For the evaluation of acute diverticulitis a wide variety of CT protocols is described in the literature, ranging from solely intravenous contrast to a combination of oral, rectal and intravenous contrast. The CT protocol solely using iv contrast agent did not achieve lower accuracy values compared with studies with extended contrast agent usage [8].

We observed a low prevalence in our study group of a number of important disorders, such as perforated viscus or bowel ischaemia and other common diagnoses causing acute abdominal pain such as pancreatitis and urinary tract calculus (patients with distinctive flank pain, suspected with renal colic, were not eligible for this study). This low prevalence limited any comparison of CT or ultrasound accuracy for the full range of diagnoses in patients presenting with acute abdominal pain.

The study reported here was not designed to separately evaluate the sensitivity and specificity of specific complications of any of the diagnoses causing acute abdominal pain. We only aimed to study the accuracy of ultrasound and CT in assigning the correct diagnosis.

A meta-analysis did not show any significant difference in accuracy between ultrasound and CT in detecting diverticulitis, although CT is more likely to detect complications of acute diverticulitis [8]. We did not find a significant difference in the accuracy of detecting bowel obstruction between ultrasound and CT; however, the aetiology of the obstruction is better evaluated with CT than with ultrasound. Likewise, a better accuracy for CT has been described in detecting complicated bowel obstruction [17–21], although the accuracy of CT in the detection of bowel ischaemia is at best mediocre [22].

Some of the accuracy estimates for ultrasound in this study are lower than those reported elsewhere in the literature. The reported sensitivities for ultrasound in experienced hands in detecting appendicitis have been as high as 90% [23]. In recent meta-analyses of diagnostic imaging in acute appendicitis, ultrasound sensitivity varied between 86% [24] and 78% [7], which is comparable to the estimates in the present study. The accuracy in detecting acute diverticulitis is lower than in the aforementioned recent meta-analysis. Summary sensitivity of 92% for ultrasound was reported, which is much higher than the sensitivity of 68% [8]. The most likely explanation for this difference might be that we included unselected patients with acute abdominal pain, whereas the studies included in the meta-analysis more often had recruited selected patients with a clinically suspected acute diverticulitis. A higher pre-test likelihood of disease is known to result in a higher accuracy [25].

We observed a significantly higher sensitivity of CT compared with ultrasound with regard to urgent gynaecological disorders. This result may be counterintuitive to some, as ultrasound is the imaging technique of choice in these patients [26]. Our findings may be explained by the fact that we used abdominal ultrasound performed by radiologists, not transvaginal ultrasound performed by the gynaecologist. Gynaecologists can be expected to be more experienced in the evaluation of gynaecological disorders; they can probably achieve a higher sensitivity with transvaginal ultrasound than radiologists can with transabdominal ultrasound. Unfortunately, patients directly referred to gynaecologists are not routed through the emergency department and were therefore not included in this study.

In summary, we observed that CT sensitivity is higher than that of ultrasound in detecting appendicitis and diverticulitis in unselected patients presenting with acute abdominal pain, but positive predictive values are comparable. Accuracy in detecting bowel obstruction and acute cholecystitis was not significantly different between the two techniques. With regard to common diagnoses, the percentage of cases missed on ultrasound was largely not influenced by patient characteristics or observer experience. The proportion of missed acute appendicitis and acute diverticulitis was significantly lower in the subgroup of patients in whom the radiologist could adequately visualise the region of interest. These results indicate that ultrasound is a good first-line technique.