Introduction

Amblyopia, commonly known as “lazy eye,” is a neurodevelopmental disorder of binocular vision. Because of a mismatch between the perceptual information that reaches the cortex from the two eyes, the input from one eye becomes suppressed. The synaptic connections in the visual cortex are reorganized in such a way that the best-corrected visual acuity of the suppressed eye is reduced, even though no visible abnormalities are present [1, 2]. The significance of amblyopia in the population should not be underestimated: its worldwide prevalence varies between 1 and 5%, depending on many factors [3,4,5,6,7,8]. Unrecognized amblyopia can lead to various functional deficits of the visual system [9,10,11,12], and its psychosocial impact cannot be neglected either [13,14,15,16,17]. The prevalence of amblyopia is about three times greater in an unscreened population than in a screened one [18]. From this perspective, the need for a reliable screening method is self-evident.

There are strict recommendations on how states should provide vision screening to all children before the age of 6 [19,20,21]. This is in accord with the guidelines of the American Academy of Pediatrics, which recommend visual assessment at least yearly after the age of 3, and even more often in infancy and early childhood [22,23,24,25,26]. According to the professional guidelines of the Ministry of Health in Hungary, vision screening of children is the competence and duty of pediatricians, school doctors, district nurses, and health visitors, depending on the age of the child [27]. Although vision screening of children is obligatory, unfortunately it is not performed regularly in practice [28].

Because amblyopia originates from abnormal early binocular visual experience caused by amblyogenic conditions, it is accompanied by the loss or severe impairment of binocular depth perception [3, 29,30,31]. Considering the vulnerability of the binocular system, it has long been suggested that stereotests be used to screen for amblyopia or for conditions potentially leading to amblyopia [32,33,34,35]. Most stereotests require dichoptic viewing, which can be achieved with one of several channel separation techniques (e.g., polarized or anaglyph images, column-interleaved displays). Ideally, when random dot stereograms are used, the disparity-coded stereoscopic images are visible only when viewed through the appropriate glasses (i.e., red-green or polarizing glasses). The TNO stereotest uses the anaglyph technique [36, 37], while the Randot, the Randot Preschool Stereoacuity, and the Stereofly test (also known as the Titmus Fly) use polarized images [38, 39]. All of these tests are also suitable for measuring stereoacuity, as is the Frisby, a real-depth stereotest that does not require special glasses [40]. The Lang and Lang II stereotests use the principle of “panography,” essentially a column-interleaved technique, and do not require glasses either [41, 42].

Most of the conventional stereopsis tests on the market contain monocular cues for various reasons, which can de-camouflage the cyclopean target [43, 44]. Additionally, all of them have a predetermined set of stimuli displayed on a plastic or paper board. Because the number of figures is limited, the test can be circumvented: motivated children are likely to memorize the expected responses. This effect is even more prominent in school or kindergarten screening, where children have an opportunity to communicate with their classmates. Both the monocular cues and the predetermined set of figures can affect the rates of both false-positive and false-negative results [45]. Despite its low sensitivity [46], we used the Lang II stereotest as the reference in our study because it is widespread and available in most clinical settings and family doctor practices in Hungary.

The disadvantages of the currently available stereo vision tests create the need for more versatile tools based on new mobile technology (e.g., tablets, smartphones) that are free from these shortcomings and that ease documentation through cloud communication [44].

Our laboratory has also begun to develop and evaluate a screening system called EuvisionTab, a tablet-based Android application that appears to be free of the above-mentioned disadvantages but has not yet been tested in clinical practice.

The aim of the present study was to investigate the diagnostic value of EuvisionTab and to compare it with the Lang II in terms of sensitivity and specificity in detecting amblyopia and other amblyogenic factors [47]. Besides evaluating the performance of EuvisionTab, we also aimed to determine an optimal threshold for the pass/fail criterion. This clinical study is the first report of results from the development of a commercially available and practical test.

Methods

Participants and recruitment

A total of 141 children (aged 4–14 years; mean age, 8.9 years; SD, 2.63) were enrolled in the study for the validation of EuvisionTab. Patients were selected at the pediatric ophthalmology outpatient clinic of the Department of Ophthalmology, University of Pécs Medical School, Hungary. Inclusion criteria were a diagnosis of amblyopia, anisometropia, convergent strabismus, or hyperopia; these were the target conditions. Healthy children without eye pathologies and with presumably intact stereo vision were also recruited as age-matched controls (n = 75). Convenience sampling was used to select participants. Before the statistical analysis, 19 of the 141 subjects had to be excluded because of other pathologies, such as Down’s syndrome, Marfan syndrome, nystagmus, congenital cataract (including bilateral cases), retinitis pigmentosa, or retinopathy of prematurity. We chose not to include their data in the final analysis because the primary aim of this study was to validate EuvisionTab as a screening test, and these patients were already under treatment or had been diagnosed beforehand on the grounds of their existing illnesses.

Amblyopia was diagnosed on the basis of reduced visual acuity (0.8 or worse in the amblyopic eye) despite optimal refractive correction in an otherwise healthy eye [4]. Anisometropia was defined as an interocular difference in refractive error of 1 diopter or more [48]. The other target conditions, strabismus [49] and hyperopia [50], were diagnosed according to the guidelines established in the cited international literature.
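As an illustration, the acuity and anisometropia criteria above can be written as simple rules. This is a minimal sketch, not the study's diagnostic procedure: it assumes decimal visual acuity values and uses the spherical equivalent (sphere plus half the cylinder) as the measure of refractive error for anisometropia, which the text does not specify. The `EyeExam` record and the function names are hypothetical, and only the acuity part of the amblyopia definition is checked; excluding other ocular pathology still requires the clinical examination.

```python
from dataclasses import dataclass


@dataclass
class EyeExam:
    """Hypothetical per-eye record (field names are illustrative)."""
    decimal_va: float      # best-corrected decimal visual acuity (1.0 = normal)
    sphere: float          # spherical refractive error, diopters
    cylinder: float = 0.0  # cylindrical refractive error, diopters


def spherical_equivalent(eye: EyeExam) -> float:
    # Assumption: refractive error summarized as sphere + cylinder / 2
    return eye.sphere + eye.cylinder / 2.0


def is_anisometropic(right: EyeExam, left: EyeExam, threshold_d: float = 1.0) -> bool:
    # Interocular difference of 1 diopter or more counts as anisometropia
    return abs(spherical_equivalent(right) - spherical_equivalent(left)) >= threshold_d


def eyes_meeting_acuity_criterion(right: EyeExam, left: EyeExam, cutoff: float = 0.8) -> list:
    # Acuity part of the amblyopia definition: decimal VA of 0.8 or worse
    # despite optimal correction (other pathology must be excluded clinically)
    return [name for name, eye in (("right", right), ("left", left))
            if eye.decimal_va <= cutoff]


right_eye = EyeExam(decimal_va=1.0, sphere=0.5)
left_eye = EyeExam(decimal_va=0.5, sphere=2.5, cylinder=-1.0)
print(is_anisometropic(right_eye, left_eye))               # -> True (SE 0.5 vs 2.0 D)
print(eyes_meeting_acuity_criterion(right_eye, left_eye))  # -> ['left']
```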

All procedures performed in this study involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This study was approved by the Regional and Local Ethics Committee at the University of Pécs (registration number: 5117). After the course of the study and its potential risks and benefits had been explained to the parents and children, written informed consent was obtained from all parents or legal guardians, and oral informed consent was obtained from all children included in the study.

Stimuli and devices

EuvisionTab (Euvision Ltd., Pécs, Hungary) is a screening system developed at the University of Pécs as a new medical diagnostic tool. Part of the system is a stereo vision module, based on the random dot stereogram (RDS) technique, for screening of amblyopia. The stereo vision module is essentially an RDS image generator with great flexibility, because the parameters of the RDSs are freely adjustable. In the present study, we tested a dynamic version of the RDS with an embedded, disparity-defined Snellen E optotype (DRDSE) presented in random orientations (up, down, left, or right). The orientation of the “E” was visible only if the images were viewed binocularly through red-green anaglyph glasses and the observer had intact stereopsis. The red-green glasses contained R26 low-pass (red) and YG09 band-pass (green) gelatin filters (Tobias Optic, Ltd., Budapest, Hungary); more details about the filter characteristics can be found in Markó et al. [51]. The actual screening set comprised a series of nine disparity-defined (stereoscopically visible) Snellen Es and two monocularly visible Snellen Es. The latter two served as controls, to check whether the subjects had understood the task and were able to identify the orientation of the Snellen E regardless of their ocular condition.
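The principle of a disparity-defined optotype can be illustrated with a small generator. This is a minimal sketch under stated assumptions, not the EuvisionTab implementation: images are boolean grids in which one cell equals one dot size, the E region is shifted horizontally in the left-eye image to create crossed disparity, the strip uncovered by the shift is refilled with fresh dots so that no monocular outline appears, and "uncorrelated noise" is modeled as a fraction of cells redrawn independently in each eye. The function names `e_mask` and `rds_pair` and this noise model are our assumptions.

```python
import random

ORIENTATIONS = ("up", "down", "left", "right")


def e_mask(size, legs):
    """Boolean mask of a Snellen E on a size x size grid; `legs` is where the legs point.

    size must be a multiple of 5 (the E is five strokes tall and wide).
    """
    s = size // 5  # stroke width in cells

    def inside(x, y):
        if legs in ("right", "left"):
            spine = x < s if legs == "right" else x >= size - s
            arm = (y // s) % 2 == 0          # stroke rows 0, 2, 4 are the arms
        elif legs in ("up", "down"):
            spine = y >= size - s if legs == "up" else y < s
            arm = (x // s) % 2 == 0          # stroke columns 0, 2, 4 are the arms
        else:
            raise ValueError(legs)
        return spine or arm

    return [[inside(x, y) for x in range(size)] for y in range(size)]


def rds_pair(width, height, mask, x0, y0, disparity, density=0.01, noise=0.03, rng=None):
    """One left/right RDS frame; True cells are dots, one cell = one dot size."""
    rng = rng or random.Random()
    m = len(mask)

    def in_figure(x, y):
        return 0 <= x - x0 < m and 0 <= y - y0 < m and mask[y - y0][x - x0]

    # Right-eye image: sparse random dots with the given density
    right = [[rng.random() < density for _ in range(width)] for _ in range(height)]
    left = [row[:] for row in right]
    for y in range(height):
        for x in range(width):
            if in_figure(x - disparity, y):
                # Figure dots shifted right by `disparity` cells in the left eye (crossed)
                left[y][x] = right[y][x - disparity]
            elif in_figure(x, y):
                # Strip uncovered by the shift: refill with fresh, uncorrelated dots
                left[y][x] = rng.random() < density
    # Assumed noise model: a fraction of cells is redrawn independently in each eye
    for image in (left, right):
        for row in image:
            for x in range(width):
                if rng.random() < noise:
                    row[x] = rng.random() < density
    return left, right


rng = random.Random(0)
mask = e_mask(40, rng.choice(ORIENTATIONS))
left, right = rds_pair(120, 80, mask, 40, 20, disparity=1, rng=rng)
```

A dynamic RDS, as used for the DRDSE, would draw a new frame with fresh dots on every refresh (10 Hz in the study), so that only the disparity-defined E stays constant over time while individual dots carry no lasting monocular information.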

We presented the stimuli on a Samsung Galaxy Tab 10.1 (CE0168, Android 3.2 operating system). Luminance was measured with a photometer (ILT-1700 Photometer, International Light Technologies, Peabody, USA). The mean luminance of the bright and dark pixels viewed through the color filters was 22.937 cd/m2 (± 6.21 SD) and 0.72 cd/m2 (± 0.54 SD), respectively. Both the disparity and the dot size of the images were 420 sec of arc, and the line thickness of the Snellen E optotype subtended 2° of visual angle at the 25- to 30-cm viewing distance. A disparity of 420 sec of arc was chosen because the human stereoscopic system is tuned to a certain range of crossed disparities, and this disparity is the lowest within the optimum range [52]. The dot density of the images was 1%, with 3% uncorrelated noise. Images were updated at 10 Hz. Before the presentation of the 9 + 2 DRDSE images, eight other, easier practice RDS images were shown so that the children could become familiar with the task and the stimuli. These practice images were composed either of static (non-refreshing) dots or of dots with larger disparity and higher dot density. Responses to these practice images were not analyzed.
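To make the geometry concrete, the angular sizes above can be converted into physical sizes on the screen. The sketch assumes a display of 1280 × 800 pixels on a 10.1-inch diagonal for the Galaxy Tab 10.1 (a specification not stated in the text) and square pixels; `angle_to_mm` and `pixel_pitch_mm` are illustrative helpers.

```python
import math

ARCSEC_PER_RADIAN = 180 * 3600 / math.pi


def angle_to_mm(arcsec, distance_mm):
    """Linear size on the screen that subtends `arcsec` at `distance_mm`."""
    return 2 * distance_mm * math.tan(arcsec / ARCSEC_PER_RADIAN / 2)


def pixel_pitch_mm(h_px, v_px, diagonal_in):
    """Pixel pitch from resolution and diagonal, assuming square pixels."""
    ppi = math.hypot(h_px, v_px) / diagonal_in
    return 25.4 / ppi


pitch = pixel_pitch_mm(1280, 800, 10.1)      # assumed panel specification
for d in (250, 300):                          # 25- and 30-cm viewing distances
    dot_mm = angle_to_mm(420, d)              # dot size = disparity = 420 sec of arc
    stroke_mm = angle_to_mm(2 * 3600, d)      # 2 degree line thickness of the E
    print(f"{d} mm: dot/disparity {dot_mm:.2f} mm ({dot_mm / pitch:.1f} px), "
          f"E stroke {stroke_mm:.1f} mm")
```

Under these assumptions, 420 sec of arc corresponds to about 0.51 mm (3.0 px) at 25 cm and 0.61 mm (3.6 px) at 30 cm; because angular disparity scales inversely with distance, a fixed on-screen shift yields about one-sixth less disparity at 30 cm than at 25 cm.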

Lang II stereotest (Lang Stereotest AG, Forch, Switzerland) was performed according to the user’s manual provided with the test.

Procedures

Clinical examination

Children who visited the pediatric ophthalmology outpatient clinic with their guardian underwent a comprehensive eye examination by a trained pediatric ophthalmologist. The examination started with the assessment of visual acuity (VA) of both eyes using an illuminated Snellen E chart at a distance of 5 m. Children wore their own spectacles where applicable. Cover and corneal light reflex tests were performed using an ophthalmoscope to detect potential misalignment of the eyes. The examination lasted approximately 10 to 12 min per child. The collected data and diagnoses were provided by the pediatric ophthalmologist and served as the reference.

DRDSE testing procedure

After the examination by the pediatric ophthalmologist, oral consent was obtained from the children and written consent from the parents or legal guardians. A practice session, in which the children could familiarize themselves with the task, preceded the actual testing. Subjects were then seated in a quiet, separate, darkened room to avoid reflections on the screen and to improve the visibility of the stereograms. Children were asked to determine the orientation of each Snellen E optotype, either by stating the direction or by indicating it with their hand. We explained the task to the participants and double-checked their understanding by presenting non-stereo E letters. The children were given red-green spectacles, which were available in several sizes. The eight practice figures were presented first, followed by the actual test. Although a short black period signaled the arrival of each new figure, children were also prompted whenever a new orientation had to be determined; the question “Now, which way do the legs point?” was asked for every letter. No hints were given to the child. The examiner then entered the child’s response by touching the appropriate button on the screen. Because the protocol was based on a forced-choice paradigm, children had to guess even if they did not see the letter. No feedback was provided, and response time was not limited. The test terminated automatically after the 9 + 2 orientations had been completed. The first monocularly visible image appeared fifth in the sequence and the second appeared eleventh (last); the orientations, however, were completely random. The entire procedure lasted about 5 to 7 min, including information for the parent and patient, practice with non-stereo E letters, and the examination itself.
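The trial structure described above can be summarized in a short scoring sketch. It is not EuvisionTab code: the function names, the data layout, and the rule that a missed monocular control makes the run invalid are our assumptions; the pass level of 5 correct stereo responses out of 9 is the threshold selected in the Results.

```python
import random

ORIENTATIONS = ("up", "down", "left", "right")
CONTROL_POSITIONS = (5, 11)  # 1-based positions of the monocularly visible Es


def make_sequence(rng):
    """11 trials: 9 disparity-defined Es plus 2 monocular controls, random orientations."""
    return [
        {"kind": "monocular" if pos in CONTROL_POSITIONS else "stereo",
         "orientation": rng.choice(ORIENTATIONS)}
        for pos in range(1, 12)
    ]


def score_run(sequence, responses, pass_level=5):
    """Score one forced-choice run; `responses` holds the orientation entered per trial."""
    stereo_correct = sum(
        trial["orientation"] == resp
        for trial, resp in zip(sequence, responses) if trial["kind"] == "stereo")
    controls_ok = all(
        trial["orientation"] == resp
        for trial, resp in zip(sequence, responses) if trial["kind"] == "monocular")
    if not controls_ok:
        return {"stereo_correct": stereo_correct, "result": "invalid"}  # assumed rule
    result = "pass" if stereo_correct >= pass_level else "fail"
    return {"stereo_correct": stereo_correct, "result": result}


seq = make_sequence(random.Random(42))
perfect = [t["orientation"] for t in seq]
print(score_run(seq, perfect))  # -> {'stereo_correct': 9, 'result': 'pass'}
```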

In this study, the test was administered by members of the research team, but it could easily be learned and administered by lay persons as well.

Lang II testing procedure

The Lang II stereotest was presented either before or after the DRDSE, in random order, so that fatigue or loss of interest would not systematically cause failure of the same test.

The plastic card was held upright in front of the subject at 40 cm, and the child was asked to find “the hidden figures” and name them. Because the Lang II contains a monocularly visible star, which is recognizable even without stereopsis, it served as a control to test whether the child had understood the task, regardless of binocular pathology. If the child was unable to name the objects, we asked them to locate an area on the card where there seemed to be “something interesting,” to outline its contours, and then to guess what it could be.

Statistical analysis

Receiver operating characteristic curve

Because the same disparity was used in all test figures, we did not search for a stereoacuity threshold. However, we had to determine how many correct responses were necessary to pass the test. To determine the optimal pass/fail criterion for the DRDSE and to evaluate its sensitivity and specificity for detecting the target conditions, receiver operating characteristic (ROC) curve analysis was carried out [53]. Thereafter, to determine whether the detections provided by each stereotest for each ophthalmological condition were consistent, we applied Pearson’s chi-square test and Fisher’s exact test. The threshold for rejecting the null hypothesis was set at p = 0.001. Finally, exact values of sensitivity and specificity were calculated according to the standard formulas [54]. Statistical analysis was performed with SPSS Statistics version 20 (IBM, Armonk, NY, USA).
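Because the DRDSE yields a discrete score (0 to 9 correct stereo responses), the ROC analysis amounts to sweeping the pass level over all possible scores, with sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP) at each level. The sketch below shows this computation on hypothetical data; the study itself used SPSS. Choosing the cut-off by maximizing Youden's index (sensitivity + specificity − 1) is our assumption, since the text speaks only of the best sensitivity and specificity ratio. The AUC is computed through its equivalence with the normalized Mann–Whitney statistic, with ties counted as one half.

```python
def roc_points(scores, affected, max_score=9):
    """(pass_level, sensitivity, specificity) for every pass level 0..max_score + 1.

    A child passes if score >= pass_level; failing counts as a positive screen.
    """
    points = []
    for level in range(max_score + 2):
        tp = sum(1 for s, a in zip(scores, affected) if a and s < level)
        fn = sum(1 for s, a in zip(scores, affected) if a and s >= level)
        tn = sum(1 for s, a in zip(scores, affected) if not a and s >= level)
        fp = sum(1 for s, a in zip(scores, affected) if not a and s < level)
        points.append((level, tp / (tp + fn), tn / (tn + fp)))
    return points


def auc(scores, affected):
    """Probability that an affected child scores lower than an unaffected one (ties = 0.5)."""
    pos = [s for s, a in zip(scores, affected) if a]
    neg = [s for s, a in zip(scores, affected) if not a]
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, affected, max_score=9):
    """Pass level maximizing sensitivity + specificity - 1."""
    return max(roc_points(scores, affected, max_score), key=lambda p: p[1] + p[2] - 1)


# Hypothetical toy data: number of correct stereo responses (0-9) and condition status
scores = [1, 3, 4, 5, 4, 6, 7, 8, 9, 9]
affected = [True, True, True, True, False, False, False, False, False, False]
level, sens, spec = youden_cutoff(scores, affected)
print(level, round(sens, 3), round(spec, 3), round(auc(scores, affected), 4))
# -> 6 1.0 0.833 0.9375
```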

The interpretation of the Lang II test is fairly subjective. According to the official User’s Manual for the Lang stereotest (Lang Stereotest AG, Forch, Switzerland), the task is to name and show the figures on the plastic card (monocular: star; binocular: moon (200 sec of arc), car (400 sec of arc), and elephant (600 sec of arc)). The outcome of the Lang test is positive, negative, or doubtful. The test is passed (positive) if all the figures are named and shown correctly. In case of a doubtful answer, the subject should be referred to a specialist for further examination. The result is negative if no object can be detected. These criteria seem too strict in practice, because failure to name the objects correctly can have several reasons: (1) lack or impairment of stereo vision, (2) inability to understand the task, and (3) unfamiliarity with the object, which the child creatively replaces with an alternative. Ohlsson et al. previously recognized and described the same methodological problem [55]. Using the screening convention, in which the sign is reversed relative to the manual, they defined normal stereopsis (negative/pass cases) as naming all the figures correctly (although they accepted “fish” for the “car” as normal) and considered confusing or doubtful answers positive/fail cases. Some studies in this field accept pointing out two of the three figures [56], whereas others consider the incorrect naming or non-perception of one or more stereoscopic figures a fail [57]. In this study, we used the protocol recommended by Huynh et al., which was applied in a study with more than 1700 participants [46]. According to this protocol, identification of the elephant (disparity, 600 sec of arc) was the pass/fail criterion.
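The scoring rules discussed above differ only in which identified figures count toward a pass. The sketch below reduces a child's answers to the set of stereoscopic figures identified and encodes three of the protocols. It is a simplification (it ignores the monocular star control, the naming-versus-pointing distinction, and accepted alternative names such as “fish” for “car”), and the protocol labels and function name are hypothetical.

```python
LANG_II_FIGURES = {"moon": 200, "car": 400, "elephant": 600}  # disparity, sec of arc


def lang_ii_passes(identified, protocol="elephant"):
    """Pass/fail for the Lang II given the set of stereoscopic figures identified."""
    found = set(identified) & set(LANG_II_FIGURES)
    if protocol == "elephant":       # Huynh et al. [46]: elephant (600 sec of arc) decides
        return "elephant" in found
    if protocol == "all":            # every stereoscopic figure identified (cf. Ohlsson et al. [55])
        return found == set(LANG_II_FIGURES)
    if protocol == "two_of_three":   # two of the three figures pointed out [56]
        return len(found) >= 2
    raise ValueError(f"unknown protocol: {protocol}")


answers = {"elephant", "car"}
print({p: lang_ii_passes(answers, p) for p in ("elephant", "all", "two_of_three")})
# -> {'elephant': True, 'all': False, 'two_of_three': True}
```

The same child can therefore pass under one protocol and fail under another, which is one reason reported Lang II sensitivities are hard to compare across studies.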

Finally, the two tests (DRDSE vs. Lang II) were compared, using the chi-square test, for those conditions for which consistent detection ability had been confirmed.
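For a 2 × 2 table, Pearson's chi-square statistic has the closed form N(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)], and with one degree of freedom its p value can be obtained from the complementary error function. The sketch below assumes a layout with the two tests as rows and fail/pass as columns, uses hypothetical counts, and applies no continuity correction; the text does not state which table layout or correction was used in SPSS.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p value for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square with 1 df
    return chi2, p


# Hypothetical counts: rows = DRDSE, Lang II; columns = fail, pass
chi2, p = pearson_chi2_2x2(20, 10, 12, 18)
print(round(chi2, 3), round(p, 3))
print("reject H0" if p < 0.001 else "keep H0")  # predetermined threshold from the text
```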

Results

To determine the pass/fail criterion for the DRDSE, the threshold was derived from the ROC curves (Fig. 1). ROC curves are shown for the following conditions: hyperopia, convergent strabismus, amblyopia, and anisometropia. Furthermore, we established a fifth group comprising the subjects who had any of the aforementioned conditions (“all conditions”).

Fig. 1

Receiver operating characteristic curves for hyperopia (a), convergent strabismus (b), amblyopia (c), anisometropia (d), and all conditions (e). The blue line plots the true-positive rate (sensitivity) as a function of the false-positive rate (1-specificity). The distance of the ROC curve (blue line) to the upper left corner indicates the overall accuracy level of the test [73]. Points below the orange line represent worse than random results. Note that the ROC curve of anisometropia (d) falls below the orange diagonal (random guess), which divides the ROC space, but in all the other conditions (a, b, c, e), the curves are above the orange diagonal. For exact “area-under-curve” values, see Table 1

Table 1 shows the detailed ROC analysis of the DRDSE test, including the area under the curve (AUC), cut-off points, sensitivity and specificity, standard error, and p values.

Table 1 Summary table of ROC curve analysis

As Table 1 shows, the AUC for the detection ability of the DRDSE test was statistically significant for all target conditions except anisometropia (AUC = 0.59; p = 0.337). Because the area under the ROC curve quantifies the overall ability of a test to discriminate between individuals with and without the disease, the DRDSE is not a suitable test for screening for anisometropia. When evaluating the cut-off points that give the best balance of sensitivity and specificity for the different conditions, we found that different cut-off points were optimal. For example, for amblyopia, the pass level could be set at 4/9, which was associated with a sensitivity of 100% and a specificity of 69.4%. In contrast, for hyperopia, a threshold of 5/9 was found to be optimal. We therefore used 5/9 as the pass/fail threshold of the DRDSE test; Table 2 summarizes the number of true-positive and true-negative cases at this threshold. An additional reason for rejecting a pass level of 4/9 is that it would have been accompanied by too high a probability (p = 0.16) of passing by chance (see Table 4 in the Appendix).
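The chance-pass probability quoted above follows from the binomial distribution. Assuming a four-alternative forced choice (four equally likely orientations) and independent guesses on all nine stereo items, the probability of reaching a pass level k by guessing alone is the sum of the binomial terms from k to 9. This sketch gives approximately 0.166 for a pass level of 4/9, in line with the value quoted above, and approximately 0.049 for 5/9; whether the Appendix table was computed in exactly this way is not stated.

```python
from math import comb


def p_pass_by_chance(pass_level, n_items=9, p_guess=0.25):
    """P(at least `pass_level` correct of `n_items`) when every answer is a guess."""
    return sum(comb(n_items, k) * p_guess ** k * (1 - p_guess) ** (n_items - k)
               for k in range(pass_level, n_items + 1))


for level in range(3, 7):
    print(f"pass level {level}/9: P(pass by chance) = {p_pass_by_chance(level):.3f}")
```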

Table 2 True-positive and true-negative cases for the DRDSE using a pass level of 5/9

Fisher’s exact test was performed to evaluate which amblyogenic conditions could be significantly detected by the Lang II (Table 3).
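Fisher's exact test evaluates a 2 × 2 table of screening result against condition status using the hypergeometric distribution of the top-left cell with all margins fixed. A minimal two-sided implementation is sketched below; the two-sided p value sums the probabilities of all tables that are at most as likely as the observed one, which is one common convention. The function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p value for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# Hypothetical table: rows = Lang II fail, pass; columns = condition present, absent
print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # -> 0.4857 (= 34/70)
```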

Table 3 True-positive and true-negative cases for the Lang II test and the p values for each target condition

The Lang II stereotest provided significantly consistent detection of amblyopia, strabismus, and “all conditions.” However, the null hypothesis could not be rejected for anisometropia and hyperopia at our predetermined p = 0.001 threshold. When the overall performance was compared with that of the DRDSE (Fig. 2), anisometropia was excluded from the comparison, since neither test detected this condition consistently. Figure 2 shows the sensitivity (a) and specificity (b) values of the two stereotests.

Fig. 2

Bar charts comparing the sensitivity (a) and specificity (b) values of the Lang II test (blue bars) and the DRDSE (red bars). Exact percentages are shown above each bar

Sensitivity and specificity of DRDSE and Lang II tests

A chi-square test was carried out on the target population (“all conditions,” n = 51) to determine whether the pass/fail ratios of the two tests differed significantly and whether the tests could be used interchangeably. The p value was 0.035; thus, the null hypothesis of no difference could not be rejected at our predetermined significance level (p = 0.001).

Discussion

This is the first study to report data on the clinical performance of the DRDSE stereovision test, a product of new mobile technology developed for mass screening of large populations of children. This type of screening helps to identify amblyopia and amblyogenic conditions efficiently. The most important findings are as follows: (1) the DRDSE test significantly detects amblyopia, convergent strabismus, and hyperopia, but fails to detect anisometropia; (2) the sensitivity of the DRDSE for amblyopia and convergent strabismus was 100%; (3) the overall sensitivity and specificity, covering the target conditions of amblyopia, convergent strabismus, anisometropia, and hyperopia, were 75.0 and 75.6%, respectively; and (4) the overall sensitivity of the Lang II test was 56.8%, lower than that of the DRDSE, whereas its specificity was 87.3%, higher than that of the DRDSE.

The present study supports previously published data on the use of dynamic stereotests in screening [49, 58, 59]. The notion that standard stereotests such as the Lang II, Frisby, and TNO are simple and reliable tools for detecting amblyogenic conditions has been questioned by many authors [57, 60, 61], which underscores the need for better tests. The ideal test is quick, easy to use, and has high sensitivity and specificity at the same time. Sensitivity is the most important performance measure of a screening method. Poor sensitivity, caused by a high rate of “false passes,” increases the number of unrecognized pathologic conditions, which is ethically unacceptable. On the other hand, low specificity causes over-referral, which is costly for the healthcare system. Although both measures are important for overall efficacy, poorer specificity can be tolerated, whereas no compromise is acceptable in sensitivity. Many attempts have been made to investigate the efficacy of stereotests in vision screening, with the idea that they could simplify and speed up the examination. Additional information about stereopsis status could modify the referral protocol, which in practice relies mostly on visual acuity. This practice may stem from the most common definition of amblyopia, which emphasizes the decrease in monocular visual acuity [4], even though the decreased visual acuity is a direct consequence of abnormal development of binocular vision [62]. Large-scale screenings have demonstrated that testing stereopsis can be faster and cognitively easier than measuring visual acuity [63]. However, some studies caution that stereotests with low specificity and sensitivity should not be used alone for vision screening [64, 65].

Furthermore, stereotests may fail to identify conditions without significant loss of binocular vision, such as myopia [66], although symmetric myopia alone is not a typical amblyogenic condition. It is well known that visual acuity testing by itself produces over-referrals, requires trained personnel, and can be time consuming and tedious for children [67]. This controversy could potentially be resolved by constructing a stereotest with the highest possible sensitivity and specificity.

The sensitivity of the DRDSE test for amblyopia and convergent strabismus was 100%. To our knowledge, this is the highest sensitivity among the available stereotests.

Next, we try to explain why the sensitivity is so high compared with that of other tests, why the specificity is lower than that of the Lang II, and how specificity could be improved while maintaining sensitivity.

The excellent sensitivity can be explained by the level of difficulty. Low-density RDSs require global stereopsis to recognize the embedded disparity-coded images. The RDS density was 1% with 3% uncorrelated noise, which is probably close to the detection threshold for children. The set of parameters used in the protocol carries a very low risk of monocular artifacts (Budai 2012, unpublished results). In addition to the low dot density, the test was dynamic, with a refresh rate of 10 Hz; the dynamically refreshing dots may further increase the difficulty of perceiving the disparity-coded image.

The high sensitivity, a considerable advantage of the stimulus, may go hand in hand with its disadvantage (i.e., low specificity): the task is rather hard even for emmetropes. Since the pass threshold of the DRDSE has been carefully optimized and cannot be lowered further, reducing the level of difficulty may require designing different screening sets with increased dot density and/or reduced noise, in order to raise the specificity of the DRDSE while preserving its high sensitivity.

One might expect younger patients to be more likely to fail the DRDSE because of their less developed cognitive abilities, even in the absence of amblyogenic factors. In our study, we found a considerable number of false-positive cases (n = 18; mean age, 8.8 years; SD, 2.6), that is, children who had none of the target pathologies but failed the test (Fig. 3). Lower age of these patients would have been a possible explanation, but Spearman’s rank correlation showed no significant positive relationship between age and test performance. Therefore, younger age cannot account for the false positives.
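Spearman's rank correlation, used for the age analysis above, is Pearson's correlation computed on ranks, with tied values (common for ages in whole years and for scores from 0 to 9) given their average rank. A minimal sketch is shown below; it returns only the coefficient ρ, not the significance test, and the example data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))


ages = [5, 6, 6, 8, 9, 11, 12, 14]      # hypothetical ages (years)
correct = [3, 5, 4, 4, 6, 5, 7, 6]      # hypothetical correct stereo responses (0-9)
print(round(spearman_rho(ages, correct), 3))
```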

Fig. 3

Scatter plot shows the distribution of correct responses in relation to age (years)

Coexisting but potentially non-amblyogenic ophthalmologic conditions may decrease stereopsis and increase the number of false-positive cases. The 18 false-positive patients had various eye conditions, which are shown in Fig. 4. Astigmatism (n = 6) and myopia (n = 8) were the most frequent refractive errors. The mean refractive error of the myopic patients was −2.1 D (SD, 1.8) in the right eye (o.d.) and −2.0 D (SD, 1.9) in the left eye (o.s.). Given this, we suspect that a certain degree of myopia, alone or combined with astigmatism, might lead to decreased stereovision. The effect of astigmatism on stereopsis is especially severe when the difference between the astigmatic axes of the two eyes is more than 45° [68]; in our dataset, two patients had a difference of 40° or more. This is in accord with the conclusions of Yang et al. and Kulkarni et al., who demonstrated a correlation between myopia, astigmatism, and reduced stereopsis [69, 70].
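Comparing astigmatic axes between the eyes has to account for the fact that cylinder axes are defined modulo 180° (an axis of 10° and one of 170° are only 20° apart). A minimal helper is sketched below, assuming axes recorded in the usual 0 to 180° notation; whether the cited study [68] computed the interocular difference in exactly this way is not stated, and the function name is hypothetical.

```python
def axis_difference(axis_right, axis_left):
    """Smallest angle (degrees) between two cylinder axes; axes repeat every 180 degrees."""
    d = abs(axis_right - axis_left) % 180
    return min(d, 180 - d)


for pair in [(10, 170), (45, 90), (30, 100), (0, 180)]:
    diff = axis_difference(*pair)
    print(pair, diff, "more than 45 degrees" if diff > 45 else "45 degrees or less")
```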

Fig. 4

Ophthalmological diagnoses of false positive patients. Numbers in brackets show the number of cases. Some patients had more than one condition

The comparison of the DRDSE and Lang II is problematic because of the subjective interpretation of the Lang II [46, 55] (discussed in the “Methods” section) and its qualitative result. Because the Lang II lacks a quantitative measure, its ROC analysis is impossible, and therefore optimization of its pass/fail threshold was not feasible; these differences between the tests presumably influence sensitivity and specificity.

Anisometropia was the least detectable condition with DRDSE (AUC = 0.59); this observation was also pointed out by Afsari et al. in their study comparing the diagnostic reliability of various stereoacuity tests [71]. It should be considered that relatively good levels of stereopsis can be present in patients with low levels of anisometropia [72].

The low sensitivity of the Lang II is not surprising; it is consistent with the findings of other investigators, who reported that the sensitivity of the Lang stereotest is low, ranging between 31.6 and 40% [57, 61]. The comparison of pass/fail ratios in the “all conditions” population suggests that the DRDSE and Lang tests show similar results and hence could be used interchangeably.

In this study, we have found that the DRDSE test can be completed in about 5–7 min (including the explanation of the task to the children) and is easy to use (after a short training) for non-professional examiners such as teachers, parents, district nurses, or social workers. As for the technical background, a tablet with the appropriate software and red-green spectacles are required.

Our results are encouraging but should be validated in a larger sample drawn from the screening target population (4–6 years of age), rather than by convenience sampling across a wider age range. Further work is needed to improve the relatively low specificity.