An introduction to Rasch analysis for Psychiatric practice and research

https://doi.org/10.1016/j.jpsychires.2012.09.014

Abstract

This article presents the main characteristics of Rasch analysis in the context of patient reported outcomes in psychiatry. We give an overview of its main features, taking the latent variable of depressive symptoms as an example and illustrating with the Beck Depression Inventory. We show that by fitting data to the Rasch model we can confirm the structural validity of the scale, including key attributes such as invariance, local dependency and unidimensionality. We also illustrate how the approach can inform on the meaning of the numbers attributed to scales, the amount of the latent trait that such numbers represent, and the consequent adequacy of the statistical operations used to analyse them. We argue that fitting data to the Rasch model has become the measurement standard for patient reported outcomes in general and, as a consequence, will facilitate quality improvement of outcome instruments in psychiatry. Recent advances in measurement technologies built upon the calibration of items derived from Rasch analysis, in the form of computerized adaptive tests (CAT), open up further opportunities for reducing the burden of testing and/or expanding the range of information that can be collected during a single session.

Introduction

The use of patient reported outcomes in health care in general, and psychiatry in particular, has expanded rapidly over recent years. The number of instruments designed to ascertain latent constructs such as anxiety, depression and self harm has increased steadily (Bowen et al., 2008; Brunner et al., 2007; Fliege et al., 2009; Gamez et al., 2007; Garlow et al., 2008; Honarmand and Feinstein, 2009; King et al., 2008; Klonsky et al., 2003; Latimer et al., 2009; Parker et al., 2005; Pedersen, 2006; Pomerleau et al., 2003; Terluin et al., 2006; Tuisku et al., 2009). While some instruments are administered by professionals, the majority are self completed ‘patient reported outcomes’ and are widely used in both clinical practice and research (Bech, 2008; Chan et al., 2010; Chandler et al., 2010; Counts et al., 2010; Hawton et al., 2002; Norris and Aroian, 2008; Steinhausen et al., 2009). The obvious value of such instruments is that they can minimize the burden of assessment upon patients and can be applied to large numbers of people, which may be more restricted, or not feasible, with structured clinical interviews.

However, the use of such scales has been the subject of some debate. Marshall et al. (2000), examining a number of controlled trials in schizophrenia, found that the intervention was more likely to be found effective when unpublished scales were used, as opposed to validated ones. Another issue, which has rarely been considered, is that the majority of instruments yield ordinal scores, which indicate rank relationships only (Stevens, 1946). Such scores cannot support mathematical calculations such as change scores or parametric effect sizes (Smith, 2001). Consequently, using ordinal scores in sophisticated parametric analyses could lead to mistaken inferences from the findings (Merbitz et al., 1989). Ordinal scales, which provide a magnitude of the trait under consideration, are nevertheless perfectly acceptable when the object is to identify a cut point, as in many instruments used, for example, to ascertain depression. This application relies only on a specific magnitude, which is available from an ordinal scale. Thus, the problem is not necessarily the scales themselves (although it may be), but rather the way in which they are analysed.
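To see why arithmetic on ordinal scores is fragile, consider a small illustration (a sketch with invented scores, not data from any study cited here). Two order-preserving ways of coding the same four response categories can reverse which group has the higher mean. The group names, the scores and the alternative coding are all hypothetical.

```python
from statistics import mean

# Hypothetical item responses coded 0-3 (higher = more severe).
group_x = [0, 0, 3, 3]
group_y = [1, 2, 2, 2]

# An alternative coding that keeps the same category order but
# stretches the distance between the top two categories.
# Ordinal data alone cannot tell us which spacing is the right one.
alternative_coding = {0: 0, 1: 1, 2: 2, 3: 5}


def recode(scores, mapping):
    """Apply an order-preserving recoding to a list of category scores."""
    return [mapping[s] for s in scores]


mean_x, mean_y = mean(group_x), mean(group_y)
mean_x_alt = mean(recode(group_x, alternative_coding))
mean_y_alt = mean(recode(group_y, alternative_coding))

print(f"original coding:    X = {mean_x}, Y = {mean_y}")
print(f"alternative coding: X = {mean_x_alt}, Y = {mean_y_alt}")
```

Under the original coding group Y has the higher mean (1.75 against 1.5); under the alternative coding group X does (2.5 against 1.75). Both codings respect the rank order of the categories, so the ordinal information cannot decide between them.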

In the development of patient reported outcomes, the usual procedure has been to generate a scale with a certain number of items intended to assess observable behaviours related to the construct of interest (Tesio, 2003). Therefore, when setting out to measure such a construct we look for indicators (items) which are related to the construct, preferably in a way specified by an underlying theory. When someone responds to a certain question or item, the probability that they endorse the item should depend on their level of the latent trait or ability (Baker, 2001). For example, it is expected that a more depressed subject will endorse an item regarding hopelessness more frequently than a non-depressed one. While this particular item does not directly measure depression (it addresses hopelessness), it contributes to the depression score, together with other related items, which are designed to measure the latent variable (depression in this case).
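The dependence of endorsement on trait level can be sketched numerically. The snippet below assumes the standard dichotomous Rasch model, in which the probability of endorsing an item is a logistic function of the difference between the person's trait level (here `theta`) and the item's location (here `b`), both in logits. This functional form is not given in the passage above, the symbol names and the numeric values are hypothetical, and questionnaire items with more than two response categories would need a polytomous extension.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of endorsing a dichotomous item under the Rasch model.

    theta: person's level of the latent trait (logits)
    b:     item location, i.e. the trait level at which P = 0.5 (logits)
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical location for an item about hopelessness.
hopelessness_location = 1.0

# Hypothetical trait levels for two respondents.
respondents = [("non-depressed", -1.0), ("more depressed", 2.0)]

for label, theta in respondents:
    p = rasch_probability(theta, hopelessness_location)
    print(f"{label}: P(endorse) = {p:.2f}")
```

With these illustrative values the more depressed respondent is far more likely to endorse the item, and a respondent whose trait level equals the item location would endorse it with probability 0.5.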

In order to put together a set of items with the expectation that they measure the target construct, a set of psychometric requirements must be satisfied. These requirements can be grouped into those associated with Classical Test Theory (CTT) and those associated with Modern Test Theory (MTT), although in practice there is considerable overlap between the two. The present article briefly reviews the former, and then goes on to describe the potential contributions of the latter, in particular Rasch analysis, to the development and testing of instruments. The Beck Depression Inventory (BDI) will be used as a practical example for this purpose.

Section snippets

Classical Test Theory

The measurement properties of most patient reported outcomes to date have been evaluated from the CTT perspective. This has entailed publication of evidence concerning the reliability and the validity of the instrument. Reliability concerns whether or not the instrument is consistent, both internally (Cronbach's alpha) and over time (test–retest). Validity is often reported to comprise three central aspects, namely construct, criterion and content validity. These represent

Modern Test Theory (MTT) and the Rasch model

The first MTT models (under the generic label of Item Response Theory, IRT) appeared in the 1950s in education, driven by the need to build tests that would be at the same time simple, valid and highly discriminating (Embretson and Reise, 2000). IRT represents a group of several distinct models, which share an assumption that the response to any particular item is a function of the difference between the ability of the person (or in our example their level of

An example using the BDI

To illustrate how data are fitted to the Rasch model, data were collected from a sample composed of 122 chronic patients, of whom 66 (54.1%) were male, and 56 (45.9%) were female. The most frequently reported health problems were hypertension (18%), heart diseases (15.6%), neoplasm (13.1%), diabetes (13.1%), emphysema/asthma/bronchitis (11.5%), autoimmune diseases (8.2%), and kidney diseases (8.2%). They were recruited in a tertiary hospital in Porto Alegre-RS-Brazil, in the different clinical

Discussion: Rasch applications in clinical research

This introductory paper highlights the potential of Rasch analysis for psychiatric practice and research. The BDI was used here merely as an example. The BDI has been shown to satisfy Rasch model expectations after some adjustments, in a mixed diagnostic sample from a tertiary hospital. Designed to be used in a clinical sample of depressed patients to ascertain the severity of that depression, the distribution of thresholds across the continuum of depression is consistent with

Role of the funding source

This study was partially funded by FIPE-HCPA, CAPES and the University of Edinburgh.

Contributors

All authors managed the literature searches. Neusa Rocha and Alan Tennant undertook the statistical analysis, and Neusa Rocha, Eduardo Chachamovich and Marcelo Fleck wrote the first draft of the manuscript. All authors contributed to and have approved the final manuscript.

Conflict of interest

The authors declare that they have no conflict of interest.

Acknowledgements

None.

References (83)

  • V. Tuisku et al. (2009). Factors associated with deliberate self-harm behaviour among depressed adolescent outpatients. Journal of Adolescence.
  • M. Adler et al. (2011). An IRT validation of the Affective Self Rating Scale. Nordic Journal of Psychiatry.
  • G.W. Ahava et al. (1998). Is the Beck Depression Inventory reliable over time? An evaluation of multiple test–retest reliability in a nonclinical college student sample. Journal of Personality Assessment.
  • C. Alexandrino-Silva et al. (2009). Suicidal ideation among students enrolled in healthcare training programs: a cross-sectional study. Revista Brasileira de Psiquiatria.
  • D. Andrich (1978). A rating formulation for ordered response categories. Psychometrika.
  • D. Andrich (1988). Rasch models for measurements.
  • D. Andrich et al. (2004). RUMM: a Windows program for analysing item response data according to Rasch Unidimensional Measurement Models.
  • F.B. Baker (2001). The basics of item response theory.
  • P. Bech (2008). Pichot – a tribute to the European psychopharmacologist on his 90th birthday. European Psychiatric Review.
  • A.T. Beck et al. (1996). BDI-II manual.
  • T.G. Bond et al. (2007). Applying the Rasch model: fundamental measurement in the human sciences.
  • T.K. Bouman et al. (1987). Homogeneity of Beck's Depression Inventory (BDI): applying Rasch analysis in conceptual exploration. Acta Psychiatrica Scandinavica.
  • A. Bowen et al. (2008). Anxiety in a socially high-risk sample of pregnant women in Canada. Canadian Journal of Psychiatry.
  • R. Brunner et al. (2007). Prevalence and psychological correlates of occasional and repetitive deliberate self-harm in adolescents. Archives of Pediatrics & Adolescent Medicine.
  • E. Castro-Costa et al. (2008). Ascertaining late-life depressive symptoms in Europe: an evaluation of the survey version of the EURO-D scale in 10 nations. The SHARE project. International Journal of Methods in Psychiatric Research.
  • J.S. Cinnamon et al. (2011). Preliminary evidence for the development of a stroke specific geriatric depression scale. International Journal of Geriatric Psychiatry.
  • E. Chachamovich et al. (2008). Development and validation of the Brazilian version of the Attitudes to Aging Questionnaire (AAQ): an example of merging classical psychometric theory and the Rasch measurement model. Health and Quality of Life Outcomes.
  • Y.F. Chan et al. (2010). Psychometric evaluation of the Hospital Anxiety and Depression Scale in a large community sample of adolescents in Hong Kong. Quality of Life Research.
  • G.M. Chandler et al. (2010). RESEARCH: validation of the Massachusetts general hospital Antidepressant Treatment History Questionnaire (ATRQ). CNS Neuroscience & Therapeutics.
  • D.C. Clark et al. (1983). The core symptoms of depression in medical and psychiatric patients. Journal of Nervous and Mental Disease.
  • T. Covic et al. (2009). Variability in depression prevalence in early rheumatoid arthritis: a comparison of the CES-D and HAD-D Scales. BMC Musculoskeletal Disorders.
  • S.E. Embretson et al. (2000). Item response theory for psychologists.
  • G. Fischer et al. (1995). Rasch models. Foundations, recent developments and applications.
  • R.A. Fisher (1921). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society.
  • T. Forkmann et al. (2009). Development of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis. Rehabilitation Psychology.
  • S.J. Garlow et al. (2008). Depression, desperation, and suicidal ideation in college students: results from the American Foundation for Suicide Prevention College Screening Project at Emory University. Depression and Anxiety.
  • C.J. Gibbons et al. (2011). Rasch analysis of the hospital anxiety and depression scale (HADS) for use in motor neurone disease. Health and Quality of Life Outcomes.
  • P. Hagell et al. (2003). Health status measurement in Parkinson's disease: validity of the PDQ-39 and Nottingham Health Profile. Movement Disorders.
  • K. Hawton et al. (2002). Deliberate self harm in adolescents: self report survey in schools in England. British Medical Journal.
  • M.J. Hayden et al. (2010). Confirmatory factor analysis of the Beck Depression Inventory in obese individuals seeking surgery. Obesity Surgery.
  • H.W. Helm et al. (2003). Factor structure of the Beck Depression Inventory in a university sample. Psychological Reports.