Part of the book series: Springer International Handbooks of Education (SIHE, volume 7)

Summary

The purpose of this chapter is to provide an overview of several concepts and terms that were originally defined and investigated in the corner of education that housed psychometrics, but have migrated to the more general education literature. Definitions, explanations, and examples are given for commonly used terms, including reliability, generalizability, and validity. Following this discussion, the second part of the chapter provides an overview of how one might use these concepts in designing or choosing an instrument. The third part introduces some newer and more advanced topics that have received attention in recent years. The chapter concludes with a brief review of practical suggestions for those engaged in educational research.




© 2002 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Shea, J.A., Fortna, G.S. (2002). Psychometric Methods. In: Norman, G.R., et al. International Handbook of Research in Medical Education. Springer International Handbooks of Education, vol 7. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0462-6_4

  • DOI: https://doi.org/10.1007/978-94-010-0462-6_4

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-3904-8

  • Online ISBN: 978-94-010-0462-6

  • eBook Packages: Springer Book Archive
