Abstract
This chapter reviews some of the important research in nonparametric and parametric item response theory (IRT) today and considers some current measurement challenges in education and cognitive psychology. These challenges lead to assessment models that look quite different from today's IRT models, yet for which the tools and conceptual framework of nonparametric and parametric IRT remain well suited.
© 2001 Springer Science+Business Media New York
About this chapter
Cite this chapter
Junker, B. (2001). On the Interplay Between Nonparametric and Parametric IRT, with Some Thoughts About the Future. In: Boomsma, A., van Duijn, M.A.J., Snijders, T.A.B. (eds) Essays on Item Response Theory. Lecture Notes in Statistics, vol 157. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-0169-1_14
DOI: https://doi.org/10.1007/978-1-4613-0169-1_14
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-95147-8
Online ISBN: 978-1-4613-0169-1
eBook Packages: Springer Book Archive