Handbook of Learning Analytics
First Edition

Chapter 3

Measurement and its Uses in Learning Analytics

Yoav Bergner


Psychological measurement is a process for making warranted claims about states of mind. As such, it typically comprises the following: defining a construct; specifying a measurement model and (developing) a reliable instrument; analyzing and accounting for various sources of error (including operator error); and framing a valid argument for particular uses of the outcome. Measurement of latent variables is, after all, a noisy endeavor that can nevertheless have high-stakes consequences for individuals and groups. This chapter is intended to serve as an introduction to educational and psychological measurement for practitioners in learning analytics and educational data mining. It is organized thematically rather than historically, from more conceptual material about constructs, instruments, and sources of measurement error toward increasing technical detail about particular measurement models and their uses. Some of the philosophical differences between explanatory and predictive modeling are explored toward the end.

About this Chapter

Measurement and its Uses in Learning Analytics

Book Title
Handbook of Learning Analytics

pp. 35–48

Society for Learning Analytics Research

Yoav Bergner

Author Affiliations
Learning Analytics Research Network, New York University, USA

Charles Lang (1)
George Siemens (2)
Alyssa Wise (3)
Dragan Gašević (4)

Editor Affiliations
1. Teachers College, Columbia University, USA
2. LINK Research Lab, University of Texas at Arlington, USA
3. Learning Analytics Research Network, New York University, USA
4. Schools of Education and Informatics, University of Edinburgh, UK