Chapter 6: Going Beyond Better Data Prediction to Create Explanatory Models of Educational Data

Across the vast majority of educational data mining research, models are evaluated based on their predictive accuracy. Most often, this takes the form of assessing the model’s ability to correctly predict successes and failures in a set of student response outcomes. Much less commonly, models may be validated based on their ability to predict post-test outcomes (e.g., Corbett & Anderson, 1995) or pre-test/post-test gains (e.g., Liu & Koedinger, 2015).

Cognitive models map knowledge components (i.e., concepts, skills, and facts; Koedinger, Corbett, & student performance can be observed. This mapping provides a way for statistical models to make inferences about students' underlying knowledge based on their observable performance on different problem steps. Thus, cognitive models are an important basis for the instructional design of automated tutors and are important for accurate assessment of learning and knowledge. Better cognitive models lead to better predictions of what a student knows, allowing adaptive of constructing cognitive models (Clark, Feldon, van Merriënboer, Yates, & Early, 2008) include structured interviews, think-aloud protocols, rational analysis, and require human input and are often time-consuming. They are also subjective, and previous research (Nathan, Koedinger, & Alibali, 2001; Koedinger & McLaughlin, models often ignore content distinctions that are important for novice learners. Here, we review three models based on data-driven techniques that alleviate For statistical modelling purposes, the work described component (KC) is a fact, concept, or skill required to succeed at a particular task or problem step. We refer statistical model we used to evaluate the predictive logistic regression model called the additive factors learning effects.
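For orientation, the additive factors model (AFM) is usually written as the logistic regression below. The notation is the conventional one from the AFM literature and is an assumption here, since the chapter text itself does not spell out the equation.

```latex
% p_{ij}   : probability that student i answers step j correctly
% \theta_i : proficiency of student i (one parameter per student)
% q_{jk}   : 1 if step j requires knowledge component k, else 0
% \beta_k  : easiness (intercept) of knowledge component k
% \gamma_k : learning rate (slope) of knowledge component k
% T_{ik}   : number of prior practice opportunities student i has had on KC k
\ln\frac{p_{ij}}{1 - p_{ij}}
  = \theta_i + \sum_{k} q_{jk}\,\bigl(\beta_k + \gamma_k\, T_{ik}\bigr)
```

Under this form, each KC contributes an intercept and a slope, so a KC model that assigns steps to KCs differently changes the matrix of q values and therefore the fit.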

Data-Driven Cognitive Model Improvement
a data-driven knowledge decomposition process to other words, when one task is much harder than a closely related task, the difference implies a knowledge demand of the harder task that is not present in the easier one. Stamper and Koedinger (2011) illustrated 1 (Koedinger et al., 2010), to identify and validate cognitive model improvements. The method revised KC model and investigate whether the new 2 ), learning curves with a consistent decline in error rate. One KC in the original model, compose-by-addition, in error rate at certain opportunity counts. In addition, the AFM parameter estimates for the compose-by-addition KC suggested no apparent learning (the slope because the performance was at ceiling). A bumpy learning curve and low slope estimate are indications some knowledge demand that other items do not. In other words, the original KC should really be split into two different KCs. To improve the KC model, all compose-by-addition about additional knowledge that might be required on certain steps. As a result, the compose-by-addition KC was split into three distinct KCs, and each of the 20 steps previously labelled with the compose-by-addition KC was relabelled accordingly. The revised model of student performance than the original KC model did. Although this KC model improvement was aided by the actual improvements were generated manually and thus were readily interpretable.
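The learning-curve inspection described above can be sketched in a few lines. This is a minimal illustration, not the DataShop implementation: the record layout, the function names `learning_curve` and `is_bumpy`, and the "any rise counts as a bump" criterion are all assumptions made for the example.

```python
from collections import defaultdict


def learning_curve(records):
    """Compute the error rate at each practice opportunity for one KC.

    `records` is a chronologically ordered iterable of
    (student_id, correct) pairs for steps tagged with that KC
    (hypothetical layout). Returns {opportunity_number: error_rate}.
    """
    seen = defaultdict(int)    # opportunities each student has had so far
    errors = defaultdict(int)  # incorrect attempts per opportunity number
    totals = defaultdict(int)  # all attempts per opportunity number
    for student, correct in records:
        seen[student] += 1
        opportunity = seen[student]
        totals[opportunity] += 1
        if not correct:
            errors[opportunity] += 1
    return {opp: errors[opp] / totals[opp] for opp in sorted(totals)}


def is_bumpy(curve):
    """Crude check (an assumption): the error rate rises at some point."""
    rates = [curve[opp] for opp in sorted(curve)]
    return any(later > earlier for earlier, later in zip(rates, rates[1:]))


records = [
    ("s1", False), ("s2", False),  # first opportunity: both wrong
    ("s1", True), ("s2", True),    # second opportunity: both right
    ("s1", False), ("s2", True),   # third opportunity: s1 slips again
]
curve = learning_curve(records)
print(curve, is_bumpy(curve))
```

With real data, a threshold or a smoothing step would be needed so that sampling noise at sparsely populated opportunities is not mistaken for a bump.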
The discovered KC model improvements had clear implications for revising instruction. Koedinger, Stamper, model improvements to generate a revised version of the newly discovered skills to the KC model driving adaptive learning, resulting in changes to knowledge tracing, and the creation of new tasks to target the completed the revised tutor unit and the other half completed the original tutor unit. Students using the KC model improvement, based on pre- to post-test gains (Koedinger et al., 2013). These results show that the outcomes.

1 http://pslcdatashop.org

Learning Factors Analysis
developed to automate the data-driven method of KC models, evaluates different models based on their the form of a symbolic model. As such, LFA greatly reduces demands on human effort while simultaneously easing the burden of interpretation, even if it does not automatically accomplish it.
We applied the LFA search process across 11 datasets spanning different domains and different educational Across all 11 datasets, this automated discovery process human-tagged KC models (Koedinger, McLaughlin, & Stamper, 2012). Importantly, we demonstrated in an made by the best LFA-discovered model. A manual KC that the LFA model tagged separate KCs for forwards radius given area) circle-area problems, whereas these KC in the human-tagged model. No such differences were found between the models for other shapes like rectangles, triangles, and parallelograms. Applying and how to apply a square root operation for backwards circle-area problems, which is not required for forwards circle-area problems nor for the backwards area problems of other shapes.
tation beyond the dataset from which the discoveries were made. We evaluated the presence of the square from that used to make the discovery (Liu, Koedinger, & McLaughlin, 2014). Among other differences, the novel dataset contained more backwards circle-area given area) square-area problems. These square-area problems were not at all present in the original dataset from which the LFA-generated discovery was made. Applying our interpretation of the discovery, we constructed a KC model that tags separate forwards and backwards KCs only for shapes where backwards steps require computing a square root (squares, circles) but not for shapes where backwards steps don't (triangles, rectangles, parallelograms). When used in conjunction with the AFM, this KC model yielded Since the novel dataset had a different structure from the original dataset, including differences relevant to square-area problems), it would not have been viable to apply the LFA-discovered KC model directly to this new dataset. Interpretation is necessary in order to with non-identical structures. Furthermore, interpreanalyses to something meaningful that can then be translated into concrete improvements to instructional this LFA-generated discovery by assessing learning outcomes resulting from a tutor redesigned around the improved KC model (Liu & Koedinger, submitted).
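The interpreted KC model described above amounts to a simple tagging rule: split forwards and backwards area steps only for shapes whose backwards step needs a square root. The sketch below illustrates that rule; the function name `tag_area_kc` and the KC label strings are hypothetical and do not come from the original datasets.

```python
# Shapes whose backwards area step (finding a length from a given area)
# requires a square root, per the interpretation in the text.
SQUARE_ROOT_SHAPES = {"square", "circle"}


def tag_area_kc(shape, direction):
    """Return a KC label for an area step (label format is an assumption).

    `direction` is "forwards" (compute area) or "backwards"
    (compute a length given the area).
    """
    if direction not in ("forwards", "backwards"):
        raise ValueError(f"unknown direction: {direction!r}")
    if shape in SQUARE_ROOT_SHAPES:
        # Separate KCs: the backwards step carries the square-root demand.
        return f"{shape}-area-{direction}"
    # One shared KC: no extra knowledge demand in the backwards step.
    return f"{shape}-area"


for shape in ("circle", "square", "triangle"):
    print(shape, tag_area_kc(shape, "forwards"), tag_area_kc(shape, "backwards"))
```

Because the rule is stated in terms of the knowledge demand rather than specific items, it can label problem types, such as backwards square-area steps, that were absent from the dataset where the split was first discovered.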

Automated Cognitive Model Discovery Using SimStudent
An alternative automated approach uses a state-of-the-art machine-learning agent, SimStudent, to discover cognitive models automatically without requiring inductively learns knowledge, in the form of rules, by observing a tutor solve sample problems and by solving problems on its own and receiving feedback (Li, Mat-SimStudent is that it can simulate features of novices' not even be aware. Real students entering a course knowledge, so a realistic model of human learning ought not to assume this knowledge is given. In addition, SimStudent can be used to test alternative models of human learning to see which best predicts human data better than the best human-generated cognitive The output of SimStudent's learning takes the form of production rules (Newell & Simon, 1972), and each production rule essentially corresponds to one knowledge component (KC) in a KC model. Using data 4 ) and in conjunction with the AFM, Li and colleagues (2011) compared a KC model generated by SimStudent to a KC model generated by hand-coding actual students' actions within the tutor. The SimStuperformance data than the human-generated model did.
More importantly, inspecting the differences between the SimStudent model and the human-generated model such a difference is that SimStudent created distinct production rules (KCs) for division-based algebra simply divide both sides by the signed number A. But, human-generated model predicts that both forms of division problems should have the same error rates. In 0.72). SimStudent's split of division problems into two distinct KCs suggests that students should be tutored on two subsets of problems, one subset corresponding (Li et al., 2011). novel problem types, just as the LFA-generated model discovery did. In a novel equation-solving dataset 5 ), we tested whether applied to combine like terms problems. We looked at

Comparison to Other Work
Both LFA and SimStudent are capable of producing improve predictive accuracy but are readily interprethat the interpretations yielded by these cognitive not present in the data from which the discoveries were made. Finally, they produce clear recommendations very different from those in which the original data modelling efforts that move beyond simply improving predictive accuracy to have meaningful impact on learning theory and instruction. been cited as a limitation. In the case of LFA, one or initially in order to produce new model discoveries.
feature leads the results of such modelling efforts to efforts to fully automate the process of discovering methods have much to recommend, as they dramatically reduce demands on human time and produce competitive results in predictive accuracy. However, the resulting cognitive models of these efforts have not been interpreted or acted upon with respect to improving instruction. Waters, & Baraniuk, 2013), have yielded considerably more interpretable cognitive models than many alterinterpretation of modelling efforts, methods like LFA of generating sensible resulting models by incorporating the human effort up front. In fact, comparing Baraniuk, 2014), which only incorporates concept tags process up front, shows that the latter model results in much more interpretable cognitive models.
More attention and effort towards generating interpretable cognitive models is, in our view, progress in the right direction. Nevertheless, as we have argued, offer much to advance learning theory using the rich educational data available. Human involvement improves interpretability, whereas the data-driven component offers ways to alleviate subjective biases and advance our understanding of how novices learn. Methods such as LFA leverage both the unique strengths of human involvement and of automation towards creating models that are more predictive and A growing body of research suggests that modelling educational data can yield better predictive accuracies group students based on features available in educational datasets have focused on techniques such as K-means and spectral clustering. These techniques have been used to generate student clusters predictive 2011) and that yield predictive accuracy improvements clustering techniques, however, tend to result in interpretation is critical if the results of clustering are to eventually inform improvements in instructional to different groups of students).
In recent research (Liu & Koedinger, 2015), we developed a method for grouping students that not only dramatically improves the predictive accuracy of the AFM but inherently lends itself to producing mean-in the residuals (differences between predicted and actual data) across different practice opportunities, we consistently found students belonging to one of learning curves than the AFM predicts, 2) those who learning curves are on par with the model's predictions. rates to each of these learning rate groups substantially improves model predictive accuracy, beyond that of the regular AFM, across a variety of datasets spanning multiple educational domains. Across datasets, the slope parameter estimates for each of the three groups were consistent with our interpretation of the groups (i.e., the estimated group-level slopes were the steep-curve group). Furthermore, in a subset of data, we observed a systematic relationship between learning-curve group and the degree of pre- to post-test improvement (Liu & Koedinger, 2015). stereotyped groups of students, this method yielded student groups that are readily interpretable and who are already performing at ceiling when they start the unit or curriculum (and thus do not have much room for improvement) or students who are starting anywhere below ceiling but struggling to progress with the material. In either case, there are clear ininterpretation and developing the model with an eye towards interpretability.
We argue for the importance of considering the interpretability and actionability of educational data mining stand why the model achieves better predictive accuracy than alternatives. In addition, the understanding of this why should either advance our understanding of how learners learn the relevant material or have clear implications for instructional improvements, or both. independent variables that have either simple functions
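One way to picture grouping students by how their residuals change across practice opportunities is the sketch below. It is an illustration under stated assumptions, not the procedure from Liu and Koedinger (2015): the residual is taken as actual minus predicted error rate, the trend is a plain least-squares slope, and the tolerance `tol` and the group labels `"flat"`, `"steep"`, and `"on-par"` are invented for the example.

```python
def residual_slope(opportunities, residuals):
    """Least-squares slope of residuals against opportunity number."""
    n = len(opportunities)
    mean_x = sum(opportunities) / n
    mean_y = sum(residuals) / n
    sxx = sum((x - mean_x) ** 2 for x in opportunities)
    if sxx == 0:
        return 0.0  # a single opportunity carries no trend information
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(opportunities, residuals))
    return sxy / sxx


def learning_rate_group(opportunities, actual_error, predicted_error, tol=0.02):
    """Assign one student to a learning-curve group (labels are assumptions).

    A residual (actual minus predicted error) that grows with practice means
    the student's error falls more slowly than the model predicts ("flat");
    one that shrinks means it falls faster ("steep").
    """
    residuals = [a - p for a, p in zip(actual_error, predicted_error)]
    slope = residual_slope(opportunities, residuals)
    if slope > tol:
        return "flat"
    if slope < -tol:
        return "steep"
    return "on-par"


opps = [1, 2, 3, 4]
predicted = [0.6, 0.5, 0.4, 0.3]  # hypothetical AFM-predicted error rates
print(learning_rate_group(opps, [0.6, 0.6, 0.6, 0.6], predicted))
print(learning_rate_group(opps, [0.6, 0.4, 0.2, 0.0], predicted))
print(learning_rate_group(opps, predicted, predicted))
```

Once each student carries a group label, the label can enter the model as an additional group-level slope term, which keeps the added parameters few and their meaning easy to state.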

TOWARDS BUILDING EXPLANATORY MODELS
variables using simple split, merge, or add operators. of verbal data in education, a branch of educational data mining that includes automated essay scoring, producing tutorial dialogue, and computer-supported collaborative learning. A major consideration in this into features that can be used in a machine-learning algorithm. Approaches to this issue range from simple phisticated linguistic analyses. One consistent theme by interpretable, theoretical frameworks have been among the most promising (Rosé & Tovares, in press; these independent variables up front can greatly immost to actionability, is that the dependent variable This makes the results from modelling readily actionable. Another body of research in which the dependent variable tends to be well mapped to an interpretable construct is the modelling of affect and motivation using features of tutor log data. These techniques can identify those constructs within tutor log data constructs and, thus, the results of these algorithms -Tutor is an intelligent tutoring system for computer literacy that automatically models students' confusion, these affective states is then used to adapt the tutor actions in a manner that responds accordingly. An detector showed higher learning gains for low-domain-knowledge students who interacted with the Affective AutoTutor compared to a non-affective version independent variables driving the affective outcomes are also needed.

-4
Improving skill at solving equations via better encoding of alge-5 -(average error rate = 0.45) among combine like terms problems. This new dataset not only replicated combine like terms. combine like terms items to a KC model with a single combine like terms KC. Furthermore, although the learning curves for both divide and combine like terms SimStudent KC model discovery made it possible to on which SimStudent was never trained.
by fewer estimated parameters (independent variables, rameter for each student and two parameters for each knowledge component. Adding learning rate groups group membership. This makes the contribution of the added parameter easy to attribute and interpret. Having fewer parameters also allows each parameter's issues of indeterminacy. Because the AFM has only one low learning parameter estimate as suggesting that KC We have illustrated some ways in which concrete steps in the design of educational data modelling efforts learning theory, and the practice of education could be greatly strengthened with increased attention to

Liu, R., & Koedinger, K. R. (submitted). Closing the loop: Automated data-driven skill model discoveries lead to improved instruction and learning gains. Journal of Educational Data Mining.
(Eds.), Proceedings of the 8th International Conference on Education Data Mining Proceedings of the 7th International Conference on Educational Data Mining the loop between learning theory and educational data. In T. Barnes, M. Chi, & M. Feng (Eds.), Proceedings of the 9th International Conference on Educational Data Mining pedagogical content knowledge. In L. Chen et al. (Eds.), Proceedings of the 3rd International Conference on Cognitive Science
Newell, A., & Simon, H. A. (1972). Human problem solving Proceedings of the 11th International Conference on Intelligent Tutoring Systems about interaction analysis. In L. Resnick, C. Asterhan, & S. Clarke (Eds.), Socializing intelligence through academic talk and dialogue tion of tutoring goals. (Eds.), Proceedings of the 6th International Conference on Educational Data Mining Statistical Science, 25 Proceedings of the 15th standard test score predictions. Proceedings of the 15th Education tion, 16 tion and self-regulated learning. Journal of Educational Data Mining, 5