LAK23: 13th International Learning Analytics and Knowledge Conference


SESSION: Full Research Papers

Predicting Item Response Theory Parameters Using Question Statements Texts

Recently, new neural language models pre-trained on massive text corpora have become available. These models encode statistical features of language in their parameters, yielding better word-vector representations that allow neural networks to be trained on smaller sample sets. In this context, we investigate the application of these models to predicting Item Response Theory (IRT) parameters of multiple-choice questions. More specifically, we apply our models to questions from the Brazilian National High School Exam (ENEM) using the text of their statements, and we propose a novel optimization target for regression: the Item Characteristic Curve. The architecture employed could predict the difficulty parameter b of the ENEM 2020 and 2021 items with a mean absolute error of 70 points. Calculating the IRT score in each knowledge area of the exam for a sample of 100,000 students, we obtained a mean absolute error below 40 points for all knowledge areas. Considering only the top quartile, the exam’s main target of interest, the average error was less than 30 points for all areas, with the majority lower than 15 points. Such performance makes it possible to predict parameters for newly created questions, compose mock tests for student training, and analyze student performance with excellent precision, dispensing with the need for a costly item-calibration pre-test step.
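
For context, the item characteristic curve named as the optimization target is, in the three-parameter logistic (3PL) model commonly used for ENEM-style items, the probability of a correct response as a function of ability; the abstract does not specify the authors' exact parameterization, so the following is an illustrative form rather than their loss:

```latex
P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}}
```

where \theta is the examinee's ability and a_i, b_i, and c_i are the item's discrimination, difficulty, and guessing parameters, respectively.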

Logs or Self-Reports? Misalignment Between Behavioral Trace Data and Surveys When Modeling Learner Achievement Goal Orientation

While learning analytics researchers have been diligently integrating trace log data into their studies, learners’ achievement goals are still predominantly measured by self-reported surveys. This study investigated the properties of trace data and survey data as representations of achievement goals. Through the lens of goal complex theory, we generated achievement goal clusters using latent variable mixture modeling applied to each kind of data. Findings show significant misalignment between these two data sources. Self-reported goals stated before learning do not translate into goal-relevant behaviors tracked using trace data collected during learning activities. While learners generally articulate an orientation towards mastery learning in self-report surveys, behavioral trace data showed a higher incidence of less engaged learning activities. These findings call into question the utility of survey-based measures when up-to-date achievement goal data are needed. Our results advance methodological and theoretical understandings of achievement goals in the modern age of learning analytics.

SeNA: Modelling Socio-spatial Analytics on Homophily by Integrating Social and Epistemic Network Analysis

Homophily is a fundamental sociological theory that describes the tendency of individuals to interact with others who share similar attributes. This theory has evident relevance for studying collaborative learning and classroom orchestration in learning analytics research from a social constructivist perspective. Emerging advancements in multimodal learning analytics have shown promising results in capturing interaction data and generating socio-spatial analytics in physical learning spaces through computer vision and wearable positioning technologies. Yet, there are limited ways of analysing homophily (e.g., social network analysis; SNA), especially for unpacking the temporal connections between different homophilic behaviours. This paper presents a novel analytic approach, Social-epistemic Network Analysis (SeNA), for analysing homophily by combining social network analysis with epistemic network analysis to infuse socio-spatial analytics with temporal insights. The additional insights SeNA may offer over traditional approaches (e.g., SNA) were illustrated by analysing the homophily of 98 students in open learning spaces. The findings showed that SeNA could reveal significant behavioural differences in homophily between comparison groups across different learning designs, which were not accessible to SNA alone. The implications and limitations of SeNA in supporting future learning analytics research regarding homophily in physical learning spaces are also discussed.

TikTok as Learning Analytics Data: Framing Climate Change and Data Practices

Climate change has far-reaching impacts on communities around the world. However, climate change education has more often focused on scientific facts and statistics at a global scale than on experiences at personal and local scales. To understand how to frame climate change education, I turn to youth-created videos on TikTok, a video-sharing social media platform. Semantic network analysis of hashtags related to climate change reveals multifaceted, intertwining discourse around awareness of climate change consequences, calls for action to reduce human impacts on natural systems, and environmental activism. I further explore how youth integrate data from personal, lived experiences into climate change discussions. Higher usage of the second-person perspective ("you"; i.e., addressing the audience), prosocial and agency words, and a negative messaging tone are associated with higher odds of a video integrating lived experiences. These findings illustrate the platform’s affordances: in communicating to a broad audience, youth take on agentic and prosocial stances and express emotions to relate to viewers and situate their content. Findings suggest the utility of learning analytics for exploring youth’s perspectives and provide insights to frame climate change education in ways that elevate lived experiences.
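
As background on the technique named here, a semantic network of hashtags can be built from co-occurrence within the same video. A minimal sketch with networkx and hypothetical hashtag sets; the paper's corpus and edge-weighting scheme are not specified, so this is illustrative only:

```python
from itertools import combinations
import networkx as nx

# Hypothetical data: the set of hashtags observed on each TikTok video.
videos = [
    {"climatechange", "climateaction", "fyp"},
    {"climatechange", "sustainability"},
    {"climateaction", "sustainability", "climatechange"},
]

G = nx.Graph()
for tags in videos:
    for a, b in combinations(sorted(tags), 2):
        # Increment the edge weight for each co-occurrence of two hashtags.
        w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

# Simple centrality measures locate hub hashtags in the discourse.
print(nx.degree_centrality(G))
print(sorted(G.edges(data=True), key=lambda e: -e[2]["weight"])[:5])
```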

Learning analytics dashboards: What do students actually ask for?

Learning analytics (LA) has been opening new opportunities to support learning in higher education (HE). LA dashboards are an important tool for providing students with insights into their learning progress and predictions, leading to reflection on and adaptation of learning plans and habits. Based on a human-centered approach, we present the perspective of students, as essential stakeholders, on LA dashboards. We describe a longitudinal study based on survey methodology. The study included two iterations of a survey, conducted with second-year ICT students in 2017 (N = 222) and 2022 (N = 196). The study provided insights into the LA dashboard features the students find most useful to support their learning. The students highly appreciated features related to short-term planning and organization of learning, while they were cautious about comparison and competition with other students, finding such features possibly demotivating. We compared the 2017 and 2022 results to establish possible changes in the students’ perspectives with the COVID-19 pandemic. The students’ awareness of the benefits of LA has increased, which may be related to the strong focus on online learning during the pandemic. Finally, a factor analysis yielded a dashboard model with five underlying factors: comparison, planning, predictions, extracurricular, and teachers.

"That Student Should be a Lion Tamer!" StressViz: Designing a Stress Analytics Dashboard for Teachers

In recent years, there has been a growing interest in creating multimodal learning analytics (LA) systems that automatically analyse students’ states that are hard to see with the "naked eye", such as cognitive load and stress levels, but that can considerably shape their learning experience. A rich body of research has focused on detecting such aspects by capturing bodily signals from students using wearables and computer vision. Yet, little work has aimed at designing end-user interfaces that visualise physiological data to support tasks deliberately designed for students to learn from stressful situations. This paper addresses this gap by designing a stress analytics dashboard that encodes students’ physiological data into stress levels during different phases of an authentic team simulation in the context of nursing education. We conducted a qualitative study with teachers to understand (i) how they made sense of the stress analytics dashboard; (ii) the extent to which they trusted the dashboard in relation to students’ cortisol data; and (iii) the potential adoption of this tool to communicate insights and aid teaching practices.

Predictive Learning Analytics and University Teachers: Usage and perceptions three years post implementation

Predictive learning analytics (PLA) dashboards have been used by teachers to identify students at risk of failing their studies and to provide proactive support. Yet, very few of them have been deployed at a large scale or had their use studied at a mature level of implementation. In this study, we surveyed 366 distance learning university teachers across four faculties three years after PLA had been made available across the university as business as usual. Informed by the Unified Theory of Acceptance and Use of Technology (UTAUT), we present a context-specific version of UTAUT that reflects teachers’ perceptions of PLA in distance learning higher education. The adoption and use of PLA was shown to be positively influenced by less experience in teaching, performance expectancy, self-efficacy, positive attitudes, and low anxiety, and negatively influenced by a lack of facilitating conditions and low effort expectancy, indicating that the type of technology and the context within which it is used are significant factors in our understanding of technology usage and adoption. This study provides significant insights into how to design, apply, and implement PLA with teachers in higher education.

Each Encounter Counts: Modeling Language Learning and Forgetting

Language learning applications usually estimate the learner’s language knowledge over time to provide personalized practice content to each learner at the optimal time. However, accurately predicting language knowledge or linguistic skills is much more challenging than predicting math or science knowledge, as many language tasks involve memorization and retrieval. Learners must memorize a large number of words and meanings, which are prone to being forgotten without practice. Although a few studies consider forgetting when modeling learners’ language knowledge, they tend to apply traditional models, consider only partial information about forgetting, and ignore linguistic features that may significantly influence learning and forgetting. This paper focuses on modeling and predicting learners’ knowledge by considering their forgetting behavior and linguistic features in language learning. Specifically, we first explore the existence of forgetting behavior and cross-effects in real-world language learning datasets through empirical studies. Based on these, we propose a model for predicting the probability of recalling a word given a learner’s practice history. The model incorporates key information related to forgetting, question formats, and semantic similarities between words using the attention mechanism. Experiments on two real-world datasets show that the proposed model improves performance compared to baselines. Moreover, the results indicate that combining multiple types of forgetting information and item format improves performance. In addition, we find that incorporating semantic features, such as word embeddings, to model similarities between words in a learner’s practice history and their effects on memory also improves model performance.
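
The authors' attention-based architecture is not detailed in the abstract; for orientation, the simplest family of forgetting models it improves on treats recall probability as decaying exponentially with time since practice. A minimal half-life-style sketch, with all names and constants illustrative:

```python
import math

def recall_probability(days_since_practice: float, half_life: float) -> float:
    """P(recall) halves every `half_life` days without practice (illustrative)."""
    return 2.0 ** (-days_since_practice / half_life)

def updated_half_life(half_life: float, correct: bool, gain: float = 1.5) -> float:
    """Successful retrieval strengthens memory; failure weakens it (illustrative)."""
    return half_life * gain if correct else half_life / gain

# Example: a word last practiced 3 days ago with an estimated 4-day half-life.
p = recall_probability(3.0, 4.0)
print(f"Predicted recall: {p:.2f}")
```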

How Do Teachers Use Dashboards Enhanced with Data Storytelling Elements According to their Data Visualisation Literacy Skills?

There is a proliferation of learning analytics (LA) dashboards aimed at supporting teachers. Yet, teachers still find it challenging to make sense of LA dashboards and, thereby, to make informed decisions. Two main strategies to address this are emerging: i) upskilling teachers’ data literacy; ii) improving the explanatory design features of current dashboards (e.g., adding visual cues or text) to minimise the skills required by teachers to effectively use dashboards. While each approach has its own trade-offs, no previous work has explored the interplay between dashboard design and such "data skills". In this paper, we explore how teachers with varying visualisation literacy (VL) skills use LA dashboards enhanced with (explanatory) data storytelling elements. We conducted a quasi-experimental study with 23 teachers of varied VL inspecting two versions of an authentic multichannel dashboard enhanced with data storytelling elements. We used an eye-tracking device while teachers inspected students’ data captured from Zoom and Google Docs, followed by interviews. Results suggest that high-VL teachers adopted complex exploratory strategies and were more sensitive to subtle inconsistencies in the design, while low-VL teachers benefited the most from more explicit data storytelling guidance, such as accompanying complex graphs with narrative and semantic colour encoding.

Learner-centred Analytics of Feedback Content in Higher Education

Feedback is an effective way to assist students in achieving learning goals. The conceptualisation of feedback is gradually moving from feedback as information to feedback as a learner-centred process. To be effective, feedback as a learner-centred process should be designed to provide quality feedback content and promote student learning outcomes on the subsequent task. However, it remains unclear how instructors adopt the learner-centred feedback framework for feedback provision in teaching practice. Thus, our study made use of a comprehensive learner-centred feedback framework to analyse feedback content and identify the characteristics of feedback content among student groups with different performance changes. Specifically, we collected the instructors’ feedback on two consecutive assignments in an introductory data science course at the postgraduate level. Relative to the first assignment, we used the status of student grade changes (i.e., students whose performance increased and those whose performance did not increase on the second assignment) as a proxy for student learning outcomes. Then, we engineered and extracted features from the feedback content on the first assignment using a learner-centred feedback framework and further examined the differences in these features between the groups of student learning outcomes. Lastly, we used the features to predict student learning outcomes with widely used machine learning models and interpreted the predicted results with the SHapley Additive exPlanations (SHAP) framework. We found that 1) most features from the feedback content presented significant differences between the groups of student learning outcomes, 2) the gradient-boosted tree model could effectively predict student learning outcomes, and 3) SHAP could transparently interpret the feature importance on predictions.
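
A minimal sketch of the final two analysis steps described, training a gradient-boosted tree on feedback-content features and interpreting it with SHAP; the features and labels below are synthetic placeholders, not the authors' engineered features:

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Placeholder feature matrix: rows = students, columns = feedback-content features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# TreeExplainer gives per-feature contributions to each individual prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
print("Mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))
```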

How to Build More Generalizable Models for Collaboration Quality? Lessons Learned from Exploring Multi-Context Audio-Log Datasets using Multimodal Learning Analytics

Multimodal learning analytics (MMLA) research for building collaboration quality estimation models has shown significant progress. However, the generalizability of such models is seldom addressed. In this paper, we address this gap by systematically evaluating the across-context generalizability of collaboration quality models developed using a typical MMLA pipeline. This paper further presents a methodology to explore modelling pipelines with different configurations to improve the generalizability of the model. We collected 11 multimodal datasets (audio and log data) from face-to-face collaborative learning activities in six different classrooms with five different subject teachers. Our results showed that the models developed using the often-employed MMLA pipeline degraded in terms of Kappa from Fair (.20 < Kappa < .40) to Poor (Kappa < .20) when evaluated across contexts. This degradation in performance was significantly ameliorated with pipelines that emerged as high-performing from our exploration of 32 pipelines. Furthermore, our exploration of pipelines provided statistical evidence that often-overlooked contextual data features improve the generalizability of a collaboration quality model. With these findings, we make recommendations for the modelling pipeline which can potentially help other researchers in achieving better generalizability in their collaboration quality estimation models.
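
Across-context evaluation of the sort reported can be implemented with grouped cross-validation that holds out one classroom context at a time; a minimal sketch with synthetic placeholder features (the paper's 32 pipeline configurations are not reproduced here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 8))          # placeholder multimodal features
y = rng.integers(0, 2, size=600)       # placeholder collaboration-quality labels
groups = np.repeat(np.arange(6), 100)  # six classroom contexts

logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups):
    clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    kappa = cohen_kappa_score(y[test_idx], clf.predict(X[test_idx]))
    held_out = groups[test_idx][0]
    print(f"Held-out context {held_out}: kappa = {kappa:.2f}")
```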

Empowering Teacher Learning with AI: Automated Evaluation of Teacher Attention to Student Ideas during Argumentation-focused Discussion

Engaging students in argument from evidence is an essential goal of science education. This is a complex skill to develop; recent research in science education proposed the use of simulated classrooms to facilitate the practice of the skill. We use data from one such simulated environment to explore whether automated analysis of the transcripts of the teacher’s interaction with the simulated students using Natural Language Processing techniques could yield an accurate evaluation of the teacher’s performance. We are especially interested in explainable models that could also support formative feedback. The results are encouraging: Not only can the models score the transcript as well as humans can, but they can also provide justifications for the scores comparable to those provided by human raters.

Are They Learning or Guessing? Investigating Trial-and-Error Behavior with Limited Test Attempts

Mastery learning and deliberate practice promote personalized learning, allowing the learner to improve through a repetitive and targeted approach. Unfortunately, this pedagogy is challenging to implement in classrooms where everyone is expected to learn at the same pace. Various studies have successfully demonstrated that certain aspects of mastery learning can be integrated into the curriculum. We adopt a similar pedagogical strategy and explain our implementation approach with online assessments. Since this is a new pedagogical approach to assessing student learning, we collected data to investigate test-taking behavior and evaluated potential learning gains in this new test format. As part of this endeavor, we developed a model to detect trial-and-error sequences in test attempts. Our results point to a small percentage of guessing behavior, encouraging evidence that this test approach is a viable way to implement mastery learning in our curriculum.

Names, Nicknames, and Spelling Errors: Protecting Participant Identity in Learning Analytics of Online Discussions

Messages exchanged between participants in online discussion forums often contain personal names and other details that need to be redacted before the data is used for research purposes in learning analytics. However, removing the names entirely makes it harder to track the exchange of ideas between individuals within a message thread and across threads, and thereby reduces the value of this type of conversational data. In contrast, the consistent use of pseudonyms allows contributions from individuals to be tracked across messages, while also hiding the real identities of the contributors. Several factors can make it difficult to identify all instances of personal names that refer to the same individual, including spelling errors and the use of shortened forms. We developed a semi-automated approach for replacing personal names with consistent pseudonyms. We evaluated our approach on a data set of over 1,700 messages exchanged during a distance-learning course, and compared it to a general-purpose pseudonymisation tool that used deep neural networks to identify names to be redacted. We found that our tailored approach outperformed the general-purpose tool in both precision and recall, correctly identifying all but 31 substitutions out of 2,888.
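
The paper's matching rules are not given in the abstract, but the core idea of consistent pseudonymisation with tolerance for misspellings can be sketched as follows; the class name, similarity cutoff, and alias format are illustrative, and name detection is assumed to have already happened:

```python
import difflib

class Pseudonymiser:
    """Maps each detected personal name (and near variants) to one stable alias."""
    def __init__(self, cutoff: float = 0.75):
        self.aliases: dict[str, str] = {}
        self.cutoff = cutoff

    def alias_for(self, name: str) -> str:
        key = name.lower()
        # Fuzzy-match against names seen so far, so "Jonh"/"John" share an alias.
        match = difflib.get_close_matches(key, self.aliases, n=1, cutoff=self.cutoff)
        if match:
            return self.aliases[match[0]]
        alias = f"Student_{len(set(self.aliases.values())) + 1}"
        self.aliases[key] = alias
        return alias

p = Pseudonymiser()
print(p.alias_for("John"), p.alias_for("Jonh"), p.alias_for("Maria"))
# -> Student_1 Student_1 Student_2
```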

How to Open Science: A Principle and Reproducibility Review of the Learning Analytics and Knowledge Conference

Within the field of education technology, learning analytics has increased in popularity over the past decade. Researchers conduct experiments and develop software, building on each other’s work to create more intricate systems. In parallel, open science — which describes a set of practices to make research more open, transparent, and reproducible — has exploded in recent years, resulting in more open data, code, and materials for researchers to use. However, without prior knowledge of open science, many researchers do not make their datasets, code, and materials openly available, and those that are available are often difficult, if not impossible, to reproduce. The purpose of the current study was to take a close look at our field by examining previous papers within the proceedings of the International Conference on Learning Analytics and Knowledge, and document the rate of open science adoption (e.g., preregistration, open data), as well as how well available data and code could be reproduced. Specifically, we examined 133 research papers, allowing ourselves 15 minutes for each paper to identify open science practices and attempt to reproduce the results according to their provided specifications. Our results showed that less than half of the research adopted standard open science principles, with approximately 5% fully meeting some of the defined principles. Further, we were unable to reproduce any of the papers successfully in the given time period. We conclude by providing recommendations on how to improve the reproducibility of our research as a field moving forward.

All openly accessible work can be found in an Open Science Framework project.

Impact of Non-Cognitive Interventions on Student Learning Behaviors and Outcomes: An analysis of seven large-scale experimental interventions

As evidence grows supporting the importance of non-cognitive factors in learning, computer-assisted learning platforms increasingly incorporate non-academic interventions to influence student learning and learning-related behaviors. Non-cognitive interventions often attempt to influence students’ mindset, motivation, or metacognitive reflection to impact learning behaviors and outcomes. In the current paper, we analyze data from five experiments, involving seven treatment conditions, embedded in mastery-based learning activities hosted on a computer-assisted learning platform focused on middle school mathematics. Each treatment condition embodied a specific non-cognitive theoretical perspective. Over seven school years, 20,472 students participated in the experiments. We estimated the effects of each treatment condition on students’ response time, hint usage, likelihood of mastering knowledge components, learning efficiency, and post-test performance. Our analyses reveal a mix of both positive and negative treatment effects on student learning behaviors and performance. Few interventions impacted learning as assessed by the post-tests. These findings highlight the difficulty of positively influencing student learning behaviors and outcomes using non-cognitive interventions.

CVPE: A Computer Vision Approach for Scalable and Privacy-Preserving Socio-spatial, Multimodal Learning Analytics

Capturing data on socio-spatial behaviours is essential in obtaining meaningful educational insights into collaborative learning and teamwork in co-located learning contexts. Existing solutions, however, have limitations regarding scalability and practicality since they rely largely on costly location tracking systems, are labour-intensive, or are unsuitable for complex learning environments. To address these limitations, we propose an innovative computer-vision-based approach – Computer Vision for Position Estimation (CVPE) – for collecting socio-spatial data in complex learning settings where sophisticated collaborations occur. CVPE is scalable and practical with a fast processing time and only needs low-cost hardware (e.g., cameras and computers). The built-in privacy protection modules also minimise potential privacy and data security issues by masking individuals’ facial identities and provide options to automatically delete recordings after processing, making CVPE a suitable option for generating continuous multimodal/classroom analytics. The potential of CVPE was evaluated by applying it to analyse video data about teamwork in simulation-based learning. The results showed that CVPE extracted socio-spatial behaviours relatively reliably from video recordings compared to indoor positioning data. These socio-spatial behaviours extracted with CVPE uncovered valuable insights into teamwork when analysed with epistemic network analysis. The limitations of CVPE for effective use in learning analytics are also discussed.

METS: Multimodal Learning Analytics of Embodied Teamwork Learning

Embodied team learning is a form of group learning that occurs in co-located settings where students need to interact with others while actively using resources in the physical learning space to achieve a common goal. In such situations, communication dynamics can be complex as team discourse segments can happen in parallel at different locations of the physical space with varied team member configurations. This can make it hard for teachers to assess the effectiveness of teamwork and for students to reflect on their own experiences. To address this problem, we propose METS (Multimodal Embodied Teamwork Signature), a method to model team dialogue content in combination with spatial and temporal data to generate a signature of embodied teamwork. We present a study in the context of a highly dynamic healthcare team simulation space where students can freely move. We illustrate how signatures of embodied teamwork can help to identify key differences between high and low performing teams: i) across the whole learning session; ii) at different phases of learning sessions; and iii) at particular spaces of interest in the learning space.

Student Profiles of Change in a University Course: A Complex Dynamical Systems Perspective

Learning analytics approaches to profiling students based on their study behaviour remain limited in how they integrate temporality and change. To advance this area of work, the current study examines profiles of change in student study behaviour in a blended undergraduate engineering course. The study is conceptualised through complex dynamical systems theory and its applications in psychological and cognitive science research. Students were profiled based on the changes in their behaviour as observed in clickstream data. A measure of entropy in the recurrence of student behaviour was used to indicate change in a student’s state, consistent with evidence from the cognitive sciences. Student trajectories of weekly entropy values were clustered to identify distinct profiles. Three patterns were identified: stable weekly study, steep changes in weekly study, and moderate changes in weekly study. The students with steep changes in their weekly study activity had lower exam grades and showed destabilisation of weekly behaviour earlier in the course. The study also investigated the relationships between these profiles of change, student performance, and other approaches to learner profiling, such as self-reported measures of self-regulated learning and profiles based on sequences of learning actions.
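
One way to read the method described: summarise each student's week as a distribution over action types, take its Shannon entropy, and cluster the weekly entropy trajectories. This sketch uses synthetic counts and plain k-means rather than the recurrence-based measure in the paper:

```python
import numpy as np
from scipy.stats import entropy
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
n_students, n_weeks, n_action_types = 120, 10, 6

# Placeholder clickstream summary: counts of each action type per student-week.
counts = rng.integers(0, 20, size=(n_students, n_weeks, n_action_types))

# Shannon entropy of each week's action distribution -> one trajectory per student.
trajectories = np.array([
    [entropy(week + 1e-9) for week in student]  # small constant avoids log(0)
    for student in counts
])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(trajectories)
for k in range(3):
    print(f"Profile {k}: mean weekly entropy {trajectories[labels == k].mean():.2f}")
```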

Effects of Modalities in Detecting Behavioral Engagement in Collaborative Game-Based Learning

Collaborative game-based learning environments have significant potential for creating effective and engaging group learning experiences. These environments offer rich interactions between small groups of students by embedding collaborative problem solving within immersive virtual worlds. Students often share information, ask questions, negotiate, and construct explanations between themselves towards solving a common goal. However, students sometimes disengage from the learning activities, and due to the nature of collaboration, their disengagement can propagate and negatively impact others within the group. From a teacher's perspective, it can be challenging to identify disengaged students within different groups in a classroom as they need to spend a significant amount of time orchestrating the classroom. Prior work has explored automated frameworks for identifying behavioral disengagement. However, most prior work relies on a single modality for identifying disengagement. In this work, we investigate the effects of using multiple modalities to detect disengagement behaviors of students in a collaborative game-based learning environment. For that, we utilized facial video recordings and group chat messages of 26 middle school students while they were interacting with Crystal Island: EcoJourneys, a game-based learning environment for ecosystem science. Our study shows that the predictive accuracy of a unimodal model heavily relies on the modality of the ground truth, whereas multimodal models surpass the unimodal models, trading resources for accuracy. Our findings can benefit future researchers in designing behavioral engagement detection frameworks for assisting teachers in using collaborative game-based learning within their classrooms.

Insights into undergraduate pathways using course load analytics

Course load analytics (CLA) inferred from LMS and enrollment features can offer a more accurate representation of course workload to students than credit hours and potentially aid in their course selection decisions. In this study, we produce and evaluate the first machine-learned predictions of student course load ratings and generalize our model to the full 10,000 course catalog of a large public university. We then retrospectively analyze longitudinal differences in the semester load of student course selections throughout their degree. CLA by semester shows that a student’s first semester at the university is among their highest load semesters, as opposed to a credit hour-based analysis, which would indicate it is among their lowest. Investigating what role predicted course load may play in program retention, we find that students who maintain a semester load that is low as measured by credit hours but high as measured by CLA are more likely to leave their program of study. This discrepancy in course load is particularly pertinent in STEM and associated with high prerequisite courses. Our findings have implications for academic advising, institutional handling of the freshman experience, and student-facing analytics to help students better plan, anticipate, and prepare for their selected courses.

Do Associations Between Mind Wandering and Learning from Complex Texts Vary by Assessment Depth and Time?

We examined associations between mind wandering – where attention shifts from the task at hand to task-unrelated thoughts – and learning outcomes. Our data consisted of 177 students who self-reported mind wandering while reading five long, connected texts on scientific research methods and completed learning assessments targeting multiple depths of processing (rote, inference, integration) at different timescales (during and after reading each text, after reading all texts, and after a week-long delay). We found that mind wandering negatively predicted measures of factual, text-based (explicit) information and global integration of information across multiple parts of the text, but not measures requiring a local inference on a single sentence. Further, mind wandering only predicted comprehension measures assessed during the reading session and not after a week-long delay. Our findings provide important nuances to the established negative link between mind wandering and learning outcomes, which has predominantly focused on rote comprehension assessed during the learning session itself. Implications for interventions to address mind wandering during learning are discussed.

The current state of using learning analytics to measure and support K-12 student engagement: A scoping review

Student engagement has been identified as a critical construct for understanding and predicting educational success. However, research has shown that it can be hard to align data-driven insights of engagement with observed and self-reported levels of engagement. Given the emergence and increasing application of learning analytics (LA) within K-12 education, further research is needed to understand how engagement is being conceptualized and measured within LA research. This scoping review identifies and synthesizes literature published between 2011-2022, focused on LA and student engagement in K-12 contexts, and indexed in five international databases. 27 articles and conference papers from 13 different countries were included for review. We found that most of the research was undertaken in middle school years within STEM subjects. The results show that there is a wide discrepancy in researchers’ understanding and operationalization of engagement and little evidence to suggest that LA improves learning outcomes and support. However, the potential to do so remains strong. Guidance is provided for future LA engagement research to better align with these goals.

When the Tutor Becomes the Student: Design and Evaluation of Efficient Scenario-based Lessons for Tutors

Tutoring is among the most impactful educational influences on student achievement, with perhaps the greatest promise of combating student learning loss. Due to its high impact, organizations are rapidly developing tutoring programs and discovering a common problem: a shortage of qualified, experienced tutors. This mixed-methods investigation focuses on the impact of short (∼15 min.), online lessons in which tutors participate in situational judgment tests based on everyday tutoring scenarios. We developed three lessons on strategies for supporting student self-efficacy and motivation and tested them with 80 tutors from a national, online tutoring organization. Using a mixed-effects logistic regression model, we found a statistically significant learning effect indicating tutors performed about 20% higher post-instruction than pre-instruction (β = 0.811, p < 0.01). Tutors scored ∼30% better on selected than on constructed responses at posttest, with evidence that tutors are learning from selected-response questions alone. Learning analytics and qualitative feedback suggest future design modifications for larger-scale deployment, such as creating more authentically challenging selected-response options, capturing common misconceptions using learnersourced data, and varying modalities of scenario delivery, with the aim of maintaining learning gains while reducing time and effort for tutor participants and trainers.

Investigating Student's Problem-solving Approaches in MOOCs using Natural Language Processing

Problem-solving approaches are an essential part of learning. Knowing how students approach solving problems can help instructors improve their instructional designs and effectively guide students’ learning processes. We propose a natural language processing (NLP) driven method to capture online learners’ problem-solving approaches at scale, using Massive Open Online Courses (MOOCs) as a learning platform. We employ an online survey to gather data, and NLP techniques and existing educational theories to investigate this through the lens of both computer science and education. The paper shows how NLP techniques, i.e., preprocessing, topic modeling, and text summarization, must be tuned to extract information from a large-scale text corpus. The proposed method discovered 18 problem-solving approaches in the text data, such as using pen and paper, peer learning, and trial and error. We also observed topics that recur over the years, such as clarifying code logic and watching videos. We observed that students heavily rely on "tools" for solving programming problems, and we expect that the choice of methods varies depending on the type of task.
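
A minimal version of the topic-modeling step mentioned, using sklearn's latent Dirichlet allocation on a few hypothetical survey responses (the paper's actual preprocessing, corpus, and toolkit are not specified):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical survey responses about problem-solving approaches.
responses = [
    "I sketch the algorithm with pen and paper before coding",
    "I debug by trial and error until the tests pass",
    "I rewatch the lecture videos and clarify the code logic",
    "A peer and I talk through the problem together",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(responses)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"Topic {k}: {', '.join(top)}")
```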

Lost in Translation: Determining the Generalizability of Temporal Models across Course Contexts 

A common activity in learning analytics research is to demonstrate a new analytical technique by applying it to data from a single course. We explore whether the value of an analytical approach might generalize across course contexts. Accordingly, we conduct a conceptual replication of a well-cited temporal modeling study using self-regulated learning (SRL) taxonomies, analysing the trace event data of 411 students across 19 courses. Using established SRL categorizations, learner actions are sequenced to identify regular clusters of interaction through hierarchical clustering methods. These clusters are then compared with the entire data corpus and with each other through the development of first-order Markov models to develop process maps. Our findings indicate that, although some general patterns of SRL generalize, these results are more limited at higher scales. Comparing these clusters of interaction against students’ performance in courses also indicates some relationships between activity and outcomes, though this finding is likewise limited by the complexity introduced by scaling out these methods. We discuss how these temporal models should be viewed when making descriptive and qualitative inferences about students’ activity in digital learning environments.
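
The first-order Markov models used to build process maps reduce to a row-normalised transition-count matrix over coded SRL actions; a minimal sketch with hypothetical state names and sequences:

```python
import numpy as np

states = ["plan", "read", "practice", "reflect"]
idx = {s: i for i, s in enumerate(states)}

# Hypothetical coded event sequences for two students.
sequences = [
    ["plan", "read", "practice", "practice", "reflect"],
    ["read", "practice", "reflect", "plan", "read"],
]

counts = np.zeros((len(states), len(states)))
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        counts[idx[a], idx[b]] += 1

# Row-normalise counts into transition probabilities (rows with no data stay 0).
row_sums = counts.sum(axis=1, keepdims=True)
P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
print(np.round(P, 2))
```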

Instructor-in-the-Loop Exploratory Analytics to Support Group Work

This case study examines an interactive, low-barrier process, termed instructor-in-the-loop, by which an instructor defines and makes meaning from exploratory metrics and visualizations, and uses this multimodal information to improve a course iteratively. We present potentials for course improvement based on automated learning analytics insights related to students’ participation in small active learning sessions associated with a large lecture course. Automated analytics processes are essential for larger courses, where engaging smaller groups is important to ensure participation and understanding but monitoring a large total number of groups throughout an instructional experience becomes untenable for the instructor. Of interest is providing instructors with easy-to-digest summaries of group performance that do not require complex setup or knowledge of more advanced algorithmic approaches. We explore synthesizing metrics and visualizations as ways to engage instructors in meaning-making about complex learning environments, in a low-barrier manner that provides insights quickly.

When to Intervene? Utilizing Two Facets of Temporality in Students’ SRL Processes in a Programming Course

This study explored two aspects of temporality in students’ self-regulated learning (SRL) behaviours to understand the dynamics of SRL phase transitions. For the first aspect, the temporal order of SRL phases, we characterized four types of SRL processes based on phase transitions and the cyclical nature of SRL. The SRL types were mapped onto kinds of iterative behaviour over SRL phases that correspond to the theorized self-regulatory behaviours of students at different levels of SRL skill. We found a significant association between SRL types and assignment grades, suggesting that higher learning outcomes, i.e., the programming skills demonstrated in the assignments, are associated with more advanced SRL processes. The study also examined the second aspect of temporality, the instance of time. We revealed the temporal dynamics between SRL phase transitions by analyzing time profiles for transitions in each SRL process type. We then showed that a two-day interval is the threshold within which most students iteratively transition from the adaptation to the enactment phase, which provides a suitable time to intervene if the transition is not observed.

Towards more replicable content analysis for learning analytics

Content analysis (CA) is a method frequently used in the learning sciences and so increasingly applied in learning analytics (LA). Despite this ubiquity, CA is a subtle method, with many complexities and decision points affecting the outcomes it generates. Although appearing to be a neutral quantitative approach, coding CA constructs requires an attention to decision making and context that aligns it with a more subjective, qualitative interpretation of data. Despite these challenges, we increasingly see the labels in CA-derived datasets used as training sets for machine learning (ML) methods in LA. However, the scarcity of widely shareable datasets means research groups usually work independently to generate labelled data, with few attempts made to compare practice and results across groups. A risk is emerging that different groups are coding constructs in different ways, leading to results that will not prove replicable. We report on two replication studies using a previously reported construct. A failure to achieve high inter-rater reliability suggests that coding of this scheme is not currently replicable across different research groups. We point to potential dangers in this result for those who would use ML to automate the detection of various educationally relevant constructs in LA.
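
The inter-rater reliability check at the heart of this replication is typically quantified with Cohen's kappa; a minimal sketch with hypothetical labels from two coders:

```python
from sklearn.metrics import cohen_kappa_score

coder_a = ["on_topic", "off_topic", "on_topic", "on_topic", "off_topic"]
coder_b = ["on_topic", "on_topic", "on_topic", "off_topic", "off_topic"]

# Kappa corrects raw agreement for chance; values below roughly 0.6 are often
# considered inadequate for using the labels as ML training data.
print(f"kappa = {cohen_kappa_score(coder_a, coder_b):.2f}")
```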

Using Transformer Language Models to Validate Peer-Assigned Essay Scores in Massive Open Online Courses (MOOCs)

Massive Open Online Courses (MOOCs) such as those offered by Coursera are popular ways for adults to gain important skills, advance their careers, and pursue their interests. Within these courses, students are often required to compose, submit, and peer review written essays, providing a valuable pedagogical experience for the student and a wealth of natural language data for the educational researcher. However, the scores provided by peers do not always reflect the actual quality of the text, raising questions about the reliability and validity of the scores. This study evaluates methods to increase the reliability of MOOC peer-review ratings through a series of validation tests on peer-reviewed essays. Reliability of reviewers was based on correlations between text length and essay quality. Raters were pruned based on score variance and the lexical diversity observed in their comments to create subsets of raters. Each subset was then used as training data to fine-tune distilBERT language models to automatically score essay quality as a measure of validation. The accuracy of each language model for each subset was evaluated. We find that training language models on data subsets produced by more reliable raters, selected on a combination of score variance and lexical diversity, yields more accurate essay scoring models. The approach developed in this study should allow for enhanced reliability of peer-review scoring in MOOCs, affording greater credibility within these systems.
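
Fine-tuning distilBERT as an essay-quality regressor, as described, follows the standard Hugging Face recipe in which num_labels=1 switches the classification head to regression with an MSE loss; the essays, scores, and training settings below are placeholders:

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=1  # single output -> regression head
)

# Placeholder peer-reviewed essays and their (rater-subset-filtered) scores.
essays = ["First essay text ...", "Second essay text ..."]
scores = [4.0, 2.5]

enc = tok(essays, truncation=True, padding=True, return_tensors="pt")

class EssayDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(scores)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in enc.items()}
        item["labels"] = torch.tensor(scores[i])  # float label for MSE loss
        return item

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=2, report_to=[])
Trainer(model=model, args=args, train_dataset=EssayDataset()).train()
```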

Using Teacher Dashboards to Customize Lesson Plans for a Problem-Based, Middle School STEM Curriculum

Keeping K-12 teachers engaged with students’ learning and problem solving in technology-enhanced, integrated problem-based learning (PBL) has been shown to support deeper student involvement and, therefore, greater success in learning difficult science, computing, and engineering concepts and practices. However, students’ learning processes and corresponding difficulties are not easily noticed by teachers, because learning in these environments is captured through mouse clicks, drag-and-drop actions, and other low-level activities. As such, teachers find it difficult to set up meaningful interactions with students while also maintaining the focus on student-centered learning. Little research has examined dashboard-supported responsive teaching practices for K-12 PBL. This study examined 8 teachers as they used a co-designed teacher dashboard to assess and respond to students’ learning and strategies during an integrated PBL STEM curriculum. Teachers completed a series of 5 “planning period simulations” leveraging the dashboard; think-aloud protocols, supported by semi-structured interview questions, enabled the teachers to verbalize their thought and evaluation processes. Content analysis and epistemic network analysis were conducted to analyze the simulations. Understanding how teachers use dashboards to support evidence-based teaching practices during technology-enhanced curricula is critical for improving teacher support and preparation.

Multimodal Predictive Student Modeling with Multi-Task Transfer Learning

Game-based learning environments have the distinctive capacity to promote learning experiences that are both engaging and effective. Recent advances in sensor-based technologies (e.g., facial expression analysis and eye gaze tracking) and natural language processing have introduced the opportunity to leverage multimodal data streams for learning analytics. Learning analytics and student modeling informed by multimodal data captured during students’ interactions with game-based learning environments hold significant promise for designing effective learning environments that detect unproductive student behaviors and provide adaptive support for students during learning. Learning analytics frameworks that can accurately predict student learning outcomes early in students’ interactions hold considerable promise for enabling environments to dynamically adapt to individual student needs. In this paper, we investigate a multimodal, multi-task predictive student modeling framework for game-based learning environments. The framework is evaluated on two datasets of game-based learning interactions from two student populations (n=61 and n=118) who interacted with two versions of a game-based learning environment for microbiology education. The framework leverages available multimodal data channels from the datasets to simultaneously predict student post-test performance and interest. In addition to inducing models for each dataset individually, this work investigates the ability to use information learned from one source dataset to improve models based on another target dataset (i.e., transfer learning using pre-trained models). Results from a series of ablation experiments indicate the differences in predictive capacity among a combination of modalities including gameplay, eye gaze, facial expressions, and reflection text for predicting the two target variables. In addition, multi-task models were able to improve predictive performance compared to single-task baselines for one target variable, but not both. Lastly, transfer learning showed promise in improving predictive capacity in both datasets.
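
The multi-task setup described, simultaneously predicting post-test performance and interest from fused multimodal features, can be sketched as a shared encoder with two output heads; the dimensions and names here are illustrative, not the authors' architecture:

```python
import torch
import torch.nn as nn

class MultiTaskStudentModel(nn.Module):
    """Shared encoder over fused multimodal features; one head per target."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.post_test_head = nn.Linear(hidden, 1)  # predicted post-test score
        self.interest_head = nn.Linear(hidden, 1)   # predicted interest rating

    def forward(self, x):
        h = self.encoder(x)
        return self.post_test_head(h), self.interest_head(h)

# Joint training: the summed losses push the encoder toward features
# useful for both targets, the premise behind the multi-task gains reported.
model = MultiTaskStudentModel(n_features=32)
post, interest = model(torch.randn(8, 32))
loss = (nn.functional.mse_loss(post, torch.randn(8, 1))
        + nn.functional.mse_loss(interest, torch.randn(8, 1)))
loss.backward()
```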

Trusting the Explainers: Teacher Validation of Explainable Artificial Intelligence for Course Design

Deep learning models for learning analytics have become increasingly popular over the last few years; however, these approaches are still not widely adopted in real-world settings, likely due to a lack of trust and transparency. In this paper, we tackle this issue by implementing explainable AI methods for black-box neural networks. This work focuses on the context of online and blended learning and the use case of student success prediction models. We use a pairwise study design, enabling us to investigate controlled differences between pairs of courses. Our analyses cover five course pairs that differ in one educationally relevant aspect and two popular instance-based explainable AI methods (LIME and SHAP). We quantitatively compare the distances between the explanations across courses and methods. We then validate the explanations of LIME and SHAP with 26 semi-structured interviews of university-level educators regarding which features they believe contribute most to student success, which explanations they trust most, and how they could transform these insights into actionable course design decisions. Our results show that quantitatively, explainers significantly disagree with each other about what is important, and qualitatively, experts themselves do not agree on which explanations are most trustworthy. All code, extended results, and the interview protocol are provided at https://github.com/epfl-ml4ed/trusting-explainers.

The Doer Effect at Scale: Investigating Correlation and Causation Across Seven Courses

The future of digital learning should be focused on methods proven to be effective by learning science and learning analytics. One such method is learning by doing—combining formative practice with expository content so students actively engage with their learning resource. This generates the doer effect: the principle that students who do practice while they read have higher outcomes than those who only read [9]. Research on the doer effect has shown it to be causal to learning [10], and these causal findings have previously been replicated in a single course [19]. This study extends the replication of the doer effect by analyzing 15.2 million data events from 18,546 students in seven courses at an online higher education institution, the most students and courses known to date. Furthermore, we analyze each course five ways by using different outcomes, accounting for prior knowledge, and doing both correlational and causal analyses. By performing the doer effect analyses five ways on seven courses, new insights are gained on how this method of learning analytics can contribute to our interpretation of this learning science principle. Practical implications of the doer effect for students are discussed, and future research goals are established.

Instruction-Embedded Assessment for Reading Ability in Adaptive Mathematics Software

Adaptive educational software is likely to better support broader and more diverse sets of learners by considering more comprehensive views (or models) of those learners. For example, recent work proposed making inferences about “non-math” factors like reading comprehension while students use adaptive software for mathematics, to better support and adapt to learners. We build on this proposed approach to more comprehensive learner modeling by providing an empirical basis for making inferences about students’ reading ability from their performance on activities in adaptive software for mathematics. We lay out an approach to predicting middle school students’ reading ability using their performance on activities within Carnegie Learning’s MATHia, a widely used intelligent tutoring system for mathematics. We focus on performance in an early, introductory activity as an especially powerful place for instruction-embedded assessment of non-math factors like reading comprehension. We close by discussing opportunities to extend this work by focusing on particular knowledge components or skills tracked by MATHia that may provide important “levers” for driving adaptation based on students’ reading ability while they learn and practice mathematics.

Online help-seeking occurring in multiple computer-mediated conversations affects grades in an introductory programming course

Computing education researchers often study the impact of online help-seeking behaviors that occur across multiple online resources in isolation. Such separation fails to capture the interconnected nature of online help-seeking behaviors across multiple online resources and its effect on course grades. This is particularly important for programming education, which arguably has more online resources for seeking help from other people (e.g., computer-mediated conversations) than other majors. Using data from an introductory programming course (CS1) at a large US university, we found that students (n=301) used multiple computer-mediated conversations, a Q&A forum and online office hours (OHQ), differently. Results showed that students with more prior programming knowledge sought help in the Q&A forum more than students with less prior knowledge. In general, higher-performing students sought help online in the Q&A forum more than the lower-performing groups on all the homework assignments, but not in the OHQ. By better understanding how students seek help online across multiple modalities of computer-mediated conversations and the relationship between help-seeking and grades, we can redesign online resources to best support all students in introductory programming courses at scale.

Predicting Co-occurring Emotions in MetaTutor when Combining Eye-Tracking and Interaction Data from Separate User Studies

Learning can be improved by providing personalized feedback that adapts to the emotions the learner may be experiencing. There is initial evidence that co-occurring emotions can be predicted during learning in Intelligent Tutoring Systems (ITS) through eye-tracking and interaction data. Predicting co-occurring emotions is a complex task, and merging datasets has the potential to improve predictive performance. In this paper, we combine data from two user studies with an ITS and analyze whether there is an improvement in the predictive performance for co-occurring emotions, despite the user studies using different eye-trackers. In pursuit of developing truly affect-aware ITS, we examine whether we can isolate classifiers that perform better than a baseline. To this end, we perform a series of statistical analyses and test the predictive performance of standard machine learning models as well as an ensemble classifier for the task of predicting co-occurring emotions.

Identification, Exploration, and Remediation: Can Teachers Predict Common Wrong Answers?

Prior work analyzing tutoring sessions provided evidence that highly effective tutors, through their interaction with students and their experience, can perceptively recognize incorrect processes or “bugs” when students incorrectly answer problems. Researchers have studied these tutoring interactions examining instructional approaches to address incorrect processes and observed that the format of the feedback can influence learning outcomes. In this work, we recognize the incorrect answers caused by these buggy processes as Common Wrong Answers (CWAs). We examine the ability of teachers and instructional designers to identify CWAs proactively. As teachers and instructional designers deeply understand the common approaches and mistakes students make when solving mathematical problems, we examine the feasibility of proactively identifying CWAs and generating Common Wrong Answer Feedback (CWAFs) as a formative feedback intervention for addressing student learning needs. As such, we analyze CWAFs in three sets of analyses. We first report on the accuracy of the CWAs predicted by the teachers and instructional designers on the problems across two activities. We then measure the effectiveness of the CWAFs using an intent-to-treat analysis. Finally, we explore the existence of personalization effects of the CWAFs for the students working on the two mathematics activities.

Learning Analytics and Stakeholder Inclusion: What do We Mean When We Say "Human-Centered"?

Given the growth in interest in human-centeredness within the learning analytics community - a workshop at LAK, a special issue in the Journal of Learning Analytics, and multiple papers published on the topic - it seems an appropriate time to critically evaluate this popular design approach. Using a corpus of 165 publications that make substantial reference to both learning analytics and human-centeredness, the following paper delineates what is meant by "human-centered" and then discusses the implications of this approach. The conclusion reached through this analysis is that when authors refer to human-centeredness in learning analytics, they are largely referring to stakeholder inclusion and the means by which it can be achieved (methodologically, politically, and logistically). Furthermore, the justification for stakeholder inclusion is often couched in terms of its ability to produce more effective learning analytics applications along several dimensions (efficiency, efficacy, impact). With reference to human-centered design in other fields, a discussion follows of the issues with such an approach, and a prediction that LA will likely move toward a more neutral stance on stakeholder inclusion, as has occurred in both human-centered design and stakeholder engagement research in the past. A more stakeholder-neutral stance is defined as one in which stakeholder inclusion is one of many tools utilized in developing learning analytics applications.

Towards Automated Analysis of Rhetorical Categories in Students Essay Writings using Bloom’s Taxonomy

Essay writing has become one of the most common learning tasks assigned to students enrolled in various courses at different educational levels, owing to the growing demand for future professionals to effectively communicate information to an audience and develop a written product (i.e., an essay). Evaluating a written product requires scorers who manually examine the existence of rhetorical categories, which is a time-consuming task. Machine Learning (ML) approaches have the potential to alleviate this challenge. As a result, several attempts have been made in the literature to automate the identification of rhetorical categories using Rhetorical Structure Theory (RST). However, RST does not provide information regarding students’ cognitive level, which motivates the use of Bloom’s Taxonomy. Therefore, in this research we propose to: i) investigate the extent to which the classification of rhetorical categories can be automated based on Bloom’s taxonomy by comparing traditional ML classifiers with the pre-trained language model BERT, and ii) explore the associations between rhetorical categories and writing performance. Our results showed that the BERT model outperformed the traditional ML-based classifiers with 18% better accuracy, indicating that it can be used in future analytics tools. Moreover, we found a statistically significant difference between the associations of rhetorical categories in the low-achiever, medium-achiever and high-achiever groups, which implies that rhetorical categories can be predictive of writing performance.

Recurrence Quantification Analysis of Eye Gaze Dynamics During Team Collaboration

Shared visual attention between team members facilitates collaborative problem solving (CPS), but little is known about how team-level eye gaze dynamics influence the quality and successfulness of CPS. To better understand the role of shared visual attention during CPS, we collected eye gaze data from 279 individuals solving computer-based physics puzzles while in teams of three. We converted eye gaze into discrete screen locations and quantified team-level gaze dynamics using recurrence quantification analysis (RQA). Specifically, we used a centroid-based auto-RQA approach, a pairwise team member cross-RQA approach, and a multi-dimensional RQA approach to quantify team-level eye gaze dynamics from the eye gaze data of team members. We find that teams differing in composition based on prior task knowledge, gender, and race show few differences in team-level eye gaze dynamics. We also find that RQA metrics of team-level eye gaze dynamics were predictive of task success (all ps < .001). However, the same metrics showed different patterns of feature importance depending on the predictive model and RQA type, suggesting some redundancy in task-relevant information. These findings signify that team-level eye gaze dynamics play an important role in CPS and that different forms of RQA pick up on unique aspects of shared attention between team members.
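
To make the RQA quantities concrete, here is a minimal Python sketch of recurrence rate (RR) and determinism (DET) for categorical gaze-location sequences; the sequences and the lmin threshold are hypothetical, and the paper's centroid-based and multi-dimensional variants are not reproduced:

```python
import numpy as np

def recurrence_matrix(a, b):
    """Cross-recurrence of two categorical sequences (auto-RQA when b is a)."""
    a, b = np.asarray(a), np.asarray(b)
    return (a[:, None] == b[None, :]).astype(int)

def recurrence_rate(R):
    """RR: share of recurrent points in the matrix."""
    return R.mean()

def determinism(R, lmin=2):
    """DET: share of recurrent points lying on diagonal lines of length >= lmin."""
    det_points = 0
    for k in range(-(R.shape[0] - 1), R.shape[1]):   # walk every diagonal
        run = 0
        for v in list(np.diagonal(R, offset=k)) + [0]:  # sentinel flushes last run
            if v:
                run += 1
            else:
                if run >= lmin:
                    det_points += run
                run = 0
    return det_points / max(R.sum(), 1)

# Hypothetical screen-location codes over time for two team members:
m1 = [0, 0, 1, 2, 2, 2, 1, 0, 3, 3]
m2 = [0, 1, 1, 2, 2, 0, 1, 0, 3, 2]
R_cross = recurrence_matrix(m1, m2)   # cross-RQA for a member pair
print(recurrence_rate(R_cross), determinism(R_cross))
```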

Do Not Trust a Model Because It is Confident: Uncovering and Characterizing Unknown Unknowns to Student Success Predictors in Online-Based Learning

Student success models might be prone to developing weak spots, i.e., examples that are hard to classify accurately due to insufficient representation during model creation. This weakness is one of the main factors undermining users’ trust, since model predictions could, for instance, lead an instructor not to intervene with a student in need. In this paper, we unveil the need to detect and characterize unknown unknowns in student success prediction in order to better understand when models may fail. Unknown unknowns include the students for which the model is highly confident in its predictions but is actually wrong. Therefore, we cannot rely solely on the model’s confidence when evaluating the quality of its predictions. We first introduce a framework for the identification and characterization of unknown unknowns. We then assess its informativeness on log data collected from flipped courses and online courses using quantitative analyses and interviews with instructors. Our results show that unknown unknowns are a critical issue in this domain and that our framework can be applied to support their detection. The source code is available at https://github.com/epfl-ml4ed/unknown-unknowns.
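
The core idea, high-confidence misclassifications, can be expressed in a few lines. This sketch (with a hypothetical confidence threshold) covers only the detection step, not the authors' full identification-and-characterization framework:

```python
import numpy as np

def unknown_unknowns(proba, y_true, threshold=0.9):
    """Flag examples the model classifies wrongly despite high confidence."""
    pred = proba.argmax(axis=1)
    conf = proba.max(axis=1)
    return np.where((conf >= threshold) & (pred != y_true))[0]

# proba: predicted class probabilities; y_true: observed pass/fail labels.
proba = np.array([[0.97, 0.03], [0.55, 0.45], [0.08, 0.92]])
y_true = np.array([1, 0, 1])
print(unknown_unknowns(proba, y_true))   # -> [0]: confident but wrong
```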

Using a Webcam Based Eye-tracker to Understand Students’ Thought Patterns and Reading Behaviors in Neurodivergent Classrooms

Previous learning analytics efforts have attempted to leverage the link between students’ gaze behaviors and learning experiences to build effective real-time interventions. Historically, however, these technologies have not been scalable due to the high cost of eye-tracking devices. Further, such efforts have been almost exclusively focused on neurotypical students, despite recent work that suggests a “one size fits many” approach can disadvantage neurodivergent students. Here we attempt to address these limitations by examining the validity and applicability of using scalable, webcam-based eye tracking as a basis for adaptively responding to neurodivergent students in an educational setting. Forty-three neurodivergent students read a text and answered questions about their in-situ thought patterns while a webcam-based eye tracker assessed their gaze locations. Results indicate that eye-tracking measures were sensitive to: 1) moments when students experienced difficulty disengaging from their own thoughts and 2) students’ familiarity with the text. Our findings highlight the fact that a free, open-source, webcam-based eye-tracker can be used to assess differences in reading patterns and online thought patterns. We discuss the implications and possible applications of these results, including the idea that webcam-based eye tracking may be a viable solution for designing real-time interventions for neurodivergent student populations.

The Student Zipf Theory: Inferring Latent Structures in Open-Ended Student Work To Help Educators

Are there structures underlying student work that are universal across every open-ended task? We demonstrate that, across many subjects and assignment types, the probability distribution underlying student-generated open-ended work is close to Zipf’s Law. Inferring this latent structure for classroom assignments can help learning analytics researchers, instruction designers, and educators understand the landscape of various student approaches, assess the complexity of assignments, and prioritise pedagogical attention. However, typical classrooms are far too small to reveal even the contour of the Zipfian pattern, and it is generally impossible to perform inference for Zipf’s law from so few samples. We formalise this difficult task as the Zipf Inference Challenge: (1) infer the ordering of student-generated works by their underlying probabilities, and (2) estimate the shape parameter of the underlying distribution in a typical-sized classroom. Our key insight in addressing this challenge is to leverage the densities of the student response landscapes represented by semantic similarity. We show that our “Semantic Density Estimation” method does a much better job of inferring the latent Zipf shape and the probability ordering of student responses on real-world education datasets.
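
As a simple illustration of the second part of the challenge, the shape parameter of a Zipfian distribution can be estimated from observed response counts with a log-log rank-frequency fit. Note that this naive estimator is exactly what breaks down at classroom sample sizes, which is the gap the paper's Semantic Density Estimation method addresses; the counts below are hypothetical:

```python
import numpy as np

def zipf_shape(counts):
    """Estimate the Zipf exponent s by regressing log frequency on log rank."""
    freq = np.sort(np.asarray(counts, dtype=float))[::-1]
    rank = np.arange(1, len(freq) + 1)
    slope, _ = np.polyfit(np.log(rank), np.log(freq), 1)
    return -slope

# Hypothetical counts of distinct student responses to one assignment:
print(zipf_shape([120, 55, 30, 18, 12, 9, 6, 4, 3, 2]))
```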

Identifying Gaze Behavior Evolution via Temporal Fully-Weighted Scanpath Graphs

Eye-tracking technology has expanded our ability to quantitatively measure human perception. This rich data source has been widely used to characterize human behavior and cognition. However, eye-tracking analysis has been limited in its applicability, as contextualizing gaze to environmental artifacts is non-trivial. Moreover, the temporal evolution of gaze behavior through open-ended environments where learners are alternating between tasks often remains unclear. In this paper, we propose temporal fully-weighted scanpath graphs as a novel representation of gaze behavior and combine it with a clustering scheme to obtain high-level gaze summaries that can be mapped to cognitive tasks via network metrics and cluster mean graphs. In a case study with nurse simulation-based team training, our approach was able to explain changes in gaze behavior with respect to key events during the simulation. By identifying cognitive tasks via gaze behavior, learners’ strategies can be evaluated to create online performance metrics and personalized feedback.
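
A minimal sketch of one window's graph, under our own simplifying assumptions (AOI-coded fixations, edge weights accumulating fixation duration), might look as follows; clustering these per-window matrices would then yield the high-level gaze summaries described:

```python
import numpy as np

def scanpath_graph(fixations, n_aoi):
    """fixations: sequence of (aoi_index, duration_ms) pairs within one
    time window. Entry (i, j) accumulates the duration spent on AOI i
    before a transition to AOI j, so every observed edge carries a weight."""
    W = np.zeros((n_aoi, n_aoi))
    for (a, dur), (b, _) in zip(fixations, fixations[1:]):
        W[a, b] += dur
    return W

# Hypothetical window over 3 AOIs (e.g., monitor, chart, teammate):
window = [(0, 800), (1, 300), (0, 500), (2, 1200)]
print(scanpath_graph(window, 3))
```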

Protected Attributes Tell Us Who, Behavior Tells Us How: A Comparison of Demographic and Behavioral Oversampling for Fair Student Success Modeling

Algorithms deployed in education can shape the learning experience and success of a student. It is therefore important to understand whether and how such algorithms might create inequalities or amplify existing biases. In this paper, we analyze the fairness of models which use behavioral data to identify at-risk students and suggest two novel pre-processing approaches for bias mitigation. Based on the concept of intersectionality, the first approach involves intelligent oversampling on combinations of demographic attributes. The second approach does not require any knowledge of demographic attributes and is based on the assumption that such attributes are a (noisy) proxy for student behavior. We hence propose to directly oversample different types of behaviors identified in a cluster analysis. We evaluate our approaches on data from (i) an open-ended learning environment and (ii) a flipped classroom course. Our results show that both approaches can mitigate model bias. Directly oversampling on behavior is a valuable alternative when demographic metadata is not available. Source code and extended results are provided at https://github.com/epfl-ml4ed/behavioral-oversampling.
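
The second, demographics-free approach can be sketched as cluster-then-oversample; the cluster count and the balancing rule here are assumptions, not the authors' exact settings:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.utils import resample

def oversample_by_behavior(X, y, n_clusters=4, seed=0):
    """Cluster rows by behavioral features, then oversample every cluster
    up to the size of the largest one."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X)
    target = np.bincount(labels).max()
    keep = np.concatenate([
        resample(np.where(labels == c)[0], replace=True,
                 n_samples=target, random_state=seed)
        for c in range(n_clusters)])
    return X[keep], y[keep]
```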

Moral Machines or Tyranny of the Majority? A Systematic Review on Predictive Bias in Education

Machine Learning (ML) techniques have been increasingly adopted to support various activities in education, including important contexts such as college admission and scholarship allocation. In addition to being accurate, the application of these techniques has to be fair, i.e., displaying no discrimination towards any group of stakeholders in education (mainly students and instructors) based on their protected attributes (e.g., gender and age). The past few years have witnessed an explosion of attention given to the predictive bias of ML techniques in education. Though certain endeavors have been made to detect and alleviate predictive bias in learning analytics, the area remains hard for newcomers to enter. To address this, we systematically reviewed existing studies on predictive bias in education; a total of 49 peer-reviewed empirical papers published after 2010 were included in this study. In particular, these papers were reviewed and summarized from the following three perspectives: (i) protected attributes, (ii) fairness measures and their applications in various educational tasks, and (iii) strategies for enhancing predictive fairness. These findings were distilled into recommendations to guide future endeavors in this strand of research, e.g., collecting and sharing more quality data containing protected attributes, developing fairness-enhancing approaches that do not require the explicit use of protected attributes, and validating the effectiveness of fairness-enhancing approaches on students and instructors in real-world settings.
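
For readers new to the area, two of the most common group-fairness measures covered by such reviews can be computed in a few lines; the function names and the toy data below are our own:

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute gap in positive-prediction rates between two groups."""
    y_pred, g = np.asarray(y_pred), np.asarray(group).astype(bool)
    return abs(y_pred[g].mean() - y_pred[~g].mean())

def equal_opportunity_gap(y_pred, y_true, group):
    """Absolute gap in true-positive rates between two groups."""
    y_pred = np.asarray(y_pred)
    g = np.asarray(group).astype(bool)
    pos = np.asarray(y_true) == 1
    return abs(y_pred[g & pos].mean() - y_pred[~g & pos].mean())

# Toy example: predictions and a binary protected attribute.
print(demographic_parity_gap([1, 0, 1, 1], [0, 0, 1, 1]))  # -> 0.5
```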

Towards explainable prediction of essay cohesion in Portuguese and English

Textual cohesion is an essential aspect of a formally written text, related to the linguistic mechanisms that connect elements such as words, sentences, and paragraphs. Several studies have proposed approaches to automatically estimate textual cohesion in essays. However, limited research has studied the extent to which machine learning approaches can predict the textual cohesion of essays written in languages other than English. This paper reports on the findings of a study that aimed to propose and evaluate approaches that automatically estimate the cohesion of essays in Portuguese and English. The study proposed regression-based models grounded in conventional feature-based machine learning methods and in deep learning-based pre-trained language models. The study also examined the explainability of the automated approaches to scrutinize their predictions. We analyzed two datasets composed of 4,570 (Portuguese) and 7,101 (English) essays. The results demonstrate that a deep learning-based model achieved the best performance on both datasets, with a moderate Pearson correlation with human-rated cohesion scores. However, the explainability of the automatic cohesion estimations based on conventional machine learning models offered stronger potential than that of the deep learning model.
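
A conventional feature-based baseline of the kind compared in the study can be sketched as a TF-IDF regression evaluated with a Pearson correlation; the essays and scores below are placeholders:

```python
from scipy.stats import pearsonr
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Placeholder essays with human-rated cohesion scores (hypothetical).
train_texts = ["First, we argue X. In addition, this connects to Y.",
               "The cat sat. Dogs bark. It rained."]
train_scores = [4.0, 2.0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
model.fit(train_texts, train_scores)

# On a held-out set, evaluation would mirror the paper's metric:
# r, _ = pearsonr(model.predict(test_texts), test_scores)
```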

Can We Empower Attentive E-reading with a Social Robot? An Introductory Study with a Novel Multimodal Dataset and Deep Learning Approaches

Reading on digital devices has become more commonplace, but it often poses challenges to learners’ attention. In this study, we hypothesized that allowing learners to reflect on their reading phases with an empathic social robot companion might enhance their attention in e-reading. To test this hypothesis, we collected a novel dataset (SKEP) in an e-reading setting with social robot support. It contains 25 multimodal features from various sensors and logged data that serve as direct and indirect cues of attention. Based on the SKEP dataset, we comprehensively compared HRI-based (treatment) and GUI-based (control) feedback and obtained insights for intervention design. Based on human annotation of nearly 40 hours of video data streams from 60 subjects, we developed a machine learning model to capture attention-regulation behaviors in e-reading. We exploited a two-stage framework to recognize learners’ observable self-regulatory behaviors and conducted attention analysis. The proposed system showed promising performance for e-reading with HRI, such as 72.97% accuracy in recognizing attention-regulation behaviors, 74.29% accuracy in predicting knowledge gain, 75.00% for perceived interaction experience, and 75.00% for perceived social presence. We believe our work can inspire the future design of HRI-based e-reading and its analysis.

SESSION: Short Research Papers

Automated, content-focused feedback for a writing-to-learn assignment in an undergraduate organic chemistry course

Writing-to-learn (WTL) pedagogy supports the implementation of writing assignments in STEM courses to engage students in conceptual learning. Recent studies in the undergraduate STEM context demonstrate the value of implementing WTL, with findings that WTL can support meaningful learning and elicit students’ reasoning. However, the need for instructors to provide feedback on students’ writing poses a significant barrier to implementing WTL; this barrier is especially notable in the context of introductory organic chemistry courses at large universities, which often have large enrollments. This work describes one approach to overcome this barrier by presenting the development of an automated feedback tool for providing students with formative feedback on their responses to an organic chemistry WTL assignment. This approach leverages machine learning models to identify features of students’ mechanistic reasoning in response to WTL assignments in a second-semester, introductory organic chemistry laboratory course. The automated feedback tool development was guided by a framework for designing automated feedback, theories of self-regulated learning, and the components of effective WTL pedagogy. Herein, we describe the design of the automated feedback tool and report our initial evaluation of the tool through pilot interviews with organic chemistry students.

Fostering Privacy Literacy among High School Students by Leveraging Social Media Interaction and Learning Traces in the Classroom

With daily social media consumption among teens exceeding eight hours, it becomes increasingly important to raise awareness about the digital traces they leave behind. However, concepts of privacy literacy such as data and metadata can seem abstract and difficult to grasp. In this short research paper, we tackle this issue by designing, implementing and evaluating a novel technology-enhanced pedagogical scenario for high school students. The scenario covers two main sessions. In the first session, the SpeakUp social-media-like classroom interaction app is used to support a digitally mediated debate. In the second session, the actual learning traces from the digitally mediated debate are used as an object of study to enable students to reflect on the traces they leave behind on social media platforms. To enable this scenario, we extended the existing SpeakUp app to fit the specifics of the context. The scenario was implemented and evaluated in real classrooms during a semester-long course on digital skills with 45 high school students. Our results show that the learning scenario is appreciated by students and that, even though non-STEM students might require more onboarding to be fully engaged in the digitally mediated debate, students from both STEM and non-STEM classes learn effectively. We discuss shortcomings and future research avenues.

The Role of Gender in Students’ Privacy Concerns about Learning Analytics: Evidence from five countries

The protection of students’ privacy in learning analytics (LA) applications is critical for cultivating trust and effective implementations of LA in educational environments around the world. However, students’ privacy concerns and how they may vary along demographic dimensions that historically influence these concerns have yet to be studied in higher education. Gender differences, in particular, are known to be associated with people's information privacy concerns, including in educational settings. Building on an empirically validated model and survey instrument for student privacy concerns, their antecedents and their behavioral outcomes, we investigate the presence of gender differences in students’ privacy concerns about LA. We conducted a survey study of students in higher education across five countries (N = 762): Germany, South Korea, Spain, Sweden and the United States. Using multiple regression analysis, across all five countries, we find that female students have stronger trusting beliefs and they are more inclined to engage in self-disclosure behaviors compared to male students. However, at the country level, these gender differences are significant only in the German sample, for Bachelor's degree students, and for students between the ages of 18 and 24. Thus, national context, degree program, and age are important moderating factors for gender differences in student privacy concerns.

Embracing Trustworthiness and Authenticity in the Validation of Learning Analytics Systems

Learning analytics sits in the middle space between learning theory and data analytics. The inherent diversity of learning analytics manifests itself in an epistemology that strikes a balance between positivism and interpretivism, and in knowledge that is sourced from both theory and practice. In this paper, we argue that validation approaches for learning analytics systems should be cognisant of these diverse foundations. Through a systematic review of learning analytics validation research, we find that there is currently an over-reliance on positivistic validity criteria. Researchers tend to ignore interpretivistic criteria such as trustworthiness and authenticity. In the 38 papers we analysed, researchers covered positivistic validity criteria 221 times, whereas interpretivistic criteria were mentioned only 37 times. We argue that learning analytics can only move forward with holistic validation strategies that incorporate “thick descriptions” of educational experiences. We conclude by outlining a planned validation study using argument-based validation, which we believe will yield meaningful insights by considering a diverse spectrum of validity criteria.

Impact of window size on the generalizability of collaboration quality estimation models developed using Multimodal Learning Analytics

Multimodal Learning Analytics (MMLA) has been applied to collaborative learning, often to estimate collaboration quality from multimodal data, which frequently have uneven time scales. The difference in time scales is usually handled by dividing and aggregating the data using a fixed-size time window. So far, MMLA research has lacked a systematic exploration of whether and how much window size affects the generalizability of collaboration quality estimation models. In this paper, we investigate the impact of different window sizes (30, 60, 90, 120, 180, and 240 seconds) on the generalizability of classification models for collaboration quality and its underlying dimensions (e.g., argumentation). Our results from an MMLA study involving audio and log data showed that a 60-second window enabled the development of more generalizable models for collaboration quality (AUC 61%) and argumentation (AUC 64%). In contrast, for modeling dimensions focusing on coordination, interpersonal relationship, and joint information processing, a window size of 180 seconds led to better across-context generalizability (on average from 56% AUC to 63% AUC). These findings have implications for the eventual application of MMLA in authentic practice.
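
The windowing step itself is straightforward; this pandas sketch (with made-up stream names, and streams already resampled to a common one-second base rate) shows how uneven audio and log streams can be aggregated into the candidate window sizes before model training:

```python
import pandas as pd

def windowed_features(df, size_s):
    """Aggregate per-second multimodal streams into fixed-size windows."""
    return df.resample(f"{size_s}s").agg({"speaking": "mean",    # share of window with speech
                                          "log_events": "sum"})  # activity volume

# Hypothetical streams on a shared 1-second time index:
idx = pd.date_range("2023-01-01 10:00", periods=600, freq="1s")
df = pd.DataFrame({"speaking": [i % 7 == 0 for i in range(600)],
                   "log_events": 1}, index=idx)
candidates = {s: windowed_features(df, s) for s in (30, 60, 90, 120, 180, 240)}
```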

Exploring the Feedback Provision of Mentors and Clients for Teams in Work-Integrated Learning Environments

Industry supervisors play a pivotal role in ongoing learner support and guidance within a work-integrated learning context. The effective provision of feedback by industry supervisors in work-integrated learning environments is essential for increasing a team’s metacognitive awareness and ability to evaluate its performance. However, research that examines the usefulness and type of feedback from industry supervisors for teams remains limited. In this study, we use learning analytics to investigate the quality of the provided feedback by comparing teams’ helpfulness ratings of the feedback from two types of industry supervisors (i.e., clients and mentors), based on the feedback type (task, process, regulatory and self-level oriented). The results show that teams rated the feedback of both clients and mentors as very helpful, with mentors providing slightly more helpful feedback. We also found that mentors produce more co-occurrences of feedback classifications than clients. Overall, the results show that teams perceive mentor feedback as more helpful than client feedback and that mentors target feedback that is more beneficial to the team’s learning than clients do. Our findings can aid in developing guidelines that aim to validate and improve existing or new feedback quality frameworks by leveraging backward evaluation data.

What do students want to know while taking massive open online courses?: Examining massive open online course students’ needs based on online forum discussions from the Universal Design for Learning approach

We identified the nine most dominant massive open online course (MOOC) students’ needs by topic modeling and qualitative analysis of forum discussion posts (n = 3645) among students, staff, and instructors from 21 courses. We examined the implications of these needs using three main Universal Design for Learning (UDL) principles (representation, action and expression, and engagement). We then offered suggestions for what course providers can do to promote an equitable learning experience for MOOC students. The three suggestions are as follows: (1) providing tools such as a direct messaging application to encourage students’ socializing behaviors, (2) modifying course activities to promote more hands-on projects and sharing them, and (3) implementing a bidirectional channel, such as a natural language processing-based chatbot so that students can access useful information whenever they feel the need. We argue that it is critical to include minority students’ voices when examining needs in courses, and our methodology reflects this purpose. We also discuss how the UDL approach helped us recognize students’ needs, create more accessible MOOC learning experiences, and explore future research directions.
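
As a rough sketch of the topic-modeling step (the model choice, the nine-topic setting, and the toy posts are our assumptions, not the study's exact pipeline), latent needs can be surfaced from forum posts like so:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Placeholder posts; a real corpus would contain thousands of entries.
posts = ["How do I submit the assignment before the deadline?",
         "The lecture video will not load on my browser.",
         "Can we get a discussion group for week two?",
         "Where can I download the course slides?"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(posts)
lda = LatentDirichletAllocation(n_components=9, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {k}: {top}")
```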

An Integrated Model of Feedback and Assessment: From fine grained to holistic programmatic review

Research in learning analytics (LA) has long held a strong interest in improving student self-regulated learning and measuring the impact of feedback on student outcomes. Despite more than a decade of work in this space, very little is known about the contextual factors that influence the topics and diversity of feedback and assessment a student encounters during their full program of study. This paper presents research investigating the institutional adoption of a personalized feedback tool. The reported findings illustrate an association between the topics of feedback, student performance, year level of the course, and discipline. The results highlight the need for LA research to capture feedback, assessment and learning outcomes over an entire program of study. Herein we propose a more integrated model that draws on contemporary understandings of feedback together with current research findings. The goal is to push LA towards addressing more complex teaching and learning processes through a systems lens. The model proposed in this paper begins to illustrate where and how LA can address noted deficits in educational practice to better understand how feedback and assessment are enacted by instructors and interpreted by students.

Detecting the Disengaged Reader - Using Scrolling Data to Predict Disengagement during Reading

When reading long and complex texts, students may disengage and miss relevant content. In order to prevent disengaged behavior, or to counteract it by means of an intervention, it should ideally be detected at an early stage. In this paper, we present a method for early disengagement detection that relies only on the classification of scrolling data. The method transforms scrolling data into a time series representation, where each point of the series represents the vertical position of the viewport in the text document. This time series representation is then classified using time series classification algorithms. We evaluated the method on a dataset of 565 university students reading eight different texts. We compared performance across different time series lengths, data sampling strategies, the texts that make up the training data, and classification algorithms. The method can classify disengagement early with up to 70% accuracy. However, we also observe differences in performance depending on which texts are included in the training dataset. We discuss our results and propose several possible improvements to the method.
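
A sketch of the time series representation, with last-observation-carried-forward sampling and a simple distance-based classifier standing in for the time series classification algorithms evaluated in the paper; all events and labels below are hypothetical:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def viewport_series(events, duration_s, step_s=1.0):
    """events: (timestamp_s, viewport_y) scroll events. Returns the vertical
    viewport position sampled at fixed steps, carrying the last observed
    position forward between events."""
    times, ys = zip(*sorted(events))
    t = np.arange(0.0, duration_s, step_s)
    idx = np.searchsorted(times, t, side="right") - 1
    return np.array([ys[max(i, 0)] for i in idx])

# Two hypothetical reading sessions (label 1 = disengaged):
X = np.stack([viewport_series([(0, 0), (30, 900), (40, 2500)], 120),
              viewport_series([(0, 0), (60, 400), (110, 800)], 120)])
y = np.array([1, 0])
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
```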

Cluster-Based Performance of Student Dropout Prediction as a Solution for Large Scale Models in a Moodle LMS

Learning management systems provide a wide breadth of data waiting to be analyzed and utilized to enhance student and faculty experience in higher education. As universities struggle to support students’ engagement, success and retention, learning analytics is being used to build predictive models and develop dashboards to support learners and help them stay engaged, to help teachers identify students needing support, and to predict and prevent dropout. Learning with Big Data has its challenges, however: managing great quantities of data requires time and expertise. To predict students at risk, many institutions apply machine learning algorithms to LMS data for a given course or type of course, but only a few attempt predictions for a large subset of courses. This raises the question: “How can student dropout be predicted on a very large set of courses in an institution’s Moodle LMS?” In this paper, we use automation to improve student dropout prediction for a very large subset of courses, first clustering them based on course design and similarity, and then automatically training, testing, and selecting machine learning algorithms for each cluster. The resulting methodology outlines a basic framework that can be adjusted and optimized in many ways and that further studies can build on and improve.
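
The cluster-then-model pipeline can be outlined in a few lines; the feature choices, cluster count, and the single algorithm used here are stand-ins for the paper's automated training and selection step:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier

def train_per_cluster(course_design, course_of_row, X, y, n_clusters=5):
    """Cluster courses by design features, then fit one dropout model per
    cluster of courses (a sketch; a full pipeline would also run
    per-cluster model selection and evaluation)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    cluster_of_course = km.fit_predict(course_design)   # one label per course
    row_cluster = cluster_of_course[course_of_row]      # map student rows
    models = {}
    for c in range(n_clusters):
        mask = row_cluster == c
        if mask.any() and len(np.unique(y[mask])) > 1:  # need both outcomes
            models[c] = GradientBoostingClassifier().fit(X[mask], y[mask])
    return models
```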

Are We on the Same Page? Modeling Linguistic Synchrony and Math Literacy in Mathematical Discussions

Mathematical discussions have become a popular educational strategy to promote math literacy. While some studies have associated math literacy with linguistic factors such as verbal ability and phonological skills, no studies have examined the relationship between linguistic synchrony and math literacy. In this study, we modeled linguistic synchrony and students’ math literacy using 20,776 online mathematical discussion threads between students and facilitators. We conducted Cross-Recurrence Quantification Analysis (CRQA) to calculate linguistic synchrony within each thread. Statistical tests comparing CRQA indices between high and low math literacy groups show that students with high math literacy have a significantly higher Recurrence Rate (RR), Number of Recurrence Lines (NRLINE), and average line length (L), but lower Determinism (DET) and normalized Entropy (rENTR). This result implies that students with high math literacy are more likely to share common words with facilitators but tend to paraphrase them, whereas students with low math literacy tend to repeat the exact same phrases from the facilitators. The findings provide a better understanding of mathematical discussions and can potentially guide teachers in promoting effective mathematical discussions.
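
The RR index at the heart of this analysis is easy to state: with student and facilitator turns tokenized into word sequences, RR is the share of cross-positions sharing the same word. A simplified sketch with made-up turns follows (NRLINE, L, DET and rENTR are derived from line structures in the same matrix):

```python
import numpy as np

def cross_recurrence_rate(student_words, facilitator_words):
    """RR: share of position pairs (i, j) where the student's i-th word
    matches the facilitator's j-th word."""
    a = np.array(student_words, dtype=object)
    b = np.array(facilitator_words, dtype=object)
    return (a[:, None] == b[None, :]).mean()

student = "the slope is rise over run".split()
facilitator = "remember slope equals rise over run".split()
print(cross_recurrence_rate(student, facilitator))
```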

Advancing learner profiles with learning analytics: A scoping review of current trends and challenges

The term Learner Profile has proliferated over the years, more recently with the increased advocacy around personalising learning experiences. Learner profiles are at the center of personalised learning, and the characterisation of diversity in classrooms is made possible by profiling learners based on their strengths and weaknesses, backgrounds, and other factors influencing learning. In this paper, we discuss three common approaches to profiling learners, based on students' cognitive knowledge, on skills and competencies, and on behavioral patterns, all commonly used within Learning Analytics (LA). Although each approach has its strengths and merits, there are also several disadvantages that have impeded adoption at scale. We propose that the broader adoption of learner profiles can benefit from a careful combination of the methods and practices of the three primary approaches, allowing for scalable implementation of learner profiles across educational systems. In this regard, LA can draw on other aligned domains to develop valid and rigorous measures of students' learning and propel learner profiles from education research to more mainstream educational practice. LA could provide the scope for monitoring and reporting beyond an individualised context and allow holistic evaluations of progress. There is promise in LA research to leverage the growing momentum surrounding learner profiles and make a substantial impact on the field's core aim - understanding and optimising learning as it occurs.

Instructional Strategies and Student eTextbook Reading

Students’ reading is an essential part of learning in college courses. However, many instructors are concerned that students do not complete assigned readings, and multiple studies have found evidence to support this concern. A handful of studies suggest adopting strategies to address students’ lack of reading. This research examines various instructional strategies and student eTextbook reading behaviors as validated by page view data. Survey responses related to the use of instructional strategies were collected from 86 instructors at four public universities. Of these participants, 59 submitted the assigned reading pages for their courses, resulting in reading data from 3,714 students. The findings indicated that students read about 37% of the assigned pages on any given day during the semester. Also, of the students who read, two-thirds made at least one annotation, and students tended to re-read the pages they annotated. Most importantly, student reading in the courses where strategies were used was almost three times higher than in the courses where no strategies were implemented.

A Dashboard to Provide Instructors with Automated Feedback on Students’ Peer Review Comments

Writing-to-Learn (WTL) is an evidence-based instructional practice which can help students construct knowledge across many disciplines. Though it is known to be an effective practice, many instructors do not implement WTL in their courses due to time constraints and inability to provide students with personalized feedback. One way to address this is to include peer review, which allows students to receive feedback on their writing and benefits them as they act as reviewers. To further ease the implementation of peer review and provide instructors with feedback on their students’ work, we labeled students’ peer review comments across courses for type of feedback provided and trained a machine learning model to automatically classify those comments, improving upon models reported in prior work. We then created a dashboard which takes students’ comments, labels the comments using the model, and allows instructors to filter through their students’ comments based on how the model labels the comments. This dashboard can be used by instructors to monitor the peer review collaborations occurring in their courses. The dashboard will allow them to efficiently use information provided by peers to identify common issues in their students’ writing and better evaluate the quality of their students’ peer review.

Joint Choice Time: A Metric for Better Understanding Collaboration in Interactive Museum Exhibits

In this paper, we propose a new metric - Joint Choice Time (JCT) - to measure how and when visitors are collaborating around an interactive museum exhibit. JCT extends dwell time, one of the most commonly used metrics for museum engagement, which tends to be individual and sacrifices insight into activity and learning details for measurement simplicity. We provide an exemplar of measuring JCT using a common “diversity metric” for collaborative choices and potential outcomes. We provide an implementable description of the metric, results from using it with our own data, and potential implications for designing museum exhibits and easily measuring social engagement. Here, we apply JCT to an interactive exhibit game called “Rainbow Agents”, in which museum visitors can play independently or work together to tend a virtual garden using computer science concepts. Our data showed that the diversity of meaningful choices positively correlated with both dwell time and the diversity of positive and creative outcomes. JCT, as a productive and easy-to-access measure of social work, provides an example for learning analytics practitioners and researchers (especially in museums) of centering social engagement and work as a rich space for easily assessing effective learning experiences for museum visitors.

Informing Expert Feature Engineering through Automated Approaches: Implications for Coding Qualitative Classroom Video Data

While classroom video data are detailed sources for mining student learning insights, their complex and unstructured nature makes them less than straightforward for researchers to analyze. In this paper, we compared the differences between the processes of expert-informed manual feature engineering and automated feature engineering using positional data for predicting student group interaction in four middle school and high school mathematics classroom videos. Our results highlighted notable differences, including improved model accuracy for the combined (manual features + automated features) models compared to the only-manual-features models (mean AUC = .778 vs. .706) at the cost of feature interpretability, increased number of features for automated feature engineering (1523 vs. 178), and engineering approach (domain-agnostic in automated vs. domain-knowledge-informed in manual). We carried out feature importance analyses and discuss the implications of the results for potentially augmenting human perspectives about qualitatively coding classroom video data by confirming and expanding views on which body areas and characteristics may be relevant to the target interaction behavior. Lastly, we discuss our study’s limitations and future work.

Using submission log data to investigate novice programmers’ employment of debugging strategies

Debugging is a distinct subject in programming that is both comprehensive and challenging for novice programmers. However, instructors have limited opportunities to gain insights into the difficulties students encounter in isolated debugging processes. While qualitative studies have identified debugging strategies that novice programmers use and how these relate to theoretical debugging frameworks, few larger-scale quantitative analyses have investigated how students’ debugging behaviors observed in log data align with the identified strategies and how they relate to successful debugging. In this study, we used submission log data to understand how existing debugging strategies are employed by students in an introductory CS course when solving homework problems. We identified strategies from the existing debugging literature that can be observed in trace data and extracted features to reveal how efficient debugging is associated with debugging strategy usage. Our findings both align with and contradict assumptions from previous studies, suggesting that minor code edits can be a beneficial strategy and that width and depth aggregations of the same debugging behavior can reveal opposite effects on debugging efficiency.
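
Features of the kind described, e.g., flagging minor edits between consecutive submissions, can be extracted from submission logs with a standard diff; the threshold and feature names here are our own:

```python
import difflib

def edit_features(submissions):
    """Per-step features from consecutive code submissions."""
    feats = []
    for prev, curr in zip(submissions, submissions[1:]):
        ratio = difflib.SequenceMatcher(a=prev, b=curr).ratio()  # 1.0 = identical
        feats.append({"similarity": ratio,
                      "minor_edit": ratio > 0.9})  # hypothetical cut-off
    return feats

subs = ["for i in range(n): print(i)",
        "for i in range(n + 1): print(i)"]
print(edit_features(subs))
```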

Blockly-DS: Blocks Programming for Data Science with Visual, Statistical, Descriptive and Predictive Analysis

Interest in data science has been growing across industries - both STEM and non-STEM. Non-STEM students often have difficulties with programming and data analysis tools. These entry barriers can be minimized, and these concepts more easily absorbed, when using visual tools. Thus, for this specific audience, visual tools have become essential for teaching data science. Several such tools are available, but they all have limitations. This work presents Blockly-DS: a new tool designed to assist in teaching data science to a non-STEM audience. Blockly-DS is being tested in two Brazilian higher education institutions: IBMEC, a business undergraduate university, and FIAP, a STEM school that offers an MBA as well as corporate and undergraduate courses. The preliminary results presented in this article refer to a validation with two groups of training sessions for junior financial analysts of a major Brazilian bank, in partnership with FIAP.

Turn-taking analysis of small group collaboration in an engineering discussion classroom

This preliminary study focuses on using voice activity detection (VAD) algorithms to extract turn-taking information from individual audio streams recorded during small group work in undergraduate engineering discussion sections. Video data, along with audio, were manually coded for collaborative behavior of students and teacher-student interaction. We found that individual audio data can be used to obtain features that describe group work in noisy classrooms. We observed patterns in student turn-taking and talk duration during various segments of the class, which matched the video-coded data. Results show that high-quality individual audio data can be effective in describing collaborative processes that occur in the classroom. Future directions on using prosodic features, and implications for how we can conceptualize collaborative group work using audio data, are discussed.
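
Once per-frame voice activity is available for each speaker's stream, turns can be recovered by merging voiced segments across short pauses; the frame length and gap threshold below are assumptions, not the study's settings:

```python
import numpy as np

def turns(vad, frame_s=0.03, min_gap_s=0.3):
    """Collapse a per-frame voice-activity mask into (start_s, end_s) turns,
    merging pauses shorter than min_gap_s."""
    vad = np.asarray(vad, dtype=bool)
    segs, start = [], None
    for i, voiced in enumerate(vad):
        if voiced and start is None:
            start = i
        elif not voiced and start is not None:
            segs.append([start * frame_s, i * frame_s])
            start = None
    if start is not None:
        segs.append([start * frame_s, len(vad) * frame_s])
    merged = []
    for seg in segs:
        if merged and seg[0] - merged[-1][1] < min_gap_s:
            merged[-1][1] = seg[1]       # bridge a short pause
        else:
            merged.append(seg)
    return merged

print(turns([0, 1, 1, 1, 0, 0, 1, 1, 0], frame_s=0.5, min_gap_s=1.5))
```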

Predicting Students’ Algebra I Performance using Reinforcement Learning with Multi-Group Fairness

Numerous studies have successfully adopted learning analytics techniques such as machine learning (ML) to address educational issues. However, limited research has addressed the problem of algorithmic bias in ML. In the few attempts to develop strategies that concretely mitigate algorithmic bias in education, the focus has been on debiasing ML models with respect to a single group membership. This study proposes an algorithmic strategy to mitigate bias in a multi-group context. The results showed that our proposed model could effectively reduce algorithmic bias in a multi-group setting while retaining competitive accuracy. The findings imply a possible paradigm shift in educational applications of ML from debiasing with respect to a single group to debiasing across multiple groups.

How Students’ Emotion and Motivation Changes After Viewing Dashboards with Varied Social Comparison Group: A Qualitative Study

The need to personalize learning analytics dashboards (LADs) is increasingly recognized in the learning analytics research community. To study the impact of these dashboards on learners, various types of prototypes have been designed and deployed in different settings. Applying Weiner’s attribution theory, our goal in this study was to understand the effect of dashboard information content on learners. We wanted to understand how the elements of assignment grade, time spent on an assignment, assignment views, and proficiency in the dashboard affect students’ attribution of achievement and motivation for future work. We designed a qualitative study in which we analyzed participants’ responses and the behavioural changes they reported after viewing the dashboard. Through in-depth interviews, we aimed to understand students’ interpretations of the designed dashboard and to what extent social comparison impacts their judgments of learning. Students used multiple dimensions to attribute their success or failure to their ability and effort. Our results indicate that, to maximize the benefits of dashboards as a vehicle for motivating change in students’ learning, the dashboard should promote effort in both personal and social comparison capacities.

Leveraging LMS Logs to Analyze Self-Regulated Learning Behaviors in a Maker-based Course

Existing learning analytics (LA) studies on self-regulated learning (SRL) have rarely focused on maker education that emphasizes student autonomy in their learning process. Towards using LA methods for generating evidence of SRL in maker-based courses, this study leverages logs of a learning management system (LMS) with its activity design aligned with the maker-based pedagogy. We explored frequencies and sequential patterns of students’ SRL behaviors as reflected in the LMS logs and their relations with learning performance. Adopting a mixed method approach, we collected and triangulated both quantitative (i.e., system logs, performance scores) and qualitative (i.e., student-written reflections) data sources from 104 students. Based on current LA-based SRL research, we developed an LMS log-based analytic framework to define the SRL phases and behaviors applicable to maker activities. Statistical, data mining, and qualitative analysis methods were conducted on 48,602 logged events and 131 excerpts extracted from student reflections. Results reveal that high-performing students demonstrated some SRL behaviors (e.g., Making Personal Plans, Evaluation) more frequently than their low-performing counterparts, yet the two groups showcased fairly similar sequences of SRL behaviors. Theoretical, methodological and pedagogical implications are drawn for LA-based SRL research and maker education.
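
At its simplest, the frequency and first-order sequence mining over coded LMS events looks like the following (the SRL behavior codes are hypothetical; the study's framework defines its own phases and behaviors, and uses richer sequence analysis):

```python
from collections import Counter
from itertools import pairwise  # Python 3.10+

def srl_patterns(events):
    """Frequencies and first-order transitions of coded SRL behaviors."""
    freq = Counter(events)
    transitions = Counter(pairwise(events))
    return freq, transitions

# Hypothetical coded event stream for one student:
log = ["Plan", "Enact", "Enact", "Evaluate", "Plan", "Enact"]
freq, trans = srl_patterns(log)
print(freq.most_common(3), trans.most_common(3))
```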