Accepted Papers
**Full Conference Program will be posted soon**
*Please note that these titles and abstracts are listed with pre-publication information and may be subject to change. Any changes to titles and/or abstracts will be reflected here as they become available.*
Full Research Papers
| Authors | Title | Abstract |
| --- | --- | --- |
| Lisa-Maria Norz and Elske Ammenwerth | Who is Socially Present? Gendered Patterns in Online Higher Education Combining Manual Coding and Social Network Analysis for Equity in Learning | Advancing equity in online learning requires methods that capture both the content of interaction and its structural patterns. Social presence, rooted in the Community of Inquiry (CoI) framework, is central for meaningful online learning, yet little is known about its gender-related dynamics in asynchronous environments. This study investigates how social presence behaviors vary by gender among 49 graduate students (24 female, 25 male) in three online groups of a Health Information program. Overall, 3,546 postings were coded using a validated German CoI scheme, while six social network analysis indicators (Ties, Density, In-Closeness, Out-Closeness, nBroke and Efficiency) mapped interaction structures across five weeks. Gender differences were assessed with the Mann-Whitney U test. Results show that female students contributed more frequently (M=75.3 vs. 56.2 postings) and displayed higher levels of Open Communication (p < .001) and Group Cohesion (p = .023), but not of Affective Expression (p = .093). Temporal analyses revealed that gender effects became more pronounced over time, while network indicators showed no consistent patterns. These findings underscore the value of combining content-based coding and structural measures to advance equity-oriented learning analytics. This approach highlights design opportunities for scalable, gender-sensitive learning environments. Replication with larger, more diverse samples is recommended. |
| Brody Hannan and Christian Bokhove | Who Benefits from Using Intelligent Tutoring Systems, and How Much Use is Optimal? Examining the Effects of School Socioeconomic Status and Intelligent Tutoring System Use on Student Mathematics Learning Outcomes in Australian High Schools | Intelligent Tutoring Systems (ITS) show promise for improving student outcomes, but their effectiveness may vary by school socioeconomic status (SES). This study asks who benefits from ITS platforms and how much use is optimal. Using learning analytics from an Australian mathematics ITS with 7,147 Year 7 students across 500 classes and 127 schools, we examined how internal ITS achievement scores relate to external standardised assessments (NAPLAN), and how these relationships shift with school SES and ITS usage. We first tested a Conditional Mediation Structural Equation Model of direct, indirect, and interaction effects among SES, ITS usage, and class-level outcomes, then estimated a Mixture Model with piecewise splines across known-class SES quartiles to capture non-linear usage effects. Across both analyses, ITS achievement scores positively predicted NAPLAN outcomes, but the strength of this relationship depended on school SES and usage intensity. Specifically, lower SES schools reached a point of diminishing returns in their usage of the ITS platform far sooner than their high SES counterparts. These findings highlight two key contributions: first, that the benefits of ITS are not uniform, but depend on who is using the platform; and second, that there are practical limits to how much ITS use is beneficial. |
| Jiayi Zhang, Conrad Borchers, Canwen Wang, Vishal Kumar, Leah Teffera, Bruce M. McLaren and Ryan S. Baker | Understanding Gaming the System through Analyzing Self-Regulated Learning in Think-Aloud Protocols | In digital learning systems, gaming the system refers to occasions when students attempt to succeed in an educational task by systematically taking advantage of system features rather than engaging meaningfully with the content. Often viewed as a form of behavioral disengagement, gaming the system is negatively associated with short- and long-term learning outcomes. However, little research has explored this phenomenon beyond its behavioral representation, leaving questions such as whether students are cognitively disengaged, or whether they engage in different SRL strategies when gaming, largely unanswered. This study employs a mixed-methods approach to examine students’ cognitive engagement and SRL processes during gaming versus non-gaming periods, based on utterance length and SRL codes inferred from think-aloud protocols. We found that gaming does not simply reflect a lack of cognitive effort; during gaming students often produced longer utterances, were more likely to engage in processing information and realizing errors, but less likely to engage in planning, and exhibited reactive rather than proactive strategies. These findings provide empirical evidence supporting the interpretation that gaming may represent a maladaptive form of SRL. With this understanding, future work can address gaming and its negative impacts by designing systems that target maladaptive self-regulation to promote better learning. |
| Hibiki Ito, Chia-Yu Hsu and Hiroaki Ogata | Cyclic Adaptive Private Synthesis for Sharing Real-World Data in Education | The rapid adoption of digital technologies has greatly increased the volume of real-world data (RWD) in education. While these data offer significant opportunities for advancing learning analytics (LA), secondary use for research is constrained by privacy concerns. Differentially private synthetic data generation is regarded as the gold-standard approach to sharing sensitive data, yet studies on the private synthesis of educational data remain very scarce and rely predominantly on large, low-dimensional open datasets. Educational RWD, however, are typically high-dimensional and small in sample size, leaving the potential of private synthesis underexplored. Moreover, because educational practice is inherently iterative, data sharing is continual rather than one-off, making a traditional one-shot synthesis approach suboptimal. To address these challenges, we propose the Cyclic Adaptive Private Synthesis (CAPS) framework and evaluate it on authentic RWD. By iteratively sharing RWD, CAPS not only fosters open science, but also offers rich opportunities for design-based research (DBR), thereby amplifying the impact of LA. Our case study using actual RWD demonstrates that CAPS outperforms a one-shot baseline while highlighting challenges that warrant further investigation. Overall, this work offers a crucial first step towards privacy-preserving sharing of educational RWD and expands the possibilities for open science and DBR in LA. |
| Yildiz Uzun, Andrea Gauthier and Mutlu Cukurova | What Students Ask, How a Generative AI Assistant Responds: Exploring Dialogues on Learning Analytics Feedback | Learning analytics dashboards (LADs) aim to support students’ regulation of learning by translating complex data into feedback. Yet students, especially those with lower self-regulated learning (SRL) competence, often struggle to engage with and interpret analytics feedback. Conversational generative artificial intelligence (GenAI) assistants have shown potential to scaffold this process through real-time, personalised, dialogue-based support. Further advancing this potential, we explored authentic dialogues between 34 students and a GenAI assistant integrated into a LAD during a 10-week semester. The analysis focused on how students with different SRL levels posed queries, the relevance and quality of the assistant’s answers, and how students perceived its role in their learning. Findings revealed distinct query patterns. While low SRL students sought clarification and reassurance, high SRL students queried technical aspects and requested personalised strategies. The assistant provided clear and reliable explanations but was limited in personalisation, handling emotionally charged queries, and integrating multiple data points for tailored responses. Findings further suggest that GenAI interventions can be especially valuable for low SRL students, offering scaffolding that supports engagement with feedback and narrows gaps with their higher SRL peers. At the same time, students’ reflections underscored the importance of trust and the need for greater adaptivity, context-awareness, and technical refinement in future systems. |
| Hunter McNichols, Fareya Ikram and Andrew Lan | The StudyChat Dataset: Analyzing Student Dialogues With ChatGPT in an Artificial Intelligence Course | The widespread availability of large language models (LLMs), such as ChatGPT, has significantly impacted education, raising both opportunities and challenges. Students can frequently interact with LLM-powered, interactive learning tools, but their usage patterns need to be monitored and understood. We introduce StudyChat, a publicly available dataset capturing real-world student interactions with an LLM-powered tutoring chatbot in a semester-long, university-level artificial intelligence (AI) course. We deploy a web application that replicates ChatGPT's core functionalities, and use it to log student interactions with the LLM while working on programming assignments. We collect 16,851 interactions, which we annotate using a dialogue act labeling schema inspired by observed interaction patterns and prior research. We analyze these interactions, highlight usage trends, and examine how specific student behaviors correlate with course outcomes. We find that students who prompt LLMs for conceptual understanding and coding help tend to perform better on assignments and exams. Moreover, students who use LLMs to write reports and circumvent assignment learning objectives have lower outcomes on exams than others. StudyChat serves as a shared resource to facilitate further research on the evolving role of LLMs in education. |
| Martin Hlosta, Ivan Moser, Geoffray Bonnin and Per Bergamin | Do High-SES Students Better Overcome Early Difficulties? Testing the Compensatory Advantage Hypothesis in Online University Courses | Equality in learning outcomes between students from different socio-economic status (SES) backgrounds is a long-standing issue. Although the problem is widespread, not all factors contributing to the gap are known. The compensatory advantage hypothesis posits that benefits of a higher-SES background are greater for students who face early academic difficulties. In other words, the effect of SES is amplified when encountering a challenge. This study examines how poor first assignment outcomes interact with SES to affect success in 22 online university courses. The results show that the effect of coming from a higher-SES neighbourhood on course success is greater for students who do not submit the first assignment than for those who perform well on it, in line with the compensatory advantage hypothesis. More precisely, students from mid-ranked and high-SES areas were better able to compensate for missing their first assignment than their low-SES counterparts. Despite robust results, the interaction effect varies considerably across courses, suggesting that contextual course factors might influence its strength, though the factors we analysed did not account for this variation. The implications of these findings for both research and potential strategies to reduce educational inequalities are discussed, emphasizing their significance for theoretical understanding and practical applications. |
| Longwei Cong, Leon Hammerla, Sonja Hahn, Sebastian Gombert, Hendrik Drachsler and Ulf Kröhne | Automatic Short Answer Grading with LLMs: From Memorization to Reasoning | Short-answer questions provide valuable insights into students’ understanding and cognitive processes for learning analytics. However, they are difficult to grade automatically as they require a high level of language comprehension. Automatic Short Answer Grading (ASAG) is therefore essential in large-scale educational settings. Recent work has applied encoder-only pre-trained language models (PLMs), such as BERT, and generative large language models (LLMs) to ASAG. Although fine-tuned BERT-based models currently produce state-of-the-art results, they depend on substantial annotated datasets, which are frequently expensive and insufficient. This paper examines the performance of fine-tuning several PLMs and LLMs for different dataset sizes and compares the results to those of prompt-based approaches. General-purpose and domain-specific models were fine-tuned on datasets ranging from 800 to 26,674 student responses. Different prompt engineering strategies were tested, including rubric-based prompts. Our results demonstrate that fine-tuned LLMs and rubric-based prompting can match or exceed the performance of BERT-based models. Rubric-based prompts with an open-source model deliver comparable results without the need for annotated data or hardware-intensive training, while also mitigating data protection concerns. This work provides empirical evidence of the role of LLMs in ASAG and paves the way for future research into resource-efficient, interpretable and reasoning-driven grading. |
| Andres Felipe Zambrano, Jiayi Zhang, Maciej Pankiewicz and Ryan S. Baker | Practice as the Key to Success: Understanding the Role of Prior Knowledge, Affective States and Learning Resources in Computer Science Education | Introductory programming courses (CS1) bring together students with diverse prior experiences, which shape their use of learning resources, emotional responses, and academic performance. This study employs structural equation modeling, self-reported affective data, and multimodal interaction logs to investigate how prior programming knowledge shapes affect, resource utilization, and outcomes in a CS1 course that features an automated assessment tool (AAT), instructional videos, and worked examples. Students who persisted with practice and advanced beyond basic tasks achieved the strongest outcomes, though they followed different emotional pathways. By contrast, relying solely on videos or worked examples did not significantly contribute to success, and disengagement was generally tied to weaker performance. Novices often reported confusion and frustration; while these emotions sometimes hindered learning, they also drove deeper engagement with the AAT, improving outcomes for those who persisted. Experienced students, however, more often reported boredom, which consistently reduced practice and led to poorer outcomes. These findings underscore the need for adaptive support that balances challenge with guidance to sustain engagement and promote success for diverse learners in CS1. |
| Cleon Xavier, Luiz Rodrigues, Ana Valdo, Ariadne Carvalho, Gabriela Matos, Lucas Kalinke, Ramon Vilela, Erika Resende, Thais Moraes, Matheus Silva, Nara Nobre-Silva, Newarney Costa, Fabiola Ribeiro, Anderson Pinheiro, Dragan Gasevic and Rafael Mello | From Solo Graders to Assisted Annotation: Integrating LLM Suggestions into the Educational Data Creation Pipeline | The creation of annotated datasets is important for Automatic Short Answer Grading (ASAG), but establishing a reliable gold standard remains a challenge, often complicated by discrepancies among human annotators. Within Learning Analytics (LA), this challenge is particularly pressing because dataset reliability underpins the validity of analytic models and the feedback they provide to learners and educators. To address this challenge, this study investigated the dynamics between human raters and Large Language Models (LLMs) in the assessment cycle, comparing approaches ranging from human-supported annotation to automated scoring. The study was conducted with students’ Chemistry responses from a Brazilian high school. Responses were graded by teachers under scenarios with and without the aid of score suggestions generated by an LLM (GPT-4o-mini). Results showed that human grading with LLM suggestions (Human–AI interaction) yielded outcomes significantly closer to the gold standard (Kappa = 0.91) than approaches based solely on humans or automated systems. Additionally, LLM suggestions increased agreement among human graders. Our findings suggest that the Human–AI process is not only a technical improvement but also a methodological contribution to LA. This offers a potential pathway for constructing more reliable datasets that strengthen the validity of subsequent analytics and the support they provide in educational practice. |
| Qinyi Liu, Lin Li, Valdemar Švábenský, Conrad Borchers and Mohammad Khalil | Measuring the Impact of Student Gaming Behaviors on Learner Modeling | The expansion of large-scale online education platforms has made vast amounts of student interaction data available for further processing by systems like adaptive learning systems (ALS) and knowledge tracing (KT) models. KT models aim to estimate students’ concept mastery from interaction data, but their performance is highly sensitive to input data quality. Gaming behaviors, such as excessive hint use, misrepresent students’ knowledge and undermine model reliability. However, systematic investigations of how different types of gaming behaviors affect KT remain scarce, and existing studies rely on costly manual analysis that does not capture behavioral diversity. In this study, we conceptualize gaming behaviors as a form of data poisoning. We design Data Poisoning Attacks (DPAs) to simulate diverse gaming patterns and systematically evaluate their impact on KT models. Moreover, drawing on advances in DPA detection, we explore unsupervised approaches that can enhance the generalizability of gaming behavior detection. Our findings provide new insights into the vulnerabilities of KT models and highlight the potential of adversarial methods for improving the robustness of learning analytics systems. |
| Eng Lieh Ouh, Kar Way Tan, Siaw Ling Lo and Yuhao Zhang | Detecting Doubt in Reflective Learning: A Learning Analytics Study with Large and Small Language Models | Reflective learning enhances understanding, especially when instructors promptly address difficulties raised in student reflections. Automated doubt detection can save instructors time, yet existing classification approaches require substantial time for manual annotation and model training. This paper investigates whether large and small language models (LLMs, SLMs) can automate doubt detection without time-consuming training. Using a dataset of anonymized student reflections, we evaluate zero-shot, few-shot prompting, and multi-step reasoning against prior supervised classification baselines. We show that LLMs (GPT-4o, Claude-4, Gemini-2.5) surpass earlier F1 scores without prompting, while prompting further improves their performance. However, using proprietary LLMs can raise cost and privacy concerns. We also show that selected SLMs (Mistral, Qwen) beat baselines while addressing these concerns. We extend the analysis with a category-level error study, showing that explicit doubts are detected more reliably, while tentative doubts (softened by cautious language), learning challenges (arising from difficulties in applying concepts), and masked doubts (concealed by positive or polite phrasing) are missed more often. These findings highlight both the promise and limitations of language models for doubt detection and the need to ensure that cautious or polite learners who may not express their doubts explicitly are recognized and supported in learning analytics systems. |
| Daniel Rosa, Andreza Falcão, Jamilla Lobo, Everton Souza, Moésio Wenceslau Silva Filho, Dragan Gasevic, Rafael Ferreira Mello and Luiz Rodrigues | Automated Assessment of Handwritten Math Problems: A Comparison of Prompting Strategies for Open and Closed-source LLMs | Assessing handwritten mathematical solutions is essential for identifying students' weaknesses and fostering personalized learning. However, scaling such assessment remains challenging for Learning Analytics, which has traditionally focused on digital or typed data. The current study investigated the potential of Large Language Models (LLMs) to automate the assessment of handwritten mathematical solutions and explored how they can be incorporated into large-scale learning analytics pipelines. We curated 300 student solution images, annotated them using a taxonomy of math error types, and compared open-source (Qwen2.5-7B and Gemma3 12B-IT) and closed-source (Gemini 2.0 Flash and GPT-4) LLMs. Two prompting strategies were tested: one adapted from related work and one tailored to the taxonomy of math error types using established prompt design principles. The results revealed that LLMs, particularly Gemini, achieved strong to moderate performance in diagnosing and classifying student errors, while exposing recurring model-specific errors. These findings highlight both the promise and limitations of LLMs for integrating handwritten work into LA, and recommend that learning analytics practitioners and researchers combine careful model selection, principled prompt design, and error-level analysis to develop AI-powered LA systems that are accurate, equitable, and pedagogically actionable. |
| Andres Felipe Zambrano, Jaclyn Ocumpaugh, Ryan S. Baker and Jessica Vandenberg | Emotions in Action: How Students’ Regulatory Responses Shape Learning | Emotions play a central role in shaping learning within digital environments. Although their effects may depend on how students translate emotional experiences into concrete behaviors, the links between these dimensions remain underexplored. This study investigates the most common behaviors during episodes of boredom, confusion, frustration, and engaged concentration in an educational game, as well as associations with situational interest, self-efficacy, prior knowledge, and learning gains, using interaction logs and sensor-free affect detectors. Results show that boredom is linked to off-task roaming, both of which are consistently associated with lower motivation and learning. In contrast, behaviors during engaged concentration, frustration, and especially confusion vary widely, shaped by motivational traits and prior knowledge and offering diverse associations with learning. Concrete regulatory responses in these states—such as systematizing findings with in-game tools, skimming domain content to resolve doubts, or testing hypotheses—are positively associated with learning and motivation, reflecting students’ ability to regulate emotions and address cognitive challenges. However, less constructive responses, such as aimless wandering, were tied to lower knowledge and motivation, underscoring the need for additional support. These findings extend existing affective theory by underscoring the importance of considering the behavioral dimension when analyzing students’ emotions in digital learning environments. |
| Marshall An, Mahboobeh Mehrvarz, John Stamper and Bruce M. McLaren | Revisiting the Hint Button: Consistent Negative Associations Between Unproductive Hint Use and Learning Outcomes in Intelligent Tutoring Systems | Intelligent Tutoring Systems (ITS) commonly provide on-demand multi-level hints, culminating in a bottom-out hint that reveals the answer. Although hints are theoretically intended to support learning, the relationship between hint usage and learning outcomes is complex. Our multi-semester analysis of K–12 mathematics ITS data from 999 students investigates how students use hints and the correlation between hint usage patterns and learning. We identified two unproductive hint use behaviors: requesting hints prematurely and reading them superficially. Both behaviors consistently and significantly correlated with reduced learning gains across multiple semesters. Notably, these behaviors were more prevalent among students with lower prior knowledge. Since these behaviors are consistently correlated with poor outcomes even after accounting for prior knowledge, they may serve as diagnostic indicators for at-risk students. These results suggest the need for a strategic redesign to align hint functionality with their pedagogical purpose, ensuring they function as true scaffolds for learning rather than bypasses that circumvent it. |
| Conrad Borchers, Ashish Gurung, Qinyi Liu, Danielle R Thomas, Mohammad Khalil and Kenneth R. Koedinger | Brief but Impactful: How Human Tutoring Interactions Shape Engagement in Online Learning | Learning analytics can guide human tutors to efficiently address motivational barriers to learning that AI systems struggle to support. Students become more engaged when they receive human attention. However, what occurs during short interventions, and when are they most effective? We align student–tutor dialog transcripts with *MATHia* tutoring system log data to study brief human-tutor interactions on Zoom drawn from 2,057 hours of 191 middle school students' classroom math practice. Mixed-effect models reveal that engagement, measured as successful solution steps per minute, is higher during a human-tutor visit and remains elevated afterward. Visit length exhibits diminishing returns: engagement rises during and shortly after visits, irrespective of visit length. Timing also matters: later visits yield larger immediate lifts than earlier ones, although an early visit is still important to counteract engagement decline. We create analytics that identify which tutor-student dialogs raise engagement the most. Qualitative analysis reveals that interactions with concrete, stepwise scaffolding with explicit work organization elevate engagement most strongly. We discuss implications for resource-constrained tutoring, prioritizing several brief, well-timed check-ins by a human tutor while ensuring at least one early contact. Our analytics can guide the prioritization of students for support and surface effective tutor moves in real-time. |
| Thorleif Harder, Monique Janneck and Amir Madany Mamlouk | Understanding and Supporting STEM Students Through Digital Nudging: Evidence-Based Requirements for Learning Analytics Interventions | STEM students in LMS-supported courses often face self-regulation challenges and avoid seeking help online despite high motivation. Existing learning analytics (LA) interventions frequently fail to address these issues, as they are not grounded in students’ behavioral and psychological barriers. This paper presents a sequential mixed-methods study (N=106, n=14) investigating self-regulation, motivation, and support behavior alongside academic stress areas (self-study, exam preparation, group work, instructor interaction) across four STEM courses. Findings reveal a disconnect between strategic knowledge and implementation, driven by procrastination and fear of appearing incompetent in public forums. Based on these results, we derive six design principles for LA-based digital nudging interventions, such as structured re-entry points after inactivity and just-in-time reminders keyed to temporal risk. Each principle is aligned with specific LMS log data and behavioral mechanisms. The resulting framework translates empirically identified student challenges into log-data-based design requirements for scalable and ethically grounded nudging interventions in LMS. |
| Maximilian Anzinger, Annika Hecking-Veltman, Maxie Bichmann and Stephan Krusche | Investigating Student Interaction with Competency-Based CS Education | Background: Limited empirical evidence exists regarding student experiences with learning analytics tools for CBE in large-scale implementations, particularly regarding how competency transparency supports self-regulated learning in CS education. Objective: This study investigates associations between voluntary student interaction with [Tool], a learning analytics system for CBE, and learning outcomes in an authentic large-scale CS course using mixed-methods analysis. Methods: We employed a mixed-methods observational design with voluntary [Tool] adoption in a CS course (N=1,278). Students were categorized post-hoc into high-interaction (n=281), low-interaction (n=329), and no-interaction (n=668) groups based on competency feature usage. Pre-intervention (n=96) and post-intervention (n=61) surveys measured confidence, usability perceptions, and CBE attitudes. Post-hoc classification creates selection bias limiting causal inference. Results: Students interacting with [Tool] showed significantly higher competency mastery (High: 41.6%, Low: 37.2% vs. No interaction: 28.9%, p<.001, g=0.32-1.07) and exam scores (41.3% vs. 33.9%, p<.001, g=0.26). Passing rates improved for interaction groups (55.5% High, 52.8% Low vs. 45.7% No interaction). Surveys revealed positive usability perceptions and increased subject confidence among users. Conclusion: This study provides empirical evidence of associations between learning analytics tool usage and student performance in large-scale CS education, demonstrating mixed-methods value for triangulating behavioral and perceptual data in authentic educational technology implementations. |
| Zijian Li, Luzhen Tang, Mengyu Xia, Xinyu Li, Naping Chen, Dragan Gašević and Yizhou Fan | When LLMs Fall Short in Deductive Coding: Model Comparisons and Human–AI Collaboration Workflow Design | With generative artificial intelligence driving the growth of dialogic data in education, automated coding is a promising direction for learning analytics to improve efficiency. This surge highlights the need to understand the nuances of student-AI interactions, especially those rare yet crucial. However, automated coding may struggle to capture these rare codes due to imbalanced data, while human coding remains time-consuming and labour-intensive. The current study examined the potential of large language models (LLMs) to approximate or replace humans in deductive, theory-driven coding, while also exploring how human–AI collaboration might support such coding tasks at scale. We compared the coding performance of small transformer classifiers (e.g., BERT) and LLMs on two datasets, with particular attention to imbalanced head–tail distributions in dialogue codes. Our results showed that LLMs did not outperform BERT-based models and exhibited systematic errors and biases in deductive coding tasks. We designed and evaluated a human–AI collaborative workflow that improved coding efficiency while maintaining coding reliability. Our findings reveal both the limitations of LLMs (especially their difficulties with semantic similarity and theoretical interpretations) and the indispensable role of human judgment, while demonstrating the practical promise of human–AI collaborative workflows for coding. |
| Wicaksono Febriantoro, Qi Zhou, Wannapon Suraworachet, Sahan Bulathwela, Andrea Gauthier, Eva Millan and Mutlu Cukurova | Examining Student Interactions with a Pedagogical AI-Assistant for Essay Writing and their Impact on Students’ Writing Quality | The dynamic nature of interactions between students and GenAI, as well as their relationship to writing quality, remains underexplored. While most research has examined how general-purpose GenAI can support writing, fewer studies have investigated how students interact with pedagogically designed systems across different phases of the writing process. To address this gap, we evaluated a GenAI-driven essay-writing assistant (EWA) designed to support higher education students in argumentative writing. Drawing on 1,282 interaction logs from 32 undergraduates during a two-hour writing session, Sequential Pattern Mining and K-Means clustering were used to identify behavioral patterns. Two clusters emerged: Cluster 1 emphasized outline planning and essay structure, while Cluster 2 focused on content development. A Mann-Whitney U test revealed a moderate effect size (r = 0.36) in the essay Organization dimension, with Cluster 1 showing higher scores. Qualitative analysis indicated that students with better performance actively wrote and shared essay sections with the EWA for feedback, rather than interacting passively by asking questions. These findings suggest implications for teaching and system design. Teachers can encourage active engagement, while future EWAs may integrate automatic labeling and monitoring to prompt students to move from questioning to writing, enabling fuller benefits from GenAI-supported learning. |
| Jianjun Xiao, Cixiao Wang and Wenmei Zhang | Modeling Collaborative Problem Solving Dynamics from Group Discourse: A Text-Mining Approach with Synergy Degree Model | Measuring collaborative problem solving (CPS) synergy remains challenging in learning analytics, as classical manual coding cannot capture emergent system-level dynamics. This study introduces a computational framework that integrates automated discourse analysis with the Synergy Degree Model (SDM) to quantify CPS synergy from group communication. Data were collected from 52 learners in 12 groups during a 5-week cMOOC activity. Nine classification models were applied to automatically identify ten CPS behaviors across four interaction levels: operation, wayfinding, sense-making, and creation. While BERT achieved the highest accuracy (0.507±0.029), GPT models demonstrated superior precision (0.539±0.004) suitable for human–AI collaborative coding. Within the SDM framework, each interaction level was treated as a subsystem to compute group-level order parameters and derive synergy degrees. Permutation tests showed automated measures preserve construct validity, despite systematic biases at the subsystem level. Statistical analyses revealed significant task-type differences: survey study groups exhibited higher creation-order than mode study groups (U=68.5, p=.016), suggesting “controlled disorder” may benefit complex problem solving. Importantly, synergy degree distinguished collaborative quality (F=3.44, p=.023), ranging from excellent (0.084±0.192) to failing groups (-0.215±0.316). Findings establish synergy degree as a sensitive indicator of collaboration and demonstrate the feasibility of scaling fine-grained CPS analytics through AI-in-the-loop approaches. |
| Mohammed Saqr, Sonsoles López-Pernas, Santtu Tikka and Markus Spitzer | Early Warning Signals Appear Long Before Dropping Out: An Idiographic Approach Grounded in Complex Dynamic Systems Theory | The ability to sustain engagement and recover from setbacks (i.e., resilience) is fundamental for learning. When resilience weakens, students are at risk of disengagement and may drop out and miss out on opportunities that could transform their lives and communities. Therefore, predicting disengagement long before it happens, during the window of hope, is crucial. In this article, we test whether statistical early warning signals of resilience loss, grounded in the concept of critical slowing down (CSD), can forecast disengagement before dropping out. CSD has been widely observed across ecological, climate, and neural systems, where it precedes tipping points into catastrophic failure (dropping out in our case). Using 1.67 million practice attempts from 9,401 students who used a digital math learning environment, we computed CSD indicators: autocorrelation, return rate, variance, skewness, kurtosis, and coefficient of variation. We found that 88.2% of students exhibited CSD-consistent signals prior to disengagement, with warnings clustering late in activity sequences but emerging before practice ceased. These results provide the first evidence of CSD in education, suggesting that universal resilience dynamics not only govern natural and social systems but also human learning, offering a new scientific basis for detecting vulnerability and supporting learners across different applications and contexts. |
| Kester Yew Chong Wong, Shihui Feng, Sahan Bulathwela and Mutlu Cukurova | Scaffolding Reshapes Dialogic Engagement in Collaborative Problem Solving: Comparative Analysis of Two Approaches | Supporting learners during Collaborative Problem Solving (CPS) is a necessity. Existing studies have compared scaffolds with maximal and minimal instructional support by studying their effects on learning and behaviour. However, our understanding of how such scaffolds could differently shape the distribution of individual dialogic engagement and behaviours across different CPS phases remains limited. This study applied Heterogeneous Interaction Network Analysis (HINA) and Sequential Pattern Mining (SPM) to uncover the structural effects of scaffolding on different phases of the CPS process among K-12 students in authentic educational settings. Students with a maximal scaffold demonstrated higher dialogic engagement across more phases than those with a minimal scaffold. However, they were extensively demonstrating scripting behaviours across the phases, evidencing the presence of overscripting. Although students with the minimal scaffold demonstrated more problem solving behaviours and fewer scripting behaviours across the phases, they repeated particular behaviours in multiple phases and progressed more to socialising behaviours. In both scaffold conditions, problem solving behaviours rarely progressed to other problem solving behaviours. The paper discusses the implications of these findings for scaffold design and teaching practice of CPS, and highlights the distinct yet complementary value of HINA and SPM approaches to investigate students' learning processes during CPS. |
| Zoheb Borbora, John Bielinski and Bruce Bray | A Theory Driven Supervised Learning Approach For Math Skill Proficiency Prediction Using Assessment and Practice Data | Skill proficiency prediction is an important problem in the personalized learning space. We describe a machine learning approach to skill proficiency prediction that is highly interpretable and efficient, and can be used to incorporate data from assessment, practice, and potentially other sources. We formulated the approach as a theory-driven supervised learning problem; specifically, we used concepts from Item Response Theory to construct features that provide model interpretability. Results indicate that the most predictive features are those that are estimates of student ability, skill difficulty and ability-difficulty difference, thus aligning well with Item Response Theory. Model diagnostics and error analysis provided several insights into the problem that align with learning theories. We incorporated data from both assessment and practice products in the prediction model. Results indicate that the model performs better when there is both prior assessment and practice student data as opposed to when there is just assessment or practice data. Finally, we proposed an approach that recommends the instructional category that best represents each student’s current level of skill development as determined by the skill proficiency prediction value. The proposed approach provides confidence metrics for the recommendations and optimizes for percentage of students placed in the categories. |
| Conrad Borchers, Hannah Deininger and Zachary A. Pardos | Toward Trait-Aware Learning Analytics | Learning analytics (LA) draws from the learning sciences to interpret learner behavior and inform system design. Yet, personalization to date remains largely at the content or performance level (during learner-system interactions), overlooking relatively stable individual differences such as personality (unfolding over long-term learning trajectories such as college degrees). The latter could bring underappreciated benefits to the design, implementation, and impact of LA. In this position paper, we conduct an ad hoc literature review and argue for an expanded framing of LA that centers on learner traits as key to both interpreting and designing close-the-loop experiments in LA. We show that personality traits are *relevant* to LA’s central outcomes (e.g., engagement and achievement) and *conducive* to action, as their established ties to human-computer interaction (HCI) inform how systems time, frame, and personalize support. Drawing inspiration from HCI, where psychometrics inform personalization strategies, we propose that LA can evolve by treating traits not only as predictive features but as design resources and moderators of analytics efficacy. In line with past position papers published at LAK, we present a research agenda grounded in the LA cycle and discuss methodological and ethical challenges. |
| Jiameng Wei, Dinh Dang, Kaixun Yang, Emily Stokes, Amna Mazeh, Angelina Lim, David Wei Dai, Joel Moore, Yizhou Fan, Danijela Gasevic, Dragan Gasevic and Guanliang Chen | Uncovering Students’ Inquiry Patterns in GenAI-Supported Clinical Practice: An Integration of Epistemic Network Analysis and Sequential Pattern Mining | Assessment of medication history-taking has traditionally relied on human observation, limiting scalability and detailed performance data. While Generative AI (GenAI) platforms enable extensive data collection and learning analytics provide powerful methods for analyzing educational traces, these approaches remain largely underexplored in pharmacy clinical training. This study addresses this gap by applying learning analytics to understand how students develop clinical communication competencies with GenAI-powered virtual patients, given today's diversity of student cohorts, varying language backgrounds, and limited opportunities for individualized feedback. We analyzed 323 students' interaction logs across Australian and Malaysian institutions, comprising 50,871 coded utterances from 1,487 student-GenAI dialogues. Combining Epistemic Network Analysis to model inquiry co-occurrences with Sequential Pattern Mining to identify temporal sequences, we found that high performers demonstrated strategic deployment of information recognition behaviors. Specifically, high performers centered inquiry on recognizing clinically relevant information, integrating rapport-building and structural organization, while low performers remained in routine question-verification loops. Demographic factors including first-language background, prior pharmacy work experience, and institutional context also shaped inquiry patterns. These findings reveal important inquiry patterns that may indicate clinical reasoning development in GenAI-assisted contexts, providing methodological insights for health professions education assessment and informing adaptive GenAI system design that supports diverse learning pathways. |
| Valdemar Švábenský, Conrad Borchers, Elvin Fortuna, Elizabeth B. Cloude and Dragan Gašević | Mapping 15 Years of Learning Analytics Research: Global Trends, Topics, and Collaboration | The learning analytics (LA) community has recently marked two important milestones: celebrating the 15th LAK conference and updating the 2011 definition of LA to reflect the 15 years of changes in the discipline. Yet, despite this growth, little is known about how research topics, funding, and collaboration, as well as the relationships among them, have developed within the community over time. This study addressed this gap by analyzing all 936 full and short papers published at LAK over 15 years using unsupervised machine learning, natural language processing, and network analytics. The analysis revealed a stable core of prolific authors alongside high turnover of newcomers, systematic links between funding sources and research directions, and six enduring topical centers that remain globally shared but vary in prominence across countries. These findings show that LA has matured into a coherent yet dynamic field. At the same time, they highlight key challenges for the next stage: widening participation, reducing dependency on a narrow set of funders, and ensuring that emerging research trajectories remain responsive to educational practice and societal needs. |
| Sebastian Gombert, Zhifan Sun, Fabian Zehner, Jannik Lossjew, Tobias Wyrwich, Berrit Katharina Czinczel, David Bednorz, Marcus Kubsch, Daniele Di Mitri, Knut Neumann and Hendrik Drachsler | Are rubrics all you need? Towards rubric-based automatic short answer scoring via guided rubric-answer alignment | In educational assessment, rubrics are a key tool because they define clear criteria for evaluating learner responses and specify the evidence required. Yet, in research on automatic short-answer scoring, rubrics are seldom employed as explicit scoring references and, when they are, they are typically treated only as supplementary inputs. This study introduces a task definition for rubric-based automatic short-answer scoring, a version of short-answer scoring that uses scoring rubrics as scoring references. As an approach for implementing rubric-based short-answer scoring, we introduce the idea of guided rubric-answer alignment and provide two concrete architectures based on it: GRAASP and ToLeGRAA. Both are novel transformer-based architectures that use attention mechanisms to predict the alignment between student answers and rubric criteria. We compare them to multiple baselines using ALICE-LP, a novel German short-answer scoring dataset collected from formative assessments in authentic German school contexts, and the widely used ASAP-SAS dataset, collected at American schools. For both datasets, GRAASP and ToLeGRAA achieve highly competitive performance. By explicitly aligning learner responses with rubric criteria when assigning scores, these models demonstrate the feasibility of rubric-based short-answer scoring. This finding underscores the high potential of integrating rubric-driven scoring models into educational assessment. |
| Yeo Jin Kim, Daeun Hong, Xiaotian Zou, Wookhee Min, Snigdha Chaturvedi, Cindy E. Hmelo-Silver and James Lester | Collaborative Dialogue Analysis for Productive Problem Solving | Collaborative problem solving requires students to jointly reason, negotiate, and regulate their learning. Understanding collaborative problem solving through student dialogue can inform timely identification of productive and unproductive collaborative behaviors. In this study, we investigate the use of large language models to automatically classify collaborative problem-solving dialogue segments into two categories: Productive and Unproductive. To support deeper analysis, we additionally explore classification of eight detailed collaborative problem-solving sub-categories. We present an error-augmented few-shot prompting method that incorporates misclassified examples to refine model understanding of classification boundaries. Using dialogue data from a middle school collaborative game-based learning environment, our approach substantially improves classification accuracy over zero-shot baselines. Qualitative analysis of the resulting models further highlights which dialogue types are most frequently misclassified, suggesting design implications for adaptive scaffolding. These findings demonstrate that large language models, when guided with targeted prompting strategies, can effectively recognize productive and unproductive dialogue in collaborative learning. |
| Sila Salta, Deisy Briceno, Louis Longin, Olga Viberg and Oleksandra Poquet | Learner Data in Context: Students Would Share Grades and Text but Less So Logs and Video Data | Ethical and transparent data-sharing practices are fundamental to learning analytics (LA), yet current approaches predominantly rely on standardised, binary opt-in/-out consent forms. They are not suitable for the highly contextual nature of data decisions, limiting students’ agency in making informed choices. This study provides empirical evidence supporting the calls for participatory data governance by investigating students’ normative and practical perceptions of data sharing after a group discussion of different LA tools. We conducted an online contrastive vignette experiment with 115 students (25 groups) to examine the effect of group discussion, and the effect of data type (log, text, grades, video) and data receiver (student, teacher, peer) on perceptions of acceptability and intended use of LA tools. Consistent with prior work, group discussion negatively shifted perceived acceptability—but only in some contexts. After the discussion, students found sharing log and video data significantly less acceptable, while views on grades or text remained consistent. Data receiver (teacher or peer) had no influence on ratings. Students’ intended use of the presented LA tools was unaffected by either discussion or context. Findings underscore the contextual nature of data sharing and emphasise participatory consent practices, with implications for fostering agentic and ethical data practices in LA. |
| Josh Ring, Max Hinne and Inge Molenaar | Consistent Performance in Adaptive Learning Technologies as a Measure of Self-Regulated Learning | Self-regulated learning (SRL) is the process by which students set goals, monitor progress, and regulate behavior during learning. SRL is an acquired skill; by distinguishing students who struggle with content knowledge from those who struggle with SRL, suitable support can be provided to each. We present a novel behavioural measure of SRL which quantifies how consistently students perform during adaptive learning technology (ALT)-supported practice. Using parameter estimates of an item response theory model, we simulate students’ expected performance and compare it against their actual performance. Some students perform worse than expected; such inconsistent performance indicates they are struggling to engage in SRL. We apply our analysis to trace data from primary school students (n=239) practicing math at three adaptive difficulty levels (easy, medium, hard), yielding ability and consistency scores for each student. We find that consistency is moderately correlated with ability (r=.34, p<.001); that consistency and ability significantly predict national standardised math exam scores (R^2=.54); and that students are significantly more consistent in the easy condition. We present consistency as a novel behavioural measure of SRL which complements established metrics, and can help practitioners and researchers to distinguish students who struggle developing content knowledge from those who struggle with SRL. |
| Haejin Lee, Amos Jeng, Sarah Burns, Meg Bates, Cheryl Moran, Hana Kearfott, Tiffany Reyes-Denis, Joseph Cimpian, George Vythoulkas, Nigel Bosch and Michelle Perry | Teacher Characteristics Shape Engagement and Outcomes in Online Professional Learning Environments | This study investigates how certain teacher characteristics—specifically, math anxiety and confidence in teaching mathematics—and school-context features are associated with teachers’ behavioral engagement patterns in an online teacher professional learning platform. Results indicated that teachers with higher levels of math anxiety, lower levels of confidence, and more school-context features that are indicative of underresourced schools were significantly more likely to remain within a single section of the Mathematics Teachers’ Learning Website (MTLW) than to navigate among multiple sections of the MTLW. This research contributes to empirical evidence on how individual differences and school context contribute to diverse patterns of participation in online professional learning. These results offer insights for designing teacher-specific support strategies. |
| Zhen Xu, Vedant Khatri, Yijun Dai, Xiner Liu, Siyan Li, Xuanming Zhang and Renzhe Yu | Enhancing LLM-Based Data Annotation with Error Decomposition | Large language models offer a scalable alternative to human coding for data annotation tasks, enabling the scale-up of research across data-intensive domains such as learning analytics. While LLMs achieve near-human accuracy on fact-based tasks, their performance on construct-based annotation remains inconsistent. Current evaluation practices that collapse all errors into a single accuracy score obscure important information: whether the poor performance reflects model failure or task-inherent ambiguity, and whether all error patterns should be treated equally. To address this gap, we propose a diagnostic evaluation paradigm grounded in content analysis theory, which includes: (1) a diagnostic taxonomy that categorizes LLM annotation errors along two dimensions: source (model-specific vs. task-inherent) and type (boundary ambiguity vs. conceptual misidentification); (2) a human annotation test to estimate task-inherent ambiguity; and (3) a computational method to decompose LLM annotation errors following our taxonomy. We validate this paradigm on four construct-based educational annotation tasks, demonstrating both conceptual validity and practical utility. Theoretically, our results explain why excessively high alignment is often unrealistic and why single alignment metrics inadequately reflect LLM performance on construct-based tasks. Practically, our paradigm offers a low-cost tool to assess task suitability for LLM annotation and provides insights for prompt engineering and model fine-tuning. |
| Hanke Vermeiren, Pritam Laskar, Maria Bolsinova, Wim Van Den Noortgate, Han van der Maas and Abe Hofman | Clusters to Validate Knowledge Components: Analysis of an Adaptive Clock Reading Game | The modeling of a knowledge domain in a learning environment is a critical step with far-reaching implications, not only for the system design but also for the granularity with which student progress can be tracked. In the [System X] learning environment, games are designed to measure one dimension of a broad knowledge domain (e.g. clock-reading). However, dimensionality is introduced in the games through the practice of specific knowledge components (KCs; e.g., reading digital clocks with half hours). The assignment of specific items to these KCs is expert-based rather than data-driven. To investigate the validity of these mappings of items to KCs, this study employed a hierarchical clustering method on person-specific performance at the item level. The results revealed a clear pattern: items associated with the same learning goal tend to cluster together. This supports the notion that learning goals capture meaningful and distinct dimensions within the game. Finally, we illustrate the progress of some players in the game to emphasize the importance of detailed tracking of multiple abilities for effective instruction and timely intervention. |
| Mónica Hernández-Campos, Isabel Hilliger and Francisco José García-Peñalvo | Beyond the tool: A two-case study of faculty engagement with curriculum analytics for actionable insights | This study explores why faculty engage with curriculum analytics (CA) tools for learning outcomes (LOs) assessment and how such engagement generates actionable insights for continuous curriculum improvement. Using a two-case design across Latin American universities, we collected data through interviews, cognitive walkthroughs, and document analysis. Findings reveal that faculty engagement is not solely driven by tool adoption, but by an assessment process grounded in reflection, collaboration, and institutional support. Actionable insights emerged throughout the assessment cycle: during planning, as faculty aligned teaching with LOs; during implementation, through real-time pedagogical adjustments; and post-assessment, via visualizations generated by the CA tool. Faculty interpretation of these visualizations informed evidence-based decisions to improve course design and student learning. By documenting these dynamics across two distinct institutional contexts, the study provides empirical evidence and practical guidance for integrating CA into assessment processes, highlighting the importance of fostering a reflective culture to support sustainable engagement and continuous improvement. More broadly, the findings contribute to the LA field by moving beyond tool functionality to explain the organizational and cultural conditions required for impact, thereby addressing a persistent gap between technical innovation and demonstrated educational change. |
| Fawzia Alamray, Naif R. Aljohani, Ahmed S. Alfakeeh, Nawaf Alhebaishi and Dragan Gašević | Exploring Students’ Cognitive Engagement with Generative Artificial Intelligence | Generative artificial intelligence (GenAI) is increasingly embedded in higher education, yet little is known about how it shapes students’ cognitive engagement in learning tasks. This study applies a learning analytics perspective to investigate types and dynamics of cognition that emerge when students interact with GenAI during a question–answering task. We collected 294 GPT-assisted conversations comprising 618 questions using a purpose-built system. Interactions were coded with Bloom’s taxonomy to capture cognition levels and analyzed with two approaches: Epistemic Network Analysis (ENA), which identifies co-occurrence structures, and Frequency-based Transition Network Analysis (FTNA), which models sequential flow. Analyses showed that AI-generated prompts elicited higher cognitive processes, particularly Analyze with a spillover to Apply, while student-authored prompts clustered around Understand and Evaluate. FTNA revealed Remember as a frequent entry point and Analyze as the main convergence node; although rare, Create occupied a central position when it appeared. Students often cycled across Bloom’s categories rather than progressing linearly. This work demonstrates how combining Bloom’s taxonomy with ENA and FTNA captures structural and temporal dynamics of cognition in GenAI-supported environments. Pedagogically, findings highlight how learning analytics can guide prompt design to foster deeper engagement, offering actionable insights for empowering students and guiding educators in higher education. |
| Marek Hatala, Reyhaneh Ahmadi Nokabadi and Nilay Yalcin | Individual vs. Class-Performance Comparison: Impact of Frame of Reference on Students' Outcome Emotions of Pride, Disappointment, Relief, and Shame | Learning Analytics Dashboards (LADs) make students aware of their progress and outcomes with the goal of inducing reflection and increasing or maintaining motivation. Viewing an outcome-presenting LAD is expected to trigger the causal search process which determines the student's cognitive, emotional and motivational response. Ideally, the information presented in a LAD, and how it is framed, results in a positive emotional and motivational state; however, the knowledge of how to design LADs with desirable impacts is scarce. In a field randomized study with 149 participants, we showed how students' achievement emotions were impacted immediately after viewing a widely deployed LAD in a major LMS and a LAD prototype, both of which showed students' assignment outcomes. The students' responses varied between LAD conditions according to students' achievement goal orientation and grade received. This study contributes to the knowledge of how LAD elements and their framing impact students' achievement emotions. |
| Zack Carpenter, Yeyu Wang, David DeLiema, Panayiota Kendeou, Matthew Bernacki and David Shaffer | Expanding Design Heuristics for Supporting Impasse-Driven Learning in a Puzzle Video Game Using a Problem-Solving Framework and Multimodal Network Models | Learning analytics has a strong tradition of advancing the understanding of learning in video games. Recent work highlights how students encounter impasses and what these moments reveal about learning and game design, yet less attention has been given to offering nuanced accounts of learning processes while also using analytics to inform the design of impasses themselves. Because impasses present an opportunity to learn by connecting stuck points to underlying concepts, designing them well becomes especially important. In this paper, we examine how students learn a central mechanic in the puzzle game Baba Is You using a theory-driven problem-solving framework and multimodal network models. Our results indicate that students who show evidence of learning the mechanics more often engage in behaviors linking an impasse to its underlying cause, while those who do not tend to focus on the impasse itself. From these findings, we derive six heuristics for impasse design. This study contributes to learning analytics and game research by detailing, through talk, gaze, and action, how players learn a mechanic by experiencing impasses and by offering empirically grounded heuristics for data-driven impasse design. |
| Luwen Huang, Inderjit Dhillon and Karen Willcox | Modeling Longitudinal Student Pathways with Explainable Generative Models | Student pathways - trajectories spanning academic readiness, course selections, grades, and enrollment outcomes - are critical for understanding educational progress, particularly in community college systems where pathways are highly diverse. Restricted access to student records and a sparsity of coherent pathway data pose barriers to modeling and broader research engagement. This paper contributes a scalable and explainable framework for generating synthetic student pathway data, along with a Bayesian network structure learning approach adapted to the sparsity and temporal complexity of educational trajectories. We present a generative modeling approach that produces realistic synthetic student pathway data by learning a Bayesian network trained on longitudinal student records across multiple tables linked across time. Our model captures complex conditional dependencies across hundreds of variables while remaining interpretable: each parameter encodes transparent relationships that can be inspected or adjusted. We show that our method outperforms an independent sampler in reproducing marginal, conditional, and higher-order patterns in the real data. Our analysis shows that existing k-anonymity rules are infeasible for real or synthetic data, motivating a shift toward model-aware approaches for privacy considerations in student pathway data. |
| Jeroen Ooge, Anissa Faik and Katrien Verbert | Detect, Explain, Act: How Teachers Trust and Use an Explainable Real-Time Monitoring Dashboard to Detect Student Outliers in Class | Adaptive learning platforms are increasingly being used in educational settings to personalise learning, including in classrooms. To ensure teachers can maintain an overview of student progress and swiftly identify students who require additional attention, such platforms can be extended with monitoring dashboards that detect outlier students. However, a lack of explainability in such student outlier detection can lead to mistrust and inappropriate use. Therefore, we designed and implemented a real-time monitoring dashboard with textual model-centric 'how' explanations and visual data-centric 'why' explanations as part of an adaptive learning platform. We then conducted a counterbalanced within-group experiment with follow-up interviews to study how 11 teachers used the system in a real class setting and how they interacted with the explanations to calibrate their trust. We found that teachers could effectively integrate the dashboard into their classes in various ways, and that their trust was influenced by many dispositional, situational, and learned factors. Crucially, the data-centric explanations helped teachers verify the accuracy of predicted outliers, check alignment with their knowledge of students, and identify potential interventions. We use these findings to discuss design implications for explainable outlier detection systems in education. |
| Shan Zhang, Ruiwei Xiao, Anthony F. Botelho, Guanze Liao, Thomas K. F. Chiu, John Stamper and Ken Koedinger | How to Assess AI Literacy: Misalignment Between Self-Reported and Objective-Based Measures | The widespread adoption of Artificial Intelligence (AI) in K–12 education highlights the need for psychometrically-tested measures of teachers' AI literacy. Existing work has primarily relied on either self-report (SR) or objective-based (OB) assessments, with few studies aligning the two within a shared framework to compare perceived versus demonstrated competencies or examine how prior AI literacy experience shapes this relationship. This gap limits the scalability of learning analytics and the development of learner profile–driven instructional design. In this study, we developed and evaluated SR and OB measures of teacher AI literacy within the established framework of Concept, Use, Evaluate, and Ethics. Confirmatory factor analyses support construct validity with good reliability and acceptable fit. Results reveal a low correlation between SR and OB factors. Latent profile analysis identified six distinct profiles, including overestimation (SR > OB), underestimation (SR < OB), alignment (SR ≈ OB), and a unique low-SR/low-OB profile among teachers without AI literacy experience. Theoretically, this work extends existing AI literacy frameworks by validating SR and OB measures on shared dimensions. Practically, the instruments function as diagnostic tools for professional development, supporting AI-informed decisions (e.g., growth monitoring, needs profiling) and enabling scalable learning analytics interventions tailored to teacher subgroups. |
| Yann Hicke, Jadon Geathers, Kellen Vu, Justin Sewell, Claire Cardie, Jaideep Talwalkar, Dennis Shung, Anyanate Gwendolyne Jack, Susannah Cornes, Mackenzi Preston and Rene Kizilcec | MedSimAI: Simulation and Formative Feedback Generation to Enhance Deliberate Practice in Medical Education | Medical education faces challenges in providing scalable, consistent clinical skills training. Simulation with standardized patients (SPs) develops communication and diagnostic skills, but remains resource-intensive and variable in feedback quality. Existing AI-based tools show promise yet often lack comprehensive assessment frameworks, evidence of clinical impact, and integration of self-regulated learning (SRL) principles. Through a multi-phase co-design process with medical education experts, we developed MedSimAI, an AI-powered simulation platform that enables deliberate practice through interactive patient encounters with immediate, structured feedback. Leveraging large language models, MedSimAI generates realistic clinical interactions and provides automated assessments aligned with validated evaluation frameworks. In a multi-institutional deployment (410 students; 1,024 encounters across three medical schools), 59.5% engaged in repeated practice. At one site, mean OSCE history-taking scores rose from 82.8 to 88.8 (p < 0.001, d = 0.75), while a second site’s pilot showed no significant change. Automated scoring achieved 87% accuracy in identifying proficiency thresholds on the Master Interview Rating Scale (MIRS). Mixed-effects analyses revealed institution and case effects. Thematic analysis of 840 learner reflections highlighted challenges in missed items, organization, review-of-systems, and empathy. These findings position MedSimAI as a scalable formative platform for history-taking and communication, motivating staged curriculum integration and realism enhancements for advanced learners. |
| Shan Zhang, Siddhartha Pradhan, Ji-Eun Lee, Ashish Gurung and Anthony F. Botelho | Let Me Try Again: Examining Replay Behavior by Tracing Students' Latent Problem-Solving Pathways | Prior research has shown that students' problem-solving pathways in game-based learning environments reflect their conceptual understanding, procedural knowledge, and flexibility. Replay behaviors, in particular, may indicate productive struggle or broader exploration, which in turn foster deeper learning. However, little is known about how these pathways unfold sequentially across problems or how the timing of replays and other problem-solving strategies relates to proximal and distal learning outcomes. This study addresses these gaps using Markov Chains and Hidden Markov Models (HMMs) on log data from 777 seventh graders playing the game-based learning platform From Here to There!. Results show that within problem sequences, students often persisted in states or engaged in immediate replay after successful completions, while across problems, strong self-transitions indicated stable strategic pathways. Four latent states emerged from HMMs: Incomplete-dominant, Optimal-ending, Replay, and Mixed. Regression analyses revealed that engagement in replay-dominant and optimal-ending states predicted higher conceptual knowledge, flexibility, and performance compared with the Incomplete-dominant state. Immediate replay consistently supported learning outcomes, whereas delayed replay was weakly or negatively associated with outcomes relative to Non-Replay. These findings suggest that replay in digital learning is not uniformly beneficial but depends on timing, with immediate replay supporting flexibility and more productive exploration. |
| Daniel Ritchie, Nia Nixon, Tamara Tate and Mark Warschauer | From Wandering to Collaboration: Discourse Patterns in Middle School Generative AI Use | Generative artificial intelligence (AI) has become a prominent presence in classrooms, yet relatively little is known about how students actually engage with such tools in authentic school contexts. This study examines more than 17,000 messages across 1,512 conversations from 484 middle school students using a classroom-based generative AI writing tutor. We extracted linguistic, cognitive, and interactional features, reduced dimensionality with principal component analysis, and applied clustering to identify conversation- and student-level patterns of engagement. Results revealed five conversation profiles—ranging from directive to reflective dialogue—and four student profiles, including collaborators, transactional tool users, independent thinkers, and chatters. These patterns aligned with both pedagogical intent and individual orientation, underscoring that student–AI dialogue is heterogeneous but systematic. Findings contribute empirical evidence to debates about generative AI in education and provide a methodological framework for analyzing human–AI interaction in learning analytics. |
| Bakhtawar Ahtisham, Kirk Vanacore, Jinsook Lee, Zhuqian Zhou, Doug Pietrzak and Rene F. Kizilcec | AI Annotation Orchestration: Evaluating LLM verifiers to Improve the Quality of LLM Annotations in Learning Analytics | Large Language Models (LLMs) are increasingly used to annotate learning interactions, yet concerns about reliability limit their utility. We test whether verification-oriented orchestration—prompting models to check their own labels (self-verification) or audit one another (cross-verification)—improves qualitative coding of tutoring discourse. Using transcripts from 30 one-to-one math sessions, we compare three production LLMs (GPT, Claude, Gemini) under three conditions: unverified annotation, self-verification, and cross-verification across all orchestration configurations. Outputs are benchmarked against a blinded, disagreement-focused human adjudication using Cohen’s kappa. Overall, orchestration yields a 58% improvement in kappa. Self-verification nearly doubles agreement relative to unverified baselines, with the largest gains for challenging tutor moves. Cross-verification achieves a 37% improvement on average, with pair- and construct-dependent effects: some verifier-annotator pairs exceed self-verification, while others reduce alignment, reflecting differences in verifier strictness. We contribute: (1) a flexible orchestration framework instantiating control, self-, and cross-verification; (2) an empirical comparison across frontier LLMs on authentic tutoring data with blinded human “gold” labels; and (3) a concise notation, verifier(annotator) (e.g., Gemini(GPT) or Claude(Claude)), to standardize reporting and make directional effects explicit for replication. Results position verification as a principled design lever for reliable, scalable LLM-assisted annotation in Learning Analytics. |
| Abisha Thapa Magar, Asad Uzzaman, Tali Zacks, Stephen Fancsali, Vasile Rus, April Murphy, Ethan Moltz, Steve Ritter and Deepak Venugopal | Understanding and Modeling Math Strategy Use in Intelligent Tutoring Systems | We investigate how students learn to apply context-specific math strategies by analyzing data from MATHia (a widely used Intelligent Tutoring System) collected from a large set of schools. In particular, we focus on a set of lessons designed to teach ratios and proportions, where students first learn multiple strategies individually and are then presented with lessons in which they choose between strategies. Our results demonstrate that a majority of students may not learn conditional reasoning to select optimal strategies. To understand this more deeply, we use knowledge tracing models and also explain and interpret neural representations of strategies learned using BERT from step-level interactions between students and the ITS. Finally, we study the effectiveness of MATHia's adaptive supports that attempt to guide students to the optimal strategy, and compare insights from our data to those produced by state-of-the-art generative AI models. Our results demonstrate that LLMs may produce results that seem to reflect ideal, expected outcomes in strategy learning, but the generation may not accurately reflect the complexities of real student learning. |
| Jessica Vitale, Alyssa Van Camp, Sarah Miller, Mercedes Ortiz and Sidney D'Mello | Scaling Teacher Professional Learning Through Automated Feedback: A Large-Scale Evaluation in K-12 Classrooms | Automated feedback provides a scalable approach to supporting teacher reflection and teaching practice. However, there is limited evidence on how these technologies impact multiple dimensions of teacher and student outcomes in authentic school settings. This mixed-methods study evaluated a 4-5 week pilot providing professional learning and automated feedback to K-12 teachers. We examined its impact on teacher well-being, teaching practice, classroom talk, and student engagement. Findings indicate that automated feedback is associated with reductions in teacher burnout, improvements in self-efficacy, and increased perceptions of student engagement. Impacts on teaching practices were mixed: teachers increased their use of specific praise and wait time but decreased their use of open-ended questions and pressing students for reasoning. These findings suggest the potential of automated feedback to support teacher well-being and inform instructional changes, while highlighting areas for further research on sustaining and broadening practice improvements. |
| Zhiqing Zheng, Marek Hatala and Arash Ahrar | An Immediate and Retrospective Analysis of Stability and Change: Examining Students' Emotion and Motivation Profiles in Response to Outcome-Presenting Learning Analytics Dashboards | Learning Analytics Dashboards (LADs) have been studied to support student learning and motivation, yet their long-term impact is still in question. This study examines students' immediate and retrospective emotional and motivational responses to two types of outcome-presenting dashboards: a class-performance comparison dashboard (CP-D) and an individual-comparison dashboard (IC-D). Using a mixed-methods approach with survey data, clustering of Achievement Goal Orientations (AGOs), and qualitative analysis of student reflections, the study explores how emotions persisted or diminished across the semester. Results showed that emotions more strongly associated with perceived control (e.g., pride, hope) tended to persist, whereas those less tied to control (e.g., assurance, hopelessness) were less stable. Anxiety was more persistent among CP-D users, whereas disappointment diminished most among Mastery-oriented students across conditions. Interestingly, IC-D supported hope even among some lower-performing Non-Competitive students, while CP-D anchored students more strongly to the class average. These findings underscore that dashboard design interacts with performance and motivational profiles, suggesting the need for adaptive and hybrid designs that balance comparative feedback with individualized progress indicators. |
| Yumou Wei, John Stamper and Paulo Carvalho | Generate-Then-Validate: A Novel Question Generation Approach Using Small Language Models | We explore the use of small language models (SLMs) for automatic question generation as a complement to the prevalent use of their large counterparts in learning analytics research. We present a novel question generation pipeline that leverages both the text generation and the probabilistic reasoning abilities of SLMs to generate high-quality questions. Adopting a "generate-then-validate" strategy, our pipeline first performs expansive generation to create an abundance of candidate questions and then refines them through selective validation based on novel probabilistic reasoning. We conducted two evaluation studies, one with seven human experts and the other with a large language model (LLM), to assess the quality of the generated questions. Most judges (humans or LLMs) agreed that the generated questions had clear answers and generally aligned well with the intended learning objectives. Our findings suggest that an SLM can effectively generate high-quality questions when guided by a well-designed pipeline that leverages its strengths. |
| Erwin Daniel López Zapata and Atsushi Shimada | Is Log-Traced Engagement Enough? Extending Reading Analytics With Trait-Level Flow and Reading Strategy Metrics | Student engagement is a central construct in Learning Analytics, yet it is often operationalized through persistence indicators derived from logs, overlooking affective–cognitive states. Focusing on the analysis of reading logs, this study examines how trait-level flow—operationalized as the tendency to experience Deep Effortless Concentration (DEC)—and traces of reading strategies derived from e-book interaction data can extend traditional engagement indicators in explaining learning outcomes. We collected data from 100 students across two engineering courses, combining questionnaire measures of DEC with fine-grained reading logs. Correlation and regression analyses show that (1) DEC and traces of reading strategies explain substantial additional variance in grades beyond log-traced engagement (ΔR² = 21.3% over the baseline 25.5%), and (2) DEC moderates the relationship between reading behaviors and outcomes, indicating trait-sensitive differences in how log-derived indicators translate into performance. These findings suggest that, to support more equitable and personalized interventions, the analysis of reading logs should move beyond a one-size-fits-all interpretation and integrate personal traits with metrics that include behavioral and strategic measures of reading. |
| Fatma Betül Güres, Tanya Nazaretsky, Seyed Parsa Neshaei and Tanja Käser | Structuring versus Problematizing: How LLM-based Agents Scaffold Learning in Diagnostic Reasoning | Large language models (LLMs) are increasingly embedded in educational technologies, offering new opportunities to scaffold complex reasoning through dialogue. Yet the effectiveness of different scaffolding approaches remains unclear, particularly when learners must apply diagnostic strategies in scenarios simulating real-world open contexts. This study examines how two contrasting types of scaffolding, namely Structuring-focused and Problematizing-focused, when delivered by LLM-powered mentors, shape learning in PharmaSim Switch, a novel scenario-based simulation for pharmacy technician apprentices. Sixty-three students engaged with PharmaSim Switch across three diagnostic scenarios designed to explore how they learn diagnostic strategies, and we analyzed differences in their learning behavior. Results show that while both scaffolds improved strategic engagement, the Structuring-focused support provided an early performance advantage and fostered systematic uptake of taught strategies, whereas the Problematizing-focused support stimulated more constructive elaboration and reflection. Contrary to expectations, prior knowledge did not significantly affect the results. These findings highlight the benefits of introducing structuring and problematizing scaffolding into scenario-based learning environments and underscore the potential of LLM-powered agents to shape how learners reason and engage in professional training settings. |
| Rully Agus Hendrawan, Rafaella Sampaio de Alencar, Alice Micheli, Peter Brusilovsky, Jordan Barria-Pineda and Sergey Sosnovsky | Data-Driven Evaluation of LLM-Based Ontology Concept Extraction from Programming Learning Content | The process of associating elements of learning content with concepts or skills that this content helps students to master is one of the critical steps in developing personalized educational systems. When these associations are properly established, the system can infer the growth of student understanding of separate knowledge components from their interactions with associated learning content and use it to adapt the learning process accordingly by targeting gaps in individual students' knowledge. Unfortunately, crafting these links between learning content and knowledge components is a very time- and expertise-demanding process that has traditionally been performed manually by domain experts with the help of knowledge engineers. Recently, the power of Large Language Models (LLMs) has motivated a new generation of research on concept extraction from textual learning content. The work presented in this paper contributes to this trend while introducing two important innovations. First, our concept extraction process is guided by a human-authored ontology of the target domain - Python programming. Second, alongside a traditional expert evaluation of the concept extraction quality, we apply two additional validation approaches: one based on using an educational data mining technique (learning curves) and another utilizing the pedagogical expertise of teaching the target domain (learning content placement). |
| Zheng Fang and Zachari Swiecki | Combining LLMs and Computational Simulations to Generate Collaborative Discourse | Collaborative problem solving is critical to domains such as education, healthcare, and defence. Learning analytics-based techniques for understanding collaborative problem solving rely on observed data. Such data is costly to collect and analyse and leads to results that have limited generalisability. Computational simulations offer one way to address these limitations; however, their results often lack interpretability. Large language model-based simulations offer interpretability but lack stability. We propose and evaluate an approach to combining computational and large language model-based simulations that is able to generate controlled and realistic collaborative discourse for use in subsequent learning analytics studies and systems. |
| Joyce Horn Fonteles, Nithin Sivakumaran, Clayton Cohn, Austin Coursey, Shoubin Yu, Elias Stengel-Eskin, Ashwin T. S., Mohit Bansal and Gautam Biswas | A Novel Approach to Evaluating the Effectiveness of Large Language Models for Multimodal Analysis of Embodied Learning in Classrooms | This paper presents an approach using Large Language Models (LLMs) as late-fusion interpreters to synthesize multimodal signals from embodied activities in classrooms and infer students’ metacognitive behaviors. Our multimodal pipeline analyzes students' movements, gaze, gestures, and speech within a mixed-reality simulation displayed on a classroom screen to support enactment and learning. Vision- and speech-derived features are fused at the interpretive layer via zero-shot prompting, self-consistency reasoning, and targeted prompt engineering to derive planning, enacting, monitoring, reflecting, and interacting behaviors. We investigate whether LLMs can reliably integrate modality-specific analytics to produce accurate behavioral labels. To address scalability and reduce human burden, we introduce an automated evaluation protocol employing LLM-as-a-Judge to assess classification quality, enabling rapid, iterative benchmarking of model variants and prompt strategies. Using a balanced corpus of human-validated segments and perturbed controls, we compare text-only language models (e.g., GPT-5) against visual–language models (e.g., Qwen2.5-VL) incorporating direct visual processing. Results indicate late-fusion, text-based LLMs can outperform VLMs on behavior labeling without raw video, and precision- or recall-oriented prompts adjust decision boundaries for subtle or brief segments. These findings position LLMs as effective late-fusion mechanisms for multimodal learning analytics and demonstrate the viability of LLM-as-a-Judge for scalable, human-in-the-loop evaluation. |
| Kyuwon Kim, Jeanhee Lee, Sung-Eun Kim and Hyo-Jeong So | Productive Discussion Moves in Groups Addressing Controversial Issues | As society grows increasingly diverse, learners need to critically engage with controversial issues that involve conflicting values and perspectives. Dialogue has been recognized as a powerful means of fostering such engagement, yet frameworks capturing productive dialogue in ethical and value-laden contexts remain underdeveloped. This study investigates productive discussion on AI ethics dilemmas by addressing three research questions: (1) What types of discussion moves emerge when addressing controversial issues? (2) Which moves are associated with higher discussion quality? (3) How do sequential patterns of discussion differ between groups? We conducted small-group discussions with 51 undergraduates on AI ethics scenarios and analyzed 2,199 utterances using a hybrid learning analytics approach that combined manual coding with a deep-learning approach. Our bottom-up analysis identified 14 distinct discussion moves across five categories, including reasoning, elaboration, emotional expression, and discussion management. Regression analyses revealed that moves such as Acknowledging-Ambiguity and Emotive/Experiential-Argument were positively associated with discussion quality, while Building-on-Ideas was negatively associated. Sequential analysis further demonstrated that high-quality groups engaged in constructive inquiry cycles, whereas low-quality groups displayed defensive justification patterns. Our findings highlight that ethical discussions extend beyond logical reasoning to include affective moves, underscoring the need for analytics to support dialogue in moral discussions. |
| Zhenwan Zhu, Rui Zhong and Longxiang Du | A Synergistic Framework for Cognitive Diagnosis with LLM-Empowered Data Augmentation | Accurate cognitive diagnosis is crucial for enabling personalized learning. However, its effectiveness is severely hampered by data sparsity and class imbalance in student-exercise interactions. To address these challenges, this paper proposes a synergistic framework that integrates Artificial Intelligence and Learning Analytics. First, to mitigate data sparsity, we leverage a Large Language Model (LLM) with chain-of-thought prompting to enrich the Q-matrix, thereby expanding knowledge concept coverage. Second, to address class imbalance, we introduce a Generative Adversarial Network (GAN) constrained by this refined knowledge structure to synthesize pedagogically meaningful negative samples. Finally, a dual-level graph attention network is employed to infer precise knowledge states from both the LLM-enriched Q-matrix and the balanced response data. Extensive experiments on the Junyi dataset demonstrate that our framework significantly outperforms strong baselines, achieving a 6.21% improvement in AUC and increasing specificity from 54.24% to 85.83%. These results underscore its potential for delivering reliable and actionable learning diagnostics. |
| Mei Tan, Lena Phalen and Dorottya Demszky | Marked Pedagogies: Examining Linguistic Biases in Personalized Automated Writing Feedback | Effective personalized feedback is critical to students' literacy development. Though LLM-powered tools now promise to automate such feedback at scale, LLMs are not language-neutral: they privilege standard academic English and reproduce social stereotypes, raising concerns about how “personalization” shapes the feedback students receive. We examine how four widely used LLMs (GPT-4, GPT-3.5, Llama-3 70B, Llama-3 8B) adapt written feedback in response to student attributes. Using 600 eighth-grade persuasive essays from the PERSUADE dataset, we generated feedback under prompt conditions embedding gender, race/ethnicity, learning needs, achievement, and motivation. We analyze lexical shifts across model outputs by adapting the Marked Words framework. Our results reveal systematic, stereotype-aligned shifts in feedback conditioned on presumed student attributes—even when essay content was identical. Feedback for students marked by race, language, or disability often exhibited positive feedback bias and feedback withholding bias—overuse of praise, less substantive critique, and assumptions of limited ability. Across attributes, models tailored not only what content was emphasized but also how writing was judged and how students were addressed. We term these instructional orientations Marked Pedagogies and highlight the need for transparency and accountability in automated feedback tools. |
| Jieyi Li, Laura Graf, Andrew Zamecnik, Arslan Azad, Srecko Joksimovic and Oleksandra Poquet | How Do Students Listen to Each Other When Solving Complex Problems? | Students' ability to succeed in collaborative problem solving (CPS) is increasingly important. However, identifying aspects of CPS that individuals can act on to improve it remains a challenge. This study explores listening behaviour as a lever for effective CPS, given that listening is at once cognitive and social, and that individuals can enact it. Despite its importance in communication, listening is rarely systematically examined in CPS. To address this gap, we propose a framework for identifying listening behaviours in the text of student exchanges, and apply it to analyse patterns of listening behaviours of 34 K-12 students, working in nine groups on CPS activities. By combining content analysis, k-means cluster analysis, and correlation-based coupling network analysis, we identify ten distinct patterns of listening behaviours and their coupling over time, across groups with different success outcomes. We found that listening patterns varied by performance. Higher performing groups engaged in more counterarguments and constructive listening. Groups with lower CPS success exhibited two ineffective patterns: questioning with instrumental listening, and counterarguments and questioning without encouraging listening. These findings pose questions about the relationship between listening and learning processes and have implications for multimodal research in learning analytics. |
| Vimukthini Jayalath, Abhinava Barthakur, Shane Dawson, Vitomir Kovanovic and Ryan Baker | Capturing Professional Skill Development: A Curriculum Analytics Approach | Higher education faces increasing pressure from governments and employers to ensure graduates are equipped with the knowledge and capabilities required for the future workplace. While technical knowledge is assessed through grades, professional skills are complex and remain difficult to evaluate systematically. While curriculum mapping offers a potential solution, it is often applied in a simplistic, accreditation-driven manner that merely records the presence or absence of skills embedded in assessments. Such an approach overlooks the relative contribution of each skill to the assessments and, hence, cannot be used to estimate skill development. Other noted approaches have relied on the use of self-assessment reports or surveys, and as such are subjective and cannot provide longitudinal evidence of skill development. This study addresses these limitations by proposing a novel curriculum analytics method, weighted Performance Factor Analysis, to model skill scores using assessment grades and granular weighted skill-assessment mapping. Students’ development of seven professional skills was examined, along with how these skills transition across an accounting degree program. The findings show distinct patterns and trajectories of skill development. Overall, the study makes a significant methodological contribution to measuring skills and offers insights into how graduates develop their professional skills across the curriculum. |
| Daniela Rotelli, Yves Noël, Sébastien Lallé and Vanda Luengo | Automated Enrichment of Course Structure into Moodle logs: The Contribution of Context-Aware Structural Information to Advance Learning Analytics | This article presents a methodology for advancing Learning Analytics by automatically reconstructing the organisational structure of Moodle courses leveraging platform logs and backup data. Using an extensive, pseudonymised dataset gathered from twelve undergraduate courses, we illustrate how student activity records can be enriched with structural information. This methodology integrates xAPI statements with parsed XML data from Moodle course backups to create directed graph representations that capture both hierarchical and sequential relationships among sections and activities. By introducing graph-based structural modelling into Learning Analytics practice, we provide new tools to facilitate replicable and context-aware analytics, enabling deeper insights into the relationship between course structure and learning behaviours, as well as actionable enhancement of digital course design. |
| Toru Sasaki, Rianne Conijn and Martijn C. Willemsen | The Blind Spots in Automated Feedback Generation for Academic Writing | Machine learning-based automated essay scoring (AES) and feedback generation (AFG) tools have been developed since the 1960s, with some having been commercially deployed. Such systems are expected to lighten the labor-intensive task of essay scoring and help provide timely feedback to students. However, these tools have not been used effectively enough in practice despite the long history of this research field. We aim to determine how accurately currently available models can make the necessary corrections and detect improvable segments of text. The latest attempts at AFG utilize state-of-the-art generative large language models (LLMs) for writing evaluation. To the best of our knowledge, however, none of the past studies have included generative LLM-based models in the comparison with human feedback or among AI tools. To fill these gaps, we conduct an experimental comparison of human feedback and three AI tools developed at different stages of technological advancement, with a clear distinction between correction and improvement. Findings indicate that the overlap between human and AI feedback is predominantly limited to surface-level linguistic features and that generative AI-augmented tools demonstrate a markedly higher capability than a tool based on conventional rule-based AI. |
| Jin Eun Yoo, Nam-Hwa Kang, Suna Ryu, Jun-ki Lee, Youngsun Kwak, Taeuk Kim, Hyeong Gwan Kim, Youngwoo Shin and Uiji Hwang | Development and Evaluation of a Dual-Expertise, Utterance-Level Framework for LLM-Based Science Classroom Discourse Analysis | This study presents a novel coding framework for classroom discourse analysis tailored to large language models (LLMs). To overcome the limitations of traditional global-level observation tools, the framework adopts utterance-level segmentation and fine-grained coding aligned with LLM analytical units. Authentic classroom discourse from secondary science teachers was analyzed through a dual-expertise approach integrating the academic knowledge of science education faculty and the experiential insights of in-service teachers. A 137-term science education glossary, organized into 20 thematic categories covering all instructional dimensions, was developed to ensure terminological precision and consistency. Among various prompting and fine-tuning configurations, fine-tuned encoder models with KL-divergence achieved the highest performance in both rating and theme prediction, outperforming all prompting-based methods. Interpretation of these results requires consideration of class imbalance, motivating future research on data expansion, adaptive fine-tuning, and integrated prompting strategies. The proposed framework establishes a foundation for domain-adaptive and pedagogically grounded LLM research, paving the way for automatic feedback systems that support teachers’ self-reflection and professional growth. |
| Chloe Qianhui Zhao, Jie Cao, Jionghao Lin and Kenneth R. Koedinger | LLM-based Multimodal Feedback Produces Equivalent Learning and Better Student Perceptions than Educator-written Feedback | Providing timely, targeted, and multimodal feedback helps students quickly correct errors, build deep understanding, and stay motivated, yet providing such feedback at scale remains a challenge. This study introduces a real-time AI-facilitated multimodal feedback system that integrates structured textual explanations with dynamically retrieved multimedia resources, including the most relevant slide page references and streaming AI audio narration. In an online learnersourcing experiment, we compared this system against fixed educator-written text feedback across three dimensions: (1) learning outcomes, (2) learner engagement, and (3) perceived feedback quality and value. Results showed that AI multimodal feedback achieved learning gains equivalent to human feedback while significantly outperforming it on perceived clarity, specificity, conciseness, motivation, and satisfaction, and reducing cognitive load, with comparable correctness, trust, and acceptance. Process logs revealed distinct engagement patterns: for multiple-choice questions, educator feedback encouraged more submissions; for open-ended questions, AI-facilitated targeted suggestions lowered revision barriers and promoted iterative improvement. These findings highlight the potential of AI multimodal feedback to provide scalable, real-time, and context-aware support that both reduces instructor workload and enhances student experience. |
| Thierry Geoffre and Trystan Geoffre | Modeling Grammatical Hypothesis Testing in Young Learners: A Sequence-Based Learning Analytics Study of Morphosyntactic Reasoning in an Interactive Game | This study investigates grammatical reasoning in primary school learners through a sequence-based learning analytics approach, leveraging fine-grained action sequences from an interactive game targeting morphosyntactic agreement in French. Unlike traditional assessments that rely on final answers, we treat each slider movement as a hypothesis-testing action, capturing real-time cognitive strategies during sentence construction. Analyzing 597 gameplay sessions (9,783 actions) from 100 students aged 8–11 in authentic classroom settings, we introduce Hamming distance to quantify proximity to valid grammatical solutions and examine convergence patterns across exercises with varying levels of difficulty. Results reveal that determiners and verbs are key sites of difficulty, with action sequences deviating from the usual left-to-right treatment (corresponding to writing conventions). This suggests learners often fix the verb first and adjust preceding elements. Exercises with fewer solutions exhibit slower and more erratic convergence, while changes in the closest valid solution indicate dynamic hypothesis revision. Our findings demonstrate how sequence-based analytics can uncover hidden dimensions of linguistic reasoning, offering a foundation for real-time scaffolding and teacher-facing tools in linguistically diverse classrooms. This work contributes to the emerging intersection of literacy development and process mining in learning analytics. |
| Fiorenzo Colarusso, Peter Cheng, Ronald Grau, Grecia Garcia Garcia, Daniel Raggi and Mateja Jamnik | Can Micro-Behaviours be used for Graph Comprehension Assessment? | Transcription with Incremental Presentation of the Stimulus (TIPS) is an instrument to assess users’ graph comprehension through trace data. TIPS records cognitive micro-behaviours (e.g., keystrokes and pen-strokes) with millisecond accuracy across cycles of viewing and copying. These signals provide nine potential TIPS measures to study learners’ competence. In an experiment with 30 participants and visualizations of varying complexity, several TIPS measures, particularly when normalized using a test of rigid spatial transformations (Mental Rotation Test), showed moderate to strong correlations (r = 0.5–0.6) with an independent measure of graph comprehension. Results were most robust for complex stimuli. Potential practical benefits of TIPS include shorter testing time, no need to create new questions for different stimuli, and the ability to fully automate scoring, making the assessment more efficient and scalable. |
| Qiao Jin, Conrad Borchers, Ashish Gurung, Sean Jackson, Sameeksha Agarwal, Jocelyn Wang, Andy Yu, Pragati Maheshwary and Vincent Aleven | Sticky Help, Fleeting Effects: Session-by-Session Analytics of Teacher Interventions in K-12 Classrooms | Teachers’ in-the-moment support is a critical yet limited resource in technology-supported classrooms. However, less is known about how prior intervention history shapes teachers’ decisions and whether help translates into sustained student benefits. We present a multi-method investigation of teacher interventions in K-12 mathematics classrooms using an intelligent tutoring system (ITS). First, interviews with nine teachers identified help history and student state (e.g., struggle, idle) as central decision factors regarding who to help. Building on this insight, we analyzed 1.4M student-system interactions across 14 classes, linking teacher-logged help events with fine-grained engagement states. Mixed-effects models showed that students with a help history were significantly more likely to receive additional support, even after adjusting for current engagement. Cross-lagged panel analyses further revealed that teacher help persisted across sessions, whereas idle behavior did not elicit sustained attention. Finally, additive factors modeling and cross-lagged analyses indicated that help coincided with immediate learning but did not predict skill acquisition in later sessions. These findings suggest that teacher help is “sticky” in that it recurs for previously supported students, but its effects on engagement and learning are largely short-lived. We discuss implications for the design of real-time analytics to better support equitable and effective teacher allocation of attention. |
| Duc-Hai Nguyen, Kinza Salim, David Power, Murray Connolly, Ahmed Hamdy, Maya Contreras, Tai Tan Mai, George Shorten, Barry O'Sullivan, Vijayakumar Nanjappan and Harry Nguyen | Empowering Multimodal Learning Analytics using Agentic AI: A Comprehensive Platform for Simulation-based Clinical Training with Intelligent Assessment | Simulation-based clinical training generates rich multimodal traces (video, audio, interaction logs) that remain underused because of fragmented data streams, labor-intensive annotation, weak provenance linking evidence to claims, and tools misaligned with educator workflows. We introduce ClinVision, an educator-in-the-loop platform operationalizing end-to-end multimodal learning analytics: synchronized multi-camera review with time-linked annotations, ISBAR-aligned scoring with controlled vocabularies, and templated reports with jump-to-evidence provenance. Agentic AI assists phrasing and labeling under explicit human control (accept/edit/reject) with visible provenance, supporting accountable use rather than prescriptive automation. In an in-learning deployment with five educators, we triangulated usage logs, surveys, and focus-group analysis; we found full ISBAR coverage and time-efficient workflows, but low AI acceptance and anticipated inter-rater variance without behavioral anchors. ClinVision offers three MMLA insights: (i) preliminary evidence suggests end-to-end workflow integration, not individual analytics, may drive adoption in high-stakes settings; (ii) design tensions (assistance vs. authority, structure vs. flexibility, evidence vs. overload) that are transferable beyond clinical simulation; and (iii) patterns for trustworthy agentic AI preserving educator agency, where background assistance appears more acceptable than real-time prescription. |
Short Research Papers
| Authors | Title | Abstract |
| Naoyuki Kita, Jill-Jênn Vie, Koh Takeuchi and Hisashi Kashima | Robust Post-hoc Score Allocation in Exams | Examinations evaluate students' abilities through a series of questions, each assigned a score. The cumulative grade is intended to reflect a student's academic ability, and appropriate score allocation is therefore critical for accurate estimation, particularly in high-stakes settings. Item Response Theory (IRT) is a widely recognized mathematical framework for estimating students’ abilities based on their responses. By incorporating question-specific characteristics such as difficulty, IRT enables rational and precise ability estimation. Previous research has proposed an IRT-based score allocation method by aligning grades with estimated abilities. However, a major challenge arises when errors are discovered in exam questions after the results have been released. The common countermeasure of excluding erroneous questions from grading can induce fluctuations in grade rankings, potentially leading to social disruption. To address this issue, we aim to ensure stable grade rankings while preserving a strong correlation between grades and abilities, even in the presence of question errors. We introduce an end-to-end framework for score allocation and design a regularization term that enhances robustness by minimizing ranking fluctuations caused by such errors. Experiments on both synthetic and real-world datasets demonstrate that our method effectively improves robustness against these fluctuations. |
| Christopher Steadman, Stephen Hutt, Margaret Opatz and Panayiota Kendeou | From Diagnostics to Prediction: A Machine Learning Approach to Understanding Reading Development | The recent downward trend in reading proficiency for U.S. students underscores the urgent need for tools that can identify and support students at risk of falling behind. This study examined the extent to which diagnostic assessments of foundational reading skills can predict subsequent changes in performance on a state-level reading assessment, the Minnesota Comprehensive Assessments (MCA). Using BLINDED diagnostic scores across six sub-skills, we trained a multi-class classification model to predict shifts in students’ MCA proficiency levels. Our results show that diagnostic measures of foundational skills, such as morphology and reading efficiency, can provide meaningful insight into students’ future reading trajectories. In addition, growth in word recognition and decoding, a key component of fluent reading, emerged as a reliable predictor of proficiency gains. These findings highlight the importance of assessing and monitoring specific skill development, rather than relying solely on broad outcome measures, to guide instructional decisions. By linking sub-skill diagnostics to state assessment outcomes, this work suggests a path forward for more targeted, data-driven interventions aimed at reversing national trends in declining reading proficiency. |
| Eason Chen, Jeffrey Li, Scarlett Huang, Xinyi Tang, Jionghao Lin, Paulo Carvalho and Kenneth Koedinger | AI Knows Best? The Paradox of Expertise, AI-Reliance, and Performance in Educational Tutoring Decision-Making Tasks | We present an empirical study of how both experienced tutors (expert) and non-tutor participants (novice) judge the correctness of tutor praise responses under different Artificial Intelligence (AI)-assisted interfaces and explanation styles (textual reasoning vs. inline highlighting). Our findings show that although human–AI collaboration outperforms humans alone, it remains less accurate than AI alone. Non-tutor participants tend to follow AI advice more consistently, boosting overall accuracy when AI is correct, whereas experienced tutors often override correct AI suggestions and thus miss potential gains from the AI’s generally high baseline accuracy. Further analysis reveals that textual reasoning explanations increase over-reliance while reducing under-reliance, whereas inline highlighting has no significant effect. Neither explanation style improves accuracy, but both cost participants more time. As a learning analytics contribution, our study introduces reliance patterns (over- and under-reliance) and time cost as process-level indicators that can be systematically logged, analyzed, and used to understand AI-assisted decision-making in education. Our findings reveal a tension between expertise, explanation design, and efficiency in AI-assisted decision-making, highlighting the need for balanced approaches that foster effective human-AI collaboration while offering insights for designing AI-assisted learning analytics frameworks and teacher dashboards for decision support. |
| Conrad Borchers, Manit Patel, Seiyon Lee and Anthony F. Botelho | Disentangling Learning from Judgment: Representation Learning for Open Response Analytics | Open-ended responses are central to learning, yet automated scoring often conflates what students wrote with how teachers grade. We present an analytics-first framework that separates content signals from rater tendencies, making judgments visible and auditable via analytics. Using de-identified ASSISTments mathematics responses, we model teacher histories as dynamic priors and derive text representations from sentence embeddings, incorporating centering and residualization to mitigate prompt and teacher confounds. Temporally-validated linear models quantify the contributions of each signal, and a projection surfaces model disagreements for qualitative inspection. Teacher priors heavily influence grade predictions; the strongest results arise when priors are combined with content embeddings (AUC ≈ 0.815), while content-only models remain above chance but substantially weaker (AUC ≈ 0.626). Adjusting for rater effects sharpens the residual content representation, retaining more informative embedding dimensions and revealing cases where semantic evidence supports understanding as opposed to surface-level differences in how students respond. The contribution presents a practical pipeline that transforms embeddings from mere features into learning analytics for reflection, enabling teachers and researchers to examine where grading practices align (or conflict) with evidence of student reasoning and learning. |
| Zifeng Liu, Linlin Li, Jie Chao, Anupom Mondol, Wanli Xing and Yuanlin Zhang | Do All Roads Lead to AI Literacy? Clustering Behavioral Patterns and Examining Outcomes in an Online AI Literacy Module for Secondary School Students | Artificial Intelligence (AI) literacy is increasingly recognized as a critical competency for K–12 students, yet little is known about how learners engage with AI-focused modules in virtual school contexts. To address this gap, we designed an online narrative-driven AI literacy module (PROGRAM, blinded) that integrates AI learning with Algebra 1. In this pilot study, data from 80 secondary school students who completed the 250-minute module in three weeks were analyzed, including 117,866 system log records (e.g., submissions, clicks) and pre-/post-surveys on mathematics motivation, AI self-efficacy, and AI literacy. Using K-means clustering, we identified four distinct behavioral patterns: reflective learners, low-revision committers, high-frequency trial-and-error learners, and balanced learners. These groups demonstrated different outcomes: while all clusters showed significant improvement in AI self-efficacy, only some showed notable gains in motivation (i.e., low-revision committers and balanced learners) and AI literacy (i.e., balanced learners). The findings underscore the need for tailored scaffolds to better support varied learning strategies and highlight the potential of the AI literacy module in accommodating diverse learner profiles. |
| Masataka Kaneko and Takeo Noda | How can we analyze learners' thinking during their calculus inquiry based on their observed behavior and the logs of manipulating dynamic geometry? | Although computer-supported inquiry-based learning (IBL) is indispensable for mathematics education, the learning processes have not been fully investigated because of their complexity. As a trial to close the learning analytics (LA) loop in those activities, this study explores a methodology of analyzing learners' inquiry into Maclaurin's series while manipulating dynamic geometry software. A multimodal learning analytics (MmLA) survey was conducted by synchronizing a qualified manipulation log and observed behavior in a collaborative learning environment. Despite the difficulty in the analysis of learners' thinking based only on behavioral data, the results indicate that the change in their pattern of manipulations can give valuable information for segmenting and interpreting the behavioral data. |
| Zifeng Liu, Wanli Xing and Zhihui Fang | Talking the Talk: Linking Instructional Discourse Patterns to Student In-video Dropout and Learning Outcome | While teachers’ discourse is central to shaping student learning in online video-based contexts, little is understood about how specific linguistic features of lectures affect student behaviors and post-video accuracy. This study examines teachers’ discourse patterns in large-scale online mathematics learning environments and their relationships with student dropout during videos and post-video accuracy. A total of 100 videos (25 topics) were collected from four teachers with differing levels of teaching experience (two senior, two junior). Drawing on systemic functional linguistics, discourse features were analyzed using natural language processing, focusing on process types, participant features, logical relations, and interactional markers. Results showed that senior teachers employed more logical extensions and imperatives but fewer interrogative and inclusive clauses than junior teachers. While dropout rates did not differ significantly, students of junior teachers achieved higher post-video accuracy. Regression analyses revealed that dropout was significantly lower in videos with greater use of relational processes. In contrast, post-video accuracy was positively associated with the use of material, mental, and relational processes, as well as elaboration, enhancement, interrogative, and inclusive features. Negative predictors of accuracy included technical terms, complex noun phrases, extension, and imperative forms. |
| Jostein Kleveland, Michail Giannakos and Yngve Lindvig | Exploring Human–AI collaboration for formative feedback: Insights from a K–12 case study and implications for analytics | Generative artificial intelligence (GenAI) can produce natural language and other content to support a wide range of tasks. While much research examines its performance in educational assessment, less is known about how teachers and GenAI can complement each other in providing formative feedback. To address this gap, we present [anonymized], a GenAI tool that generates draft feedback for teachers to adapt, and report findings from a case study with 15 teachers. We analyzed how teachers edited GenAI feedback to provide empirical evidence of its use in K-12 formative assessment. Teachers generally found the drafts high quality and a useful starting point, but refined and personalized them, underscoring the value of human-AI synergy rather than substitution. We discuss implications for learning analytics (LA) and hybrid intelligence (HI), highlighting how teacher expertise and GenAI capabilities can be combined to enhance feedback. |
| Alejandro Ortega-Arranz, Paraskevi Topali, Pablo García-Zarza, Miguel L. Bote-Lorenzo, Juan I. Asensio-Pérez and Inge Molenaar | GenAI Analytics and Pedagogical Configurations: Featuring New Generative AI Systems in Education | The appearance of general-purpose Large Language Models (LLMs) and their derived chatbots (e.g., ChatGPT, Gemini) has revolutionized how students learn in high school and university settings. While these chatbots offer potential benefits to students (e.g., a 24/7 tutor), they can hinder teachers' autonomy, leaving teachers blind to the questions students ask and bound to the answers these models provide. However, such autonomy could be increased by integrating GenAI Analytics that track students' interactions with the chatbots (e.g., the most frequently asked topics) and by configuring and contextualizing such models. This study aims to understand which GenAI Analytics and Pedagogical Configurations are most useful for university teachers, using a GenAI system developed by the authors. The evaluation involved 16 university teachers from 5 different institutions. Results show that analytics such as the number of prompts or the categorization of their contents are of great interest to teachers. Furthermore, while some pedagogical configurations were positively rated by most participants (e.g., not providing the direct solution), others were not (e.g., forced hallucinations). By better understanding these features of GenAI systems, we expect teachers to gain autonomy and trust in GenAI-based systems. |
| Sirui Ren, Ha Nguyen, Kathleen Stürmer, Manuel Hopp, Richard Göllner and Tim Fütterer | Evaluating Large Language Model Feedback to Support Teacher Professional Vision: Prompting Strategies and Pre-service Teacher Engagement | Professional vision, the use of knowledge to notice and reason about classroom events, is an important facet of teacher expertise that is challenging for pre-service teachers (PSTs) to develop. Although feedback on written reflections of videotaped classroom events can improve professional vision, delivering feedback at scale requires significant time and labor. We explored the use of large language models (LLMs) to classify professional vision aspects in PSTs' responses and provide feedback. We first conducted a technical evaluation to compare the classification performance of two prompting approaches (single-prompt and prompt-chaining). We then integrated the better-performing approach into a feedback prototype and conducted a user study with 13 PSTs. We analyzed PSTs' interactions with the LLM-generated feedback. Overall, prompt-chaining showed higher inter-rater agreement with human coders in classifying professional vision than the single-prompt approach. PSTs who cycled between reading the feedback and revising reflection text showed more professional vision in the revisions, compared to those who spent more time passively reviewing feedback. Our study illustrates a scalable approach to supporting PST education and highlights considerations about AI feedback uptake. |
| Charles Lang, Yunxi Kong, Geraldine Gray and Jie Gao | The Learning Analytics Value Chain | A persistent challenge for the learning analytics community is how to achieve a meaningful impact on educational practice. This paper proposes that Porter's Value Chain framework provides a useful lens for addressing this challenge. This study developed operational definitions of a Learning Analytics Value Chain and mapped learning analytics research output (3,720 publications) over the last 16 years to its five stages: 1. Concept Development, 2. Prototyping & Efficacy, 3. Evaluation, 4. Dissemination, and 5. Impact & Iteration. SoLAR-supported learning analytics venues show heavy concentration in Prototyping & Efficacy (41.5-43.2%) and Evaluation (43.4-53.4%), with minimal Dissemination (0.1-3.8%) and Impact & Iteration (2.8-6.1%) research. Non-SoLAR-supported research demonstrates a slightly more balanced distribution, with greater emphasis on Concept Development (22.5% vs. 0-6.7%) and Dissemination (9.0% vs. 0.1-3.8%). These findings suggest that learning analytics has failed to achieve impact because insufficient research examines how innovations reach educators. The 470-fold difference between evaluation (46.7%) and dissemination research (0.1%) represents a critical missing piece of the research landscape. The field has developed strong technical capabilities while neglecting implementation science and scaling strategies. Addressing the impact gap requires portfolio re-balancing toward dissemination research, longitudinal studies, and theoretical development rather than continued technical emphasis alone. |
| Boxuan Ma, Li Chen, Xuewang Geng and Masanori Yamada | Understanding Study Approaches in E-Book Logs and Their Relation to Metacognition and Performance | As digital materials proliferate in higher education, e-book interaction logs provide a scalable lens on how students study. However, most existing research analyzes these logs in a one-dimensional manner, which limits the ability to capture students’ study approaches comprehensively. Moreover, the relationship between students’ study approaches, metacognition, and performance remains unclear. To address these challenges, we propose a three-dimensional framework that incorporates engagement, navigation pattern, and context, combining theory-driven and data-driven perspectives to define behavior-based features and group students with similar patterns. We apply this approach to data from a real-world class in which students used an e-book system to study course materials and complete comprehension quizzes. Our analysis identified three distinct groups of students with different study approaches, and revealed that engagement investment alone does not guarantee achievement. We further examined how these approaches relate to students’ metacognitive awareness and academic performance. The findings provide implications for monitoring and feedback, predicting learning outcomes, and detecting potential issues in content design. |
| Yugo Hayashi, Shigen Shimojo and Kenneth Koedinger | Active Learning Beyond Borders: PEOE Enhancement of Explanatory Understanding in Japanese Undergraduates | Active learning strategies vary in their effectiveness in promoting different types of learning. This study conducted an in vivo experiment comparing the effectiveness of the Predict-Explain-Observe-Explain (PEOE) paradigm and traditional practice questions (PQs) with Japanese psychology undergraduates. The experiment was implemented using Open Learning Initiative (OLI) tools and the DataShop database, which allowed both controlled comparisons and large-scale log analysis. Results showed that multiple-choice accuracy did not substantially differ between conditions, but PEOE yielded notably higher-quality open-ended explanations without greater time on task. Moreover, prediction accuracy in PEOE strongly predicted exam performance. By combining experimental controls with data-driven log analyses, this study demonstrates that PEOE is effective and supports deeper levels of processing in the Japanese cultural context, as reflected in students’ explanatory activities. These findings illustrate how learning analytics can be used to assess the quality of reasoning and explanation. |
| Kejie Shen, Luzhen Tang and Junjie Shang | When Cognitive Offloading Masks Authentic Ability: Using GenAI-Driven Metacognitive Feedback to Support Valid Game-Based Assessment | Mental rotation ability (MRA) is a fundamental cognitive skill linked to success in STEM learning. To assess this ability more effectively, game-based assessments move beyond traditional paper-based tests by capturing behavioural traces that reveal students’ cognitive processes. However, students often rely on trial-and-error and rapid guessing to offload cognition in game-based contexts, potentially compromising assessment validity. This study introduces a game-based assessment of MRA and addresses the offloading issue by embedding GenAI-driven metacognitive feedback. A Flexmix model identified three game strategy groups; notably, Misguided Executors relied on trial-and-error and superficial interactions, making gameplay behaviours fail to capture their underlying MRA. Transition Network Analysis further showed that metacognitive feedback reduced this inappropriate cognitive offloading by fostering students’ more cautious action sequences and appropriate use of external aids. Together, the findings demonstrate how GenAI, by promoting metacognitive monitoring and control, can enhance the validity and effectiveness of game-based cognitive assessments. |
| Sydney Miller and Nigel Bosch | Prompting for Teachability: Designing Novice Personas in LLMs for Learning by Teaching Contexts | Learning by teaching (LbT) is a well-established instructional framework in which students deepen understanding by explaining material to a peer or tutee. Large Language Models (LLMs) create new opportunities to scale LbT by simulating novice learners, but their default tendency toward expert-like responses risks undermining the tutor’s role. This study investigates which prompting strategies most effectively elicit novice behavior from LLMs in writing-related domains. We generated 30,720 combined prompts across five domains and evaluated three models (Qwen3-235B, Llama 4, Kimi-K2) using both multiple-choice quizzes and short persuasive essays. Outputs were scored on quiz accuracy, essay quality, and essay persuasiveness using an AI-judge rubric. Regression analysis revealed a clear pattern: constraint prompts that explicitly forced error production consistently outperformed persona-, misconception-, and uncertainty-based prompts. Across both quiz and essay outcomes, direct commands to “answer incorrectly” or “get 2-3 wrong” yielded the strongest novice-like behavior, while indirect framings like “don’t aim for a perfect score” or “you may guess” diluted the effect. These findings highlight constraint-based prompting as the most reliable strategy, and we argue that constraint directives provide an actionable design pathway for practitioners seeking to integrate LLMs into effective LbT contexts. |
| Mohammadreza Molavi Hajiagha, Mohammad Moein, Mohammadreza Tavakoli, Abdolali Faraji and Gábor Kismihók | Embedding-Based Rankings of Educational Resources based on Learning Outcome Alignment: Benchmarking, Expert Validation, and Learner Performance | As the online learning landscape evolves, the need for personalization is increasingly evident. Although educational resources are burgeoning, educators face challenges selecting materials that both align with intended learning outcomes and address diverse learner needs. Large Language Models (LLMs) are attracting growing interest for their potential to create learning resources that better support personalization, but verifying coverage of intended outcomes still requires human alignment review, which is costly and limits scalability. We propose a framework that supports the cost-effective automation of evaluating alignment between educational resources and intended learning outcomes. Using human-generated materials, we benchmarked LLM-based text-embedding models and found that the most accurate model (Voyage) achieved 79% accuracy in detecting alignment. We then applied the optimal model to LLM-generated resources and, via expert evaluation, confirmed that it reliably assessed correspondence to intended outcomes (83% accuracy). Finally, in a three-group experiment with 360 learners, higher alignment scores were positively related to greater learning performance. These findings show that embedding-based alignment scores can facilitate scalable personalization by confirming alignment with learning outcomes, which allows teachers to focus on tailoring content to diverse learner needs. |
| Kevin Cevallos, Angela Carrera, Xavier Ochoa and Jose Cordova | Evaluating AI-Generated Narrative Feedback on Nonverbal Communication in Student Presentations | Nonverbal communication is essential for effective oral presentations, shaping audience engagement and conveying confidence beyond spoken words. Multimodal Learning Analytics (MmLA) research has advanced the automatic detection of presenters’ behaviors—such as posture, gestures, and eye contact—but continues to explore how to provide actionable feedback that fosters reflection and skill development. Recent advances in Generative Artificial Intelligence (GenAI) offer new opportunities to automate the analysis of nonverbal cues and generate narrative evaluations that go beyond raw metrics. This study investigates the integration of a Learning Analytics Dashboard (LAD) with a Large Language Model (LLM) to deliver both numerical and narrative feedback across five dimensions of nonverbal communication: body posture, eye contact, hand gestures, facial expression, and use of space. Twenty-two undergraduate students participated in the study, receiving numerical feedback through a LAD and narrative feedback from either a human expert or an LLM. Findings revealed no significant differences in students’ perceptions of human- and LLM-generated feedback, while also highlighting the potential of LLM-based feedback to support reflection despite certain technical limitations. These results suggest that LLMs can meaningfully enhance learning analytics dashboards by delivering actionable, human-like feedback that supports the development of nonverbal communication skills. |
| Yufan Zhang, Jaromir Savelka, Seth Goldstein and Michael Conway | Changes in Coding Behavior and Performance Since the Introduction of LLMs | The widespread availability of large language models (LLMs) has changed how students engage with coding and problem-solving. While these tools may increase perceived productivity and local performance, they also increase the difficulty of assessing learning, effort, and actual productivity in computer science education. In this quasi-longitudinal study, we analyze five years of student source code submissions in a graduate-level cloud computing course, spanning semesters before and after the release of ChatGPT. We find that student coding behavior has changed significantly since Fall 2022 in at least one assignment that stayed the same: both the length of submissions and the edit distances between consecutive submissions increased, while positive score deltas between consecutive submissions decreased, suggesting lower productivity and lower code efficiency. Additionally, we found statistically significant correlations between these behavioral changes and students' overall performance across assignments. Although we cannot definitively attribute these changes to LLM misuse, they are consistent with our hypothesis that some students are over-relying on LLMs and that this reliance is negatively affecting their learning outcomes. Our findings raise concerns about the first generation of graduates in the age of LLMs, calling on both educators and employers to reconsider how they evaluate genuine expertise and productivity. |
| Angela Stewart and Stephen Hutt | Beyond the Numbers: Socio-Cultural Context as a Reframe for Learning Analytics | Socio-cultural context influences how learners engage with, experience, and make sense of learning. As such, Learning Analytics systems must more intentionally account for this context in their design and interpretation. While many researchers in the field are already addressing these issues, sometimes implicitly or under different frameworks, we argue for a more unified and explicit focus on socio-cultural context as a guiding construct. Drawing on scholarship from education, sociology, anthropology, and psychology, we define socio-cultural context in terms of power, norms, and identity. Through two case studies of our own work, one focused on collaboration analytics and another on engagement analytics, we illustrate how overlooking context can lead to incomplete systems that do not account for the variety of ways students might show up in learning spaces. We build on existing work and propose socio-cultural context as an organizing lens for future research. Doing so enables interdisciplinary collaboration, iterative design, and stakeholder engagement that ultimately ensures analytics better serve all learners. |
| Bailing Lyu, Chenglu Li, Zheng Yao, Hai Li and Rui Guo | How Pedagogical Agents’ Instructional, Cognitive, and Pastoral Conversational Strategies Interactively Shape Students’ Learning | Building on evidence of the effectiveness of teachable agents for learning, this study investigated their use of instructional, cognitive, and pastoral conversational strategies—three dimensions of support that learning theories (e.g., Community of Inquiry framework and Self-Determination Theory) identify as interconnected and critical for student learning—to inform the design of pedagogical conversational agents. By analyzing over 8,000 conversations between teachable agents and students, we found that agents’ cognitive and instructional strategies strongly promoted students’ cognitive elaboration, whereas pastoral strategies were associated with surface-level cognitive engagement and higher affective engagement. Moreover, two-strategy combinations (e.g., cognitive + instructional, instructional + pastoral) were generally more effective for fostering cognitive, affective, and metacognitive engagement than any single strategy, while combining all three strategies often weakened effects. Integration of cognitive and pastoral strategies further enhanced procedural knowledge application, whereas instructional strategies supported conceptual knowledge. These results clarify the complementary yet distinct roles of instructional, cognitive, and pastoral strategies and provide insights for pedagogical agents to dynamically pair functional strategies to support student learning. |
| Bailing Lyu, Chenglu Li, Hai Li and Rui Guo | Designing AI Teachable Agents with Personality: Supporting Student Emotions in Mathematics Learning | This study examines how AI-based teachable agents with distinct personality traits influence middle school students’ emotions and mathematics learning. Grounded in the Big Five personality framework, six agents were developed: five designed to emphasize one of the Big Five traits and one without a personality emphasis. Students engaged in teaching the agents to solve mathematical problems. Guided by the Control-Value Theory of Achievement Emotions, students’ emotions were coded by valence (positive vs. negative) and activation (activating vs. deactivating) from their conversations with the agents, while mathematics learning was assessed through coded applications of knowledge during interaction and a post-test. Results showed that agreeableness-emphasis agents elicited both positive activating and negative deactivating emotions (e.g., enjoyment and boredom), whereas conscientiousness- and extraversion-emphasis agents reduced students’ experience of negative activating emotions (e.g., anxiety). Emotional states were further linked to learning outcomes: both positive and negative activating emotions positively predicted knowledge application and post-test performance. These findings highlight the nuanced role of agent personality in shaping students’ emotional experiences and provide design implications for developing teachable agents that effectively support both affective and academic dimensions of mathematics learning. |
| Siqi Shen, Alyssa Friend Wise, Wesley Morris, Langdon Holmes, Scott R. Hinze and Scott Crossley | The Effects of AI Feedback on College Students’ Reading and Writing Performance in an Intelligent Text Framework | This study investigates how AI feedback influences college students’ reading and writing performance within an intelligent text aligned with the Interactive, Constructive, Active, and Passive (ICAP) framework. Using a within-subjects design, students completed two read-to-write tasks — constructed response items (CRIs) and summaries — under two conditions: (1) [blinded condition], which provided AI feedback to support interactive engagement, and (2) Random Reread, a control condition without AI feedback to support constructive engagement. [blinded condition] offered automated scoring of read-to-write tasks, targeted rereading, and a chatbot that engaged students in interactive dialogues to support revision. Results showed that [blinded condition] significantly improved students’ summary revisions and subsequent summary performance, suggesting that AI feedback functions not only as guidance for improving current summaries but also benefits future performance. However, students in the [blinded condition] did not show higher CRI or summative quiz scores. CRI scores may have been limited by a ceiling effect, while the lack of quiz benefits may reflect a misalignment between the system’s learning tasks and assessment. These mixed outcomes highlight the need to further explore how AI feedback may support different aspects of reading with an intelligent text. |
| Jiayi Zhang, Conrad Borchers, Clayton Cohn, Namarata Srivastava, Caitlin Snyder, Siyuan Guo, Ashwin T S, Naveeduddin Mohammed, Haley Noh and Gautam Biswas | Using Large Language Models to Detect Socially Shared Regulation of Learning in Collaborative Learning | The field of learning analytics has made notable strides in automating the detection of complex learning processes in multimodal data. However, most advancements have focused on individualized problem solving instead of collaborative, open-ended problem solving, which may bear both affordances (richer data) and challenges (low cohesion) to behavioral prediction. Here, we extend predictive models to automatically detect socially shared regulation of learning (SSRL) behaviors in collaborative computational modeling environments using embedding-based approaches. We leverage large language models (LLMs) as summarization tools to generate task-aware representations of student dialogue aligned with system logs. These summaries, combined with text-only embeddings, context-enriched embeddings, and log-derived features, were used to train predictive models. Results show that text-only embeddings often achieve stronger performance in detecting SSRL behaviors related to enactment or group dynamics (e.g., off-task behavior or asking for assistance), whereas contextual and multimodal features provide complementary benefits for constructs such as planning and reflecting. Additionally, to support future implementation of these models, we gathered teacher feedback on usability and trust in model outputs. Overall, our findings highlight the promise of embedding-based models for extending learning analytics by enabling scalable detection of SSRL behaviors, ultimately supporting real-time feedback and adaptive scaffolding in collaborative learning environments. |
| Wanruo Shi, Abhinava Barthakur, Vitomir Kovanovic, Jingyi Wang, Lixiang Yan and Xibin Han | Unpacking Co-Creation with AI: Understanding Students’ Creativity in a Game Design Course | The proliferation of Artificial Intelligence (AI) is fundamentally reshaping creative work and education. While research has demonstrated AI’s potential to enhance creative outcomes, the processes underlying human–AI co-creation remain insufficiently explored. This study investigates the dynamics of human–AI co-creation in an undergraduate game design course by leveraging Learning Analytics and Epistemic Network Analysis. Specifically, it examines how students’ creative thinking types (divergent, convergent, and evaluative thinking) interact with their contribution types (creating new ideas, extending ideas, refining ideas, and transforming ideas) during co-creation with AI. Results reveal the collaborative role of different thinking types in the student-AI co-creative process. Divergent thinking primarily drives the generation of new ideas, while convergent and evaluative thinking are essential for refinement and quality control. These cognitive-behavioral relationships evolve across learning phases, shifting from open exploration to focused integration. Furthermore, high-achieving students strategically employ convergent thinking and engage in continuous refinement and transformation with AI, whereas lower-achieving students tend to remain at the exploratory stage and rely on initial AI outputs. Practically, the findings underscore the importance of supporting students in shifting between creative thinking modes, promoting critical evaluation, and offering phase-specific guidance to facilitate effective and meaningful co-creation with AI in educational settings. |
| Gabriel Astudillo, Isabel Hilliger and Jorge A. Baier | Unveiling Teaching Practices in Higher Education: Leveraging Open-Ended Student Comments in Learning Analytics | Open-ended comments in teaching evaluation surveys offer rich insights into how teaching and learning processes unfold. However, large-scale analyses often rely on superficial labels, limiting their usefulness for designing effective interventions. Current Learning Analytics research has yet to fully leverage this data source to inform teaching development. In this study, we present the design and validation of a coding instrument for labeling effective and dysfunctional teaching practices, based on 44 practices systematized from prior research. A panel of five experts from different parts of the world validated the content. For construct validation, two trained annotators applied the refined instrument to label 2,618 open-ended responses, and we used Exploratory Multidimensional Item Response Theory to iteratively prune the practices considered. The resulting instrument retained 21 practices linked to conceptually substantive dimensions of teaching, such as Teaching and assessment consistency, Clear content transmission, Breakdown of constructive alignment, and Lack of pedagogical support, among others. Our work contributes to LA by introducing a novel empirical data source—student-generated open-ended feedback—and a methodological framework for extracting actionable insights to support faculty development and improve teaching quality. |
| Jie Gao, Shasha Li, Jianhua Zhang, Shan Li and Tingting Wang | Investigating Self-regulated Learning Sequences within a Generative AI-based Intelligent Tutoring System | The use of generative artificial intelligence (GenAI) techniques to support learning is a novel educational technology trend that has been widely adopted in education. Detecting learners’ self-regulated learning (SRL) throughout this learning process therefore plays an important role. However, little research has captured students’ SRL patterns when using GenAI for learning within AI-based systems. In this study, we analyzed the trace data from a GenAI-based intelligent tutoring system to examine students’ interaction patterns with GenAI. We also explored students' purposes in using GenAI through the learning process. Participants were classified into two groups based on their SRL sequences using sequential and clustering analysis. The groups differed mainly in how frequently they used GenAI. Meanwhile, we found that most of the students used GenAI for information acquisition rather than information transformation. The correlation between the purpose of using GenAI and their learning performance was not statistically significant. Our findings have implications for both pedagogy and the design of AI-based learning environments. |
| Tianyi Wang, Marek Hatala, Arash Ahrar and Zhiqing Zheng | Linking Dashboard Elements and Retrospective Outcome Emotions in Dashboard Feedback: A Transmodal Analysis | Learning analytics dashboards (LADs) are widely used to present information about learners’ outcomes. However, few studies have examined how specific LAD elements, particularly those involving varying degrees of social comparison, contribute to students’ retrospective outcome emotions of pride, relief, disappointment, and shame. This study applies a transmodal analysis of eye-tracking data to identify LAD elements critical to these emotions. Using data from 96 participants viewing dashboards that presented their assignment performance, we demonstrate how students’ gaze focus differed between those who experienced different retrospective outcome emotions and those who did not. Our findings suggest directions for future research on developing personalized, emotion-aware learning environments. |
| Joel Weijia Lai, Shen Yong Ho and Fun Siong Lim | Learning Analytics for Assessment Preparation: Constructing Graphs to Guide Higher-Order Question Generation | High-quality assessment at scale requires questions that are conceptually rich, appropriately difficult, and cognitively engaging. We present a learning analytics pipeline that transforms archival assessments and learner performance data for question generation. Past assessment questions from an undergraduate engineering physics course are multi-labelled with topics from a predefined syllabus, and a primary large language model (LLM) is selected via inter-rater reliability against expert judgments to ensure label quality. Using these labels and student performance data, we construct an undirected weighted graph where nodes represent topics and edges capture both co-occurrence and difficulty, with student performance used as a proxy. Exploring this graph reveals difficulty-calibrated topic sets that can be retrieved around target topics. We then evaluate LLM-generated questions for difficulty and alignment with the Structure of Observed Learning Outcomes taxonomy when provided with the requested topics. Empirically, we found that GPT-5 Thinking shows higher agreement with expert labels. We further found that the generated questions, grounded by the topic-performance graph, were deemed to be more difficult and of higher-order intent under blinded expert review. The contribution is a scalable creation of higher-order assessment items with appropriate difficulty that are imperative for generative personalised learning and assessment systems. |
| S.M.S.P Samaraweera, Linxuan Zhao, Vanessa Echeverria, Riordan Alfredo, Guanliang Chen, Joy Davis, Shera Leonny, Samantha Sevenhuysen, Cliff Connell, Dragan Gasevic, Roberto Martinez-Maldonado and Anuja Dharmaratne | From Formal Learning to Professional Practice: Automated LLM-based Coding and Visualisation of Team Dialogue in in-situ Healthcare Simulation | Simulation-based learning is central to healthcare education, yet its effectiveness depends on high-quality debriefing. Traditional debriefs often overlook detailed team dialogue dynamics. Advances in large language models (LLMs) open new possibilities for learning analytics (LA) by automatically coding and visualising teamwork behaviours from dialogue data. This study investigates the effectiveness of different prompting strategies for LLM-based coding, comparing their performance and environmental impacts (CO2e) to identify approaches suitable for transfer into professional practice. Building on these results, we evaluate the generalisability of the optimised model from university student simulations to in-situ, hospital settings, and explore how healthcare professionals perceive the interpretability, usefulness, and trustworthiness of LLM-driven learning analytics in professional learning debriefs. Findings illustrate that responsible uses of AI can help extend LA beyond a controlled university environment into an authentic, in-hospital healthcare context, offering potentially scalable and sustainable support for reflective practice and professional development. |
| Elham Tajik, Bahar Shahrokhian, Conrad Borchers, Sebastian Simon, Ali Keramati, Sonika Pal and Sreecharan Sankaranarayanan | Disagreement as Data: Reasoning Trace Analytics in Multi-Agent Systems | Learning analytics researchers often analyze qualitative student data such as coded annotations or interview transcripts to understand learning processes. With the rise of generative AI, human–AI collaborative workflows have emerged as promising methods for conducting these analyses. However, methodological standards to guide such workflows remain limited. In this paper, we propose that reasoning traces generated by large language model (LLM) agents, especially within multi-agent systems (MAS), constitute a novel and rich form of process data to enhance interpretive practices. We present a proof-of-concept study using cosine similarity on reasoning traces to systematically detect and interpret disagreements among agents, reframing disagreement as a meaningful analytic signal rather than noise. Analyzing nearly 10,000 agent pairs, we show that reasoning similarity robustly differentiates consensus from disagreement and correlates strongly with human coding reliability. Qualitative analysis reveals nuanced instructional sub-functions within codes and fuzzy boundaries in the codebook, offering pedagogical insights that support instructional analysis and codebook refinement. By integrating quantitative similarity metrics with qualitative review, our method improves coding reliability, surfaces interpretive ambiguity, and supports human–AI co-analysis workflows. We argue that reasoning-trace disagreements represent a valuable new class of analytic signals advancing methodological rigor and interpretive depth in educational research. |
| Oleksandra Poquet, Sila Salta, Louis Longin and Olga Viberg | It Depends: Individual Traits and Contexts in Learner Perceptions of LA Technologies | Most modern educational technologies rely on learner data to provide feedback and enable personalisation. Decisions to adopt and use such technologies depend, in part, on learner privacy perceptions related to data sharing. Some privacy theorists posit that these perceptions are shaped by psychological characteristics, while others argue that they are driven by contextual details. In learning analytics (LA), which heavily depends on learner data, little work has systematically examined whether learner characteristics, such as demographics and privacy concerns, predict student perceptions of frequently deployed LA-driven educational technologies, and to what extent contextual factors may also play a role. To investigate this, we conducted a vignette-based experiment (N = 256) asking participants to evaluate acceptability and intended personal use of LA tools across systematically manipulated scenarios. Our analysis showed that demographics and privacy concerns predicted adoption-related perceptions, and that individual and contextual factors explained comparable variance. By quantifying the contributions of individual and contextual factors to adoption-related perceptions, our study identifies the design and governance scaffolds needed for ethical data-sharing. We recommend that LA tool designers customise data-sharing choices, and that local LA policies embed participatory, context-aware processes that help learners interrogate context and revise their data-sharing and adoption decisions accordingly. |
| Namrata Srivastava, Megan Humburg, Sarah Burriss, Shruti Jain, Clayton Cohn, Yeojin Kim, Umesh Timalsina, Joshua Danish, Cindy E. Hmelo-Silver, Krista Glazewski, James Lester and Gautam Biswas | The Role of LLM-Powered Conversational Agents in Supporting Inquiry in Narrative-Centered Learning Environment: A Learning Analytics Study | Problem-based learning (PBL) environments increasingly embed LLM-based conversational agents (CAs) to scaffold inquiry, yet little is known about how learners actually respond to these agents in authentic classroom settings. Learning analytics offers powerful opportunities to capture and interpret how students engage with these agents, enabling deeper understanding of their inquiry processes and informing more adaptive instructional support in PBL settings. In this paper, we examine students' interactions with three types of LLM-powered CAs (Content Knowledge, Argument Feedback, and Argument Evaluation) designed to provide distinct forms of inquiry support within a narrative-centered learning environment, SciStory: Pollinators. Using Pedaste et al.’s inquiry cycle as a lens, we used contextualized log data from 15 student groups to analyze how these agents shaped inquiry via sequence analysis of students' coded actions. Our results revealed distinct trajectories of agent episodes and suggest that LLM-powered CAs can play complementary pedagogical roles, supporting information seeking, guiding revision, and prompting reflection, but may also channel inquiry in ways that constrain exploration. We discuss the implications of using learning analytics to design adaptive scaffolds and using contextualized log analysis to capture how learners navigate inquiry with AI support in authentic classroom settings. |
| Yerin Kwak, Robin Schmucker and Zach Pardos | Strengthening Course Transfer Pathways Using Graph-Theoretic Articulation Networks | Academic pathway research investigates how students navigate their postsecondary education over time to support academic attainment and equitable learning experiences. In this context, upward transfer from two-year to four-year institutions is critical for bachelor’s degree attainment, yet the process of manually establishing course equivalencies (i.e., articulation) to facilitate this transfer remains labor-intensive. This study contributes systematic evaluations of how existing human-curated articulation agreements can be leveraged for data-assistive course-to-course articulation within an interpretable graph-theoretic framework. Specifically, we construct Articulation Networks using articulation data between California community colleges and universities to identify and rank candidate course equivalencies based on node-similarity measures. Analyzing data from over 56,000 courses, we find that among the evaluated similarity measures, Personalized PageRank achieves the highest accuracy, approaching the theoretical recall upper bound imposed by the graph structure, and outperforms text-similarity baselines. Further, we demonstrate how the network approach can integrate additional information, such as the Course Identification Numbering (C-ID) system, to improve course equivalency recommendations and support the assignment of appropriate C-ID designations for community college courses. Our findings highlight how network-based methods can serve as a valuable resource to support faculty and policy makers in streamlining course-equivalency decisions and strengthening pathways for transfer students. |
| Carlos Eduardo Paulino Silva, João Pedro Medrado Sena, Julio Cesar Soares dos Reis, André Gustavo dos Santos and Lucas Nascimento Ferreira | Modeling Programming Skills with Source Code Embeddings for Context-aware Exercise Recommendation | In this paper, we propose a context-aware recommender system that models students’ programming skills using embeddings of the source code they submit throughout a course. These embeddings are used to predict students’ skills across multiple programming topics, producing profiles that are then matched against the skills required by unseen homework problems. To generate recommendations, we apply cosine similarity between student profiles and problem skill vectors, ranking exercises according to alignment with each student’s current abilities. We evaluated our approach using real data from students and exercises in an introductory programming course at our university. First, we assessed the effectiveness of our source code embeddings for predicting skills, comparing them against token-based and graph-based alternatives. Results showed that Jina embeddings outperformed TF-IDF, CodeBERT-cpp, and GraphCodeBERT across most skills. Additionally, we evaluated the system’s ability to recommend exercises aligned with weekly course content, analyzing student submissions collected over seven course offerings. Our approach consistently produced more suitable recommendations than baselines based on correctness or solution time, suggesting that predicted programming skills provide a stronger signal for problem recommendation. |
| Yaolan Weng | Stochastic Simulation of AI Disclosure Policies: Fairness and Norm Internalization in Learning Analytics | This paper addresses the emerging challenge of AI usage disclosure in student assignments, an increasingly important issue for academic integrity and fairness in higher education. We propose a stochastic simulation framework to explore how different disclosure policies (lenient, neutral, strict, and restorative) influence student decisions, norm internalization, and subgroup fairness over multiple assignment rounds. Our model integrates probability-based student decision-making, dynamic updates of perceived detection risk (π) and norm internalization (ρ), and heterogeneity across student groups (e.g., disclosure friction, false-positive rates). Results reveal that endline disclosure rates converge across policies, but policy design meaningfully differentiates longer-run outcomes: restorative (education-oriented) policies lead to substantially stronger norm internalization, higher average performance, and narrower fairness gaps, whereas strict (punitive) policies produce weaker norms, lower performance, and the widest gaps. This work contributes a methodological demonstration of stochastic modeling for learning analytics, highlighting both pedagogical implications and policy design trade-offs. |
| Xinyue Jiao, Zifeng Liu, Sijie Mei and Su Cai | Behavioral and Emotional Dynamics of AR-supported Collaborative Inquiry | With the increasing integration of Augmented Reality (AR) in education, learners can investigate scientific phenomena through embodied and interactive experiences, while collaborating in shared perceptual spaces. Although prior research has highlighted the conceptual and motivational benefits of AR, less is known about how students’ behavioral and emotional processes unfold during AR-supported collaboration. This study investigates 80 middle school students’ collaborative inquiry in an AR-based activity using a multimodal learning analytics approach. We combined video-based coding of verbal and non-verbal behaviors with audio-based emotion detection across three dimensions: arousal, dominance, and valence. Cluster analysis revealed three distinct collaboration patterns, which we further examined in relation to students’ emotional trajectories and learning outcomes. Findings show that groups in the reciprocal–balanced pattern engaged in active and coordinated behaviors, accompanied by more synchronized trajectories, and achieved higher learning gains. Dominant groups displayed high activity but uneven participation and imbalanced emotional dynamics. By contrast, passive–disengaged groups demonstrated limited coordination and unstable affect. This work advances understanding of how collaboration unfolds in AR-supported inquiry by linking behavioral and affective dimensions. Our results provide implications for the design of AR learning environments and analytics-driven supports for productive and emotionally balanced AR learning experiences. |
| Amar Lalwani | From Intra-Unit Abort to Completion: Optimizing Engagement in Low-Resource Classrooms | We present a field-deployed study on reducing within-unit abort behavior in a low-resource EdTech setting, using a personalized recommendation strategy grounded in learner interaction history. Conducted on platform ABC, a digital learning platform aligned to Kenya’s pre-primary curriculum, the experiment compares a model-driven unit recommender against a fixed content sequence. The recommender estimates completion probabilities from each learner’s recent gameplay using a CNN-based time series classifier, and selects the next unit to optimize both immediate and downstream success. Across 90,000+ learners, we observe that the treatment group plays 41% more units, spends 24% less time per unit, aborts 3.1 percentage points fewer units, and achieves a 7.7-percentage-point higher score. Mediation analyses suggest that the increase in engagement is primarily driven by better-aligned content rather than easier tasks, and that improved performance mediates the reduction in aborts. These findings highlight the potential of real-time predictive personalization to improve short-term learner engagement, an important precursor to foundational literacy and numeracy outcomes in low-resource settings, without compromising learner autonomy or curriculum fidelity. Moreover, abort behavior, while an imperfect proxy, can be interpreted through learning theory constructs such as cognitive load, self-regulation, and productive struggle, providing a bridge between algorithmic personalization and educational insight. |
| Jaeyoon Choi, Mohammad Amin Samadi, Spencer Jaquay, Seehee Park and Nia Nixon | Read the Room or Lead the Room: Understanding Socio-Cognitive Dynamics in Human-AI Teaming | Research on Collaborative Problem Solving (CPS) has traditionally examined how humans rely on one another cognitively and socially to accomplish tasks together. With the rapid advancement of AI and large language models, however, a new question emerges: what happens to team dynamics when one of the "teammates" is not human? In this study, we investigate how the integration of an AI teammate, a fully autonomous GPT-4 agent with social, cognitive, and affective capabilities, shapes the socio-cognitive dynamics of CPS. We analyze discourse data collected from human-AI teaming (HAT) experiments conducted on a novel platform specifically designed for HAT research. Using two natural language processing (NLP) methods, specifically Linguistic Inquiry and Word Count (LIWC) and Group Communication Analysis (GCA), we found that AI teammates often assumed the role of dominant cognitive facilitators, guiding, planning, and driving group decision-making. However, they did so in a socially detached manner, frequently pushing the agenda in a verbose and repetitive way. By contrast, humans working with AI used more language reflecting social processes, suggesting that they assumed more socially oriented roles. Our study highlights how learning analytics can provide critical insights into the socio-cognitive dynamics of human-AI collaboration. |
| Lei Tao and Yanjie Song | Toward Just-in-Time Scaffolding: Predicting High-Risk Moments in iVR Speaking Tasks | Immersive virtual reality (iVR) provides authentic contexts for second-language speaking, yet learners often encounter high-risk moments (HRM) such as prolonged pauses, repeated help-seeking, or error–help loops disrupting self-regulation. Although scaffolding is widely used to address these difficulties, little is known about when support should be triggered in iVR. This study examined whether sequential models can anticipate HRM in a GenAI-supported iVR speaking task and which behavioural signals best explain these predictions. Event logs from 80 undergraduates interacting with a digital human were coded into 14 self-regulated learning (SRL) categories. HRM were operationalised through repeated help-seeking, error–help cycles, and inactivity beyond 10 seconds. Predictive features were derived from sliding windows of prior SRL events and tested with logistic regression, Markov chains, long short-term memory (LSTM) networks, and temporal convolutional networks (TCN). Results showed that the TCN achieved the strongest performance (PR-AUC = 0.60; ROC-AUC = 0.76), successfully flagging most HRM. SHAP and TimeSHAP analyses revealed interpretable precursors including help bursts, entropy collapse, and long pauses. Findings illustrate a decision-aligned learning analytics pipeline that links prediction with interpretability and scaffolding design. By translating predictive signals into operational rules, the study highlights how learning analytics can support just-in-time interventions in iVR language learning. |
| Luis P. Prieto and Cristina Villa-Torrano | Multi-Method Triangulation in Idiographic Socio-Emotional Learning Analytics: A Case in Doctoral Education | Doctoral education faces persistent well-being and dropout issues. A central challenge is the absence of cohorts (learner uniqueness), which limits the effectiveness of broad support interventions. Advances in learning analytics (LA) and artificial intelligence (AI) offer potential for personalized support, yet doctoral education and socio-emotional learning (SEL) remain largely overlooked in LA/AI research. Early efforts using idiographic analytics in this direction have encountered methodological and practical challenges. This paper proposes a multi-method idiographic approach for analyzing longitudinal mixed (structured/unstructured) data and demonstrates its utility through a case study of authentic diary and log data in doctoral education, focusing on students’ perception of progress as a key motivational and socio-emotional construct. Our results show that integrating longitudinal emotional and contextual cues (via deductive qualitative content analysis) enhances the reliability and actionability of idiographic analyses. These analyses can provide valuable feedback to doctoral students, although their graphical representations remain challenging for non-experts to interpret. |
Practitioner Reports
| Authors | Title | Abstract |
| Maria Janelli | Centering Youth Voices in the Design of Creative AI: A Practitioner Approach from Scratch | As AI becomes increasingly embedded in educational tools, Scratch is charting a path that combines robust technical infrastructure with human-centered design to support creativity, foster connection, and center the experiences of young people in the evolution of learning technologies. This presentation showcases the development of the Creative Learning Assistant (CLA), a semantic search and recommendation system designed to support creative learning through community-powered AI. Built on the world’s largest dataset of youth-created projects, the CLA goes beyond pattern recognition to engage with what makes learning feel intuitive, joyful, and meaningful. We share how we blend billions of behavioral datapoints, diverse community contexts, and youth-informed testing to design AI that is attuned to how kids learn and what they value. The CLA reflects a broader commitment to ensure that AI for children is not only technically sophisticated, but creatively generative, socially supportive, and shaped by the voices of those it serves. |
| Jasper Naberman, Merel Das, Matthieu Brinkhuis and Sergey Sosnovsky | Targeted Practice in an Online Primary School Platform: A Large-Scale Embedded Evaluation | Online learning platforms need scalable, non-intrusive ways to demonstrate their learning impact. This paper addresses this problem through an embedded quasi-experimental study that examines how on-topic practice improves students’ subsequent performance using weekly personalized review quizzes, which resurface items a student previously missed. In SYSTEM, a commercial Dutch practice platform for primary-school students, we compare performance after on-topic practice with off-topic practice in samples matched on engagement covariates, and analyze accuracy with a generalized linear mixed model. From a dataset of 7,154 students (38,638 answers), on-topic practice yielded modest but consistent gains in Mathematics (72% vs. 69%, p < .001) and Language (73% vs. 70%, p = .03), with no significant effects in Spelling & Grammar or Reading Comprehension. This paper offers a reusable blueprint for continuous, actionable and non-intrusive evaluation of learning effectiveness in production environments. |
| Logan Paul and Alexis Peirce Caudell | From Data to Dialogue: Using AI to Scale Feedback-Informed Learning | We present an initiative to enhance a peer-feedback analytics system supporting hundreds of students in a large, project-based informatics course. Building on prior work that streamlined the collection, exchange and visualization of peer feedback at scale, the project integrates artificial intelligence (AI) to synthesize thousands of qualitative comments into actionable insights for both instructors and students. The initiative utilizes a large language model (LLM) to summarize and contextualize peer feedback. The model aids instructors in identifying course, team, and individual feedback themes and provides students with individualized reflective summaries and nudges that foster durable-skill development. We describe the design of the AI-supported feedback system, early lessons from pilot implementation, and considerations for privacy, interpretability, and responsible use. By doing so, the project provides a blueprint for how AI insights can inform educational decisions, cultivate reflective practice and boost feedback literacy across high-enrollment courses. |
| Fei Gao | Fostering Autonomy in Online Mastery-Based Math Learning | This study reports on the design, integration, and evaluation of motivational features in STEM Fluency, an online mastery-based system for strengthening basic math skills among STEM students. Although the system has demonstrated positive effects on skill mastery, prior research found that students often lacked motivation, engaging only when required. Grounded in Self-Determination Theory, we enhanced the system with features supporting autonomy, including a series of short motivational videos to provide meaningful rationale, and a learning analytics dashboard (LAD) and a self-created practice option to support meaningful choice. A quasi-experimental study was conducted with 694 undergraduate physics students across four sections, with two sections using the enhanced system and two using the original version. Results showed that students in the enhanced system reported significantly higher intrinsic motivation and marginally higher autonomy, though no differences were found in learning outcomes. Usage data indicated limited engagement with the LAD and self-created practice feature, suggesting that motivational gains may be largely attributed to video exposure. Design reflections highlight the importance of making motivational features easily accessible. A new dashboard page was developed to increase direct access and sustained engagement, and its effectiveness will be evaluated. |
| Camille Kandiko Howson and Debananda Misra | The impact of AI on collaboration and belonging on the student experience | This presentation explores how the use of Artificial Intelligence (AI) for teaching and learning in STEM higher education shapes students’ collaboration, learning, and belonging. It adopts a bottom-up, community-driven approach, focusing on students’ regulation and agency towards independent and collaborative learning in the AI age. The theoretical basis draws on self-regulation, co-regulation, and socially shared regulation of learning (Zimmerman, 2002), which includes dimensions of governance, cognitive, social/belonging, and pedagogy. The project brings together students from [universities in two countries] to develop scenarios, use cases and resources, and capture learning analytics from using AI in STEM teaching and learning. The study is based around AI Collaboration Hackathons and follow-up workshops at [two institutions]. Each hackathon has three streams: 1) To propose AI-apps to support student collaboration, 2) To design AI-enabled scenarios to support student collaboration, and 3) To capture analytics from collaborative AI usage. |
| Sanaz Nikfalazar, Manika Saha, Ee Hui Lim and Sulaiman Al Sheibani | Redesigning Learning: Lessons from Embedding Agile Pedagogy and Learning Analytics in Unit Transformation | This paper presents reflections from a large-scale undergraduate Information Technology (IT) unit transformation that embedded agile pedagogy and learning analytics to enhance student engagement, inclusivity, and authentic learning. Transitioning from a traditional lecture/tutorial model to a workshop-based, agile-informed design, the unit adopted iterative “micro-teach + do” sessions, authentic project management assessments, and continuous feedback loops through weekly pulse surveys and analytics dashboards. These mechanisms enabled rapid pedagogical adjustments and evidence-driven curriculum refinement. Analysis revealed that analytics-informed interventions produced measurable improvements in engagement, clarity, and student satisfaction by semester’s end, with sustained positive trends across SETU indicators from 2022 to 2024. Embedding agile principles fostered adaptability, collaboration, and co-ownership among students and staff, while inclusive design strategies supported diverse learning needs. The case demonstrates how data-informed, agile pedagogy transforms large foundational courses and strengthens practice-ready capabilities. |
| Jasper Naberman, Merel Das, Matthieu Brinkhuis and Sergey Sosnovsky | Platform-Embedded Measurement of Online Learning: Evidence From a Question Order Swap Experiment | Online learning products could benefit from scalable, non-intrusive ways to assess effectiveness during real use. In an online education platform, we embedded a minimal position swap in question sets. In a randomized controlled trial, the first and last questions swapped places to measure potential learning within practice sessions. We modeled binary response correctness with a generalized linear mixed model including random effects for student and content. Based on 53,035 answers (6,337 students), the same questions were more often correct at the end than at the start (first-regular: 35% vs. 32%, OR = 0.89, p = .005; last-regular: 24% vs. 21%, OR = 0.85, p < .001). The effect (~3 percentage points) is modest but robust under real-world circumstances and requires no extra testing burden. Thus, this methodology is presented as a blueprint for continuous, in situ evaluation of learning. |
| John Fritz, Thomas Penniston, Josh Abrams, Sarah Bass, Tara Carpenter, Suzanne Braunschweig and Nancy McAllister | Using GenAI & Analytics to Nudge STEM, Non-STEM and At-Risk Students’ Responsibility for Learning | This case study examines how four courses have pursued a common pedagogy of cultivating students' metacognition through GenAI and analytics. ● University A’s largest two courses (Course AA & Course BB), each with 800+ students, are using AI to create a “24/7 prof” and formative learning environment, based on “spaced practice” to counter ineffective student cramming for exams. While this approach is effective, students struggle to replicate its strategies on their own in subsequent courses. ● A lab science course for non-STEM majors (Course CC), with 600+ students, asks them to create their own practice questions AND answers for extra credit. While this has been successful, AI could streamline the curation of these student-crowdsourced study guides, which could be key to scaling faculty adoption. ● A smaller course for students on academic probation (Course DD), with 20 students per section, uses AI to inform a team-based extra credit practice environment for weekly quizzes. This also helps students form effective study groups, a skill valued by faculty but difficult to implement, especially for at-risk students. In each case, the goal is to nudge students’ ability and willingness to honestly and accurately assess what they currently know, understand and can do, either alone or (maybe ideally) together. |
| Marcy Baughman and Guido Gatti | Learning Analytics + Artificial Intelligence ⇒ Truly Personalized Instruction For All | (Practitioner Presentation) We are currently developing several new student-facing applications, and this presentation will describe the learnings from the developers’ point of view for these efforts. The initiative starts with the premise that personalized instruction (PI) is necessary for students to reach their full potential. Recent breakthroughs in Artificial Intelligence (AI) have made PI scalable to the point that applications with true PI features can now be made intrinsic to the learning experience and available to every student. Following the learnings and success from our recently developed AI Tutor feature, we believe the next game changer will come from incorporating student and course analytics ubiquitously into the personalization-AI process. Simply stated, just as the instructor cannot effectively personalize instruction without taking the time and energy to know the student personally, digital applications must constantly collect relevant data for each student and have that data accessible in real time. Such data will include student interactions with assignments and features, performance, and preferences/proclivities, as well as reporting of progress, needs, and priorities to both students and their instructors. |
| August Evrard, Nathan Lesny and Mark Mills | Practice Makes Better: Grade Benefits of STEM Study in the Wild | Practice study has long been recognized as a beneficial approach to learning and competency development in STEM subjects. Utilizing records of 40,000+ students over 12 semesters on a local practice study service called Problem Roulette, we examine how term-long practice study volume correlates with final grade in eight STEM courses across five subjects. Conditioning on standardized pre-college math test scores, we find that students in the highest study volume quintile earn consistently higher average grades than their counterparts who practice less. Students with low pre-college test scores who attain this high level of practice improve their odds of earning an A-level grade by a factor of two or more. |
| Christian Rua, Erin Ottmar and Siddhartha Pradhan | MathFlowLens Teacher Dashboard: Using Learning Analytics to Gain Actionable Insights About Strategic Thinking in Mathematics | This practitioner report presents the iterative development and real-time integration of the (blinded) dashboard, a data analytics and visualization tool for mathematics teachers that helps identify their students’ strategic thinking and problem-solving pathways. Development by researchers was guided by a co-design process where middle school teachers provided mockups, feedback and usability data that were analyzed and turned into new analytics and features that met their instructional needs. Focus groups provided insights into the design of a new data visualization component that allows teachers to see each individual student’s problem attempts and their complete process of solving a problem from start to finish, as well as an integration into the (blinded2) platform. Results from two classroom studies will help identify the ways that the integrated dashboard tool can provide teachers with real-time actionable data, prompting questions, and support to better facilitate mathematical discourse in the classroom about students’ strategic thinking. |
| Oluwaseun Philip Oluyide, Felix Kayode Olakulehin, Adewale Adesina, Segun Buhari, Christine Ofulue and Paul Prinsloo | The (in)visibility of learning: A Collaborative Reflective Case Study of Learning Analytics at National Open University of Nigeria | This collaborative reflective case study examines the realities of implementing LA at the National Open University of Nigeria (NOUN). While NOUN possesses substantial amounts of student digital data, its analytical potential is unrealised due to, inter alia, open registration and a pedagogical strategy that includes a range of voluntary (a)synchronous pedagogical interventions and submission of formative assessments. Though pass rates in the summative assessment show evidence of learning, there is no meaningful data during the semester to provide insights into students’ learning. The study concludes that for LA to play a supportive and impactful role in student learning and faculty teaching, NOUN must adopt a data-informed and data-led pedagogical strategy to make learning visible. For institutions in the Global South, this case study provides an empathetic account of the unrealised potential of LA and provides a rationale for careful, data-informed pedagogical redesign. |
| Simon Frank and Matthias Ehlenz | Supporting Educators through AI: Human–AI Collaboration in Authoring Immersive 360° Learning Experiences | Creating high-quality immersive learning experiences is time-consuming and technically demanding, even with modern authoring tools. This practitioner report introduces an AI-assisted extension of [blinded], an open-source web-based authoring tool for interactive 360° tours. The extension integrates a network of Large-Language-Model (LLM)-based agents to support educators during the authoring process. The system automates research tasks, suggests pedagogically relevant hotspots, and generates textual or interactive content directly within the virtual scene. To address challenges in visual scene understanding, a set of vision-based tools was implemented that enables AI agents to interpret equirectangular 360° panoramas, detect spatial relations, and identify object positions for interactive elements. This work demonstrates how human–AI collaboration can reduce teachers’ preparation time while maintaining high educational quality. It highlights opportunities for capturing interaction traces that can inform future learning analytics research. Lessons learned include the importance of transparency, user control, and streaming-based feedback for maintaining trust among educators. |
| Seiichiro Aoki, Shinzo Kobayashi and Gary Hoichi Tsuchimochi | Practice of a Course Designed by the ICE Approach with Flipped Classroom at a Japanese University Using Digital Mandala and Tracking of Learning Process | We aim to clarify instructional design methods to realize courses that promote proactive learning. We employed the ICE approach as a learning process, and defined one manifestation of a proactive learning attitude as the transition from I (Ideas) to C (Connections) or E (Extensions) in the sentences in which students organized their thoughts. To encourage this transition, we designed a flipped classroom based on the ICE approach for a foundational astronomy course at a Japanese university, implemented from the Spring 2023 semester through the Fall 2024 semester. To allocate class time to group work, presentations, and discussions that would facilitate the transition, the knowledge acquisition phase was designated as pre-learning. Students used our co-creation platform, digital Mandala, to help them organize their thoughts. We fine-tuned the BERT model to track the learning process by measuring transitions among I, C, and E. To build the additional fine-tuning dataset, we classified sentences that students had organized in digital Mandalas for several themes during pre-learning, group work, and post-learning into I, C, and E categories. We obtained a model with approximately 81% accuracy. However, as the accuracy is still insufficient, we plan to expand the dataset and consider using other pre-trained models with many hyperparameters. |
| Tor Einar Møller and David J. S. Blind | A story of institutional learning analytics at a Norwegian business school | In this practitioner report, we describe some of the key decisions made to develop, implement, and maintain an institutional and scalable learning analytics solution at a Norwegian business school. In this presentation, we highlight the key technical and organizational measures taken to mitigate risk and tackle privacy concerns, and how the placement of our learning analytics capacity has influenced how we work strategically with other stakeholder groups. Our context includes limited but constant line capacity, as opposed to a project-oriented capacity structure. The aim of the paper is not to highlight specific products or services enabled by our initiative, but rather to describe one way in which learning analytics can be institutionalized in practice, and as such hopefully inspire others. |
| Jim Goodell, Jeanne Century, Anna Lindroos Čermáková, Beth Havinga and Brandt Redd | Bridging Learning Analytics Research-Practice Gaps with Learning Engineering and International Learning Technology Standards | Despite advances in Learning Analytics (LA) research, including the development of sophisticated AI models, gaps in technical interoperability and technology translation often prevent scaled practical application in education and training. This paper examines the gaps preventing scaled practical application of research-informed LA methods and models, including AI models and methods, and proposes the Modular Open eXplainable Interoperable Enablement (MOXIE) Framework as a strategic blueprint for scaled learning analytics deployment, bridging those gaps with Learning Engineering (LE) methodologies and international learning technology standards. This systematic approach aims to ensure that LA innovations and research-backed AI models are reliably, ethically, and effectively integrated into diverse learning environments for widespread benefit. |
| Nicola Wilkin, Mark Hancock, Alison Irvine, Melissa Loble, Sally Adams, Jess Cooper, Paul Franklyn, Andrew Quinn and Richard Zana | Synthesising Student Feedback with GenAI: Enhancing LA Platforms to Address UK Higher Education Regulatory Expectations at Scale. | Conventional learner analytics platforms struggle to answer students' critical question: 'What do I need to do differently?' We present a "feedback portfolio" prototype that uses LLMs to synthesize assignment feedback across all modules for individual students within a learning management system (LMS). Developed in partnership with the LMS provider for a UK university serving 40,000 students, the prototype generates both student-level and cohort-level views that support the UK academic tutor system and align with Office for Students expectations for proactive wellbeing support. Validation at an international conference indicated broader applicability across diverse educational contexts. The solution supports feedback literacy development—a demonstrated contributor to student wellbeing and a priority within the Teaching Excellence Framework. However, implementation raises questions regarding data governance, LLM reproducibility, and bias detection that require consideration before broader deployment. |
| Choon Lang Gwendoline Quek and Aman Abidi | A RAG-Based Conversational Interface for Data-Driven Teacher Reflection | This practitioner report introduces a RAG-based conversational interface for teacher reflection on classroom practice. Traditional reflection approaches, relying on memory recall or lesson review, are cognitively demanding and logistically constrained by heavy teacher workloads and technical barriers in existing learning analytics platforms. To address these limitations, our proposed system integrates automated audio transcription, Singapore Teaching Practice framework (STP) classification, and a RAG-based architecture providing holistic trend analysis and evidence-based retrieval of specific teaching moments with utterances and timestamps. A case study with an experienced secondary science teacher demonstrated the platform's ability to support independent, evidence-based reflection: The teacher identified three growth areas through data-driven insights and appreciated non-biased feedback grounded in actual classroom practice. These findings highlight conversational AI's potential to democratize data-driven teacher reflection by minimizing technical barriers and time constraints. |
| Everton Souza, Andreza Falcão, Daniel Rosa, Cleon Xavier, Newarney Torrezão da Costa, Otavio Monteiro, Tony Kato, Alessandra Debone, Luiz Rodrigues and Rafael Ferreira Mello | Pilot Implementation of an Intelligent Platform for Automated Assessment in Resource-constrained Settings | Learning Analytics (LA) has been driven mostly by digitally generated data, while most learning settings still rely on paper-based solutions. Whereas the LA community often presents cutting-edge approaches for the automated assessment of digital assignments, it has yet to extend LA solutions that automate the assessment of paper-based assignments in settings where technological infrastructure is limited. This report addresses that gap by introducing the pilot implementation of PInA, an intelligent platform that captures paper-based answers to multiple-choice questions with a low-cost smartphone and automatically assesses them to generate detailed dashboards on student performance. The platform was used by 4 teachers to assess 60 answers from 60 students, and the findings demonstrate that it delivered high usability and pedagogical value, a perfect net promoter score, and an overall 95% accuracy in detecting and identifying paper-based answers. Thus, this report presents the community with a real-world case where LA reached resource-constrained settings thanks to PInA, calling for similar developments to ensure our community is not restricted to those with access to technological infrastructure. |
| Pakon Ko, Linyuan Wang and Nancy Law | The integration of learning analytics in K-12 teachers’ learning design process | Learning analytics (LA) derived from digital footprints on e-learning platforms can provide valuable insights into students’ learning behaviors and performance. However, their adoption by the teaching profession remains limited as most existing LA systems do not sufficiently support teachers’ autonomy in formulating and using LA that align with their own learning design (LD). This study is an integral part of a project to develop an integrated LD-LA platform that supports the design and deployment of LA throughout the LD and course implementation process and the enactment of LA-informed actionable feedback. Using a design-based research approach, we examined how teachers decided on LA questions (LAQs) and data sources, as well as their interpretations and subsequent actions based on the LA results. The findings support a pattern-based approach and its operationalization to the development of design-aware learning analytics systems by extending LD patterns to include relevant LAQs and associated interpretation and actions. |
| Jiri Lallimo and Amanda Sjöblom | Using a human-centered approach with a systemic perspective to build impactful learning analytics with the community | We present a higher education case for a human-centered approach to providing learning analytics (LA) to academics, students and services, to foster co-development and innovation that transcends singular LA projects or tools, resulting in a growing communal LA development within educational processes. We present examples of enabling continual development dialogue outside the major development cycles for LA tools, and how the participating academic staff are invested in the co-development. Emphasis is given to providing continual support to users, enabling them to integrate LA into their tasks and processes, rather than tools existing separately from day-to-day educational activities. We reflect on opportunities to see tools in use, how the LA affect educational processes, and how these interactions lead to further development as well as skill-sharing. We present a model combining technical development, higher education processes and actions for implementation with continuous support and development. |
| Laurenz Neumann, Maximilian Kißgen, Michal Slupczynski and Stefan Decker | Ampel-Grader: Traffic-Light Feedback for Self-Regulated Learning in Digital Exercises | Rising student numbers and limited teaching resources present challenges for grading and providing timely feedback in courses. As a solution, we present the Ampel-Grader, an automated grading tool designed to offer students traffic-light-based feedback on their exercises for self-regulated learning. Further, a Learning Analytics Dashboard was developed, providing instructors with insights into student performance. We employ a comparative analysis of student performance in a database course before and after the introduction of the Ampel-Grader, examining the impact on learning outcomes. Key findings indicate that the tool successfully acts as a low-cost, real-time feedback system, enhancing students' performance. The results suggest that the Ampel-Grader promotes self-regulated learning while reducing the workload of instructors. |
Posters
| Authors | Title | Abstract |
| Minsun Kim, Seon Gyeom Kim, Suyoun Lee, Yoosang Yoon, Junho Myung, Haneul Yoo, Jieun Han, Hyunseung Lim, Yoonsu Kim, So-Yeon Ahn, Juho Kim, Alice Oh, Hwajung Hong and Tak Yeon Lee | Student–ChatGPT Interaction Visible: Designing a Teacher Dashboard for EFL Writing Education | We present a Prompt Analytics Dashboard (PAD) for teachers that traces student-LLM interactions in EFL writing classes. PAD shows prompt-response exchanges with an LLM chatbot and English essay revision histories to support data-informed instruction and transparency in class. Through two iterative co-design sessions with six EFL instructors, we distilled a compact trace taxonomy (misuse signals, goal-alignment cues, revision effort) and instantiated three interface views (overview, week/outcome filter, drill-down with evidence snippets). This pipeline summarizes potential misuse and alignment at class/cohort levels and attaches micro-explanations to reduce over-surveillance. Instructors reported reduced scanning burden and clearer timing for interventions. |
| Tongxi Liu, Luyao Ye and Hanxiang Du | From Scores to Structures: Causal Interpretability of LLM-Based Essay Evaluation | Large Language Models (LLMs) are increasingly used in automated essay evaluation (AEE), yet their scoring logic remains largely opaque. This work employs a causal discovery framework to examine how an advanced LLM, LLaMA 4 Maverick, integrates multi-level linguistic features when assessing long essays. Specifically, 60 human-rated essays were analyzed, and lexical, syntactic, structural, semantic, coherence, and readability features were extracted and modeled for their causal influence on LLM-generated scores across four writing dimensions. Results show that lexical and syntactic features exert the strongest effects, with coherence and semantic factors contributing indirectly. Cross-domain integration accounted for over 78% of detected relationships, indicating a holistic evaluation process. The findings demonstrate how causal discovery enhances the interpretability and construct validity of AI-based writing assessment and support the development of transparent, fair, and pedagogically aligned applications of LLMs in educational contexts. |
| Sutapa Dey Tithi, Bedriye Akson Eren, Meltem Alemdar, Parvez Rashid and Yang Shi | Mapping Cognitive Flows and Student Programming Patterns: A Case Study on an Introductory Python Course | Understanding how students approach programming problems is essential for designing effective feedback and instructional strategies. Cognitive Task Analysis (CTA) provides a systematic framework for analyzing students’ problem-solving processes; however, expert-generated CTAs are time- and effort-intensive and difficult to scale. This study introduces an automated pipeline that leverages Large Language Models (LLMs) to map students’ code submissions to cognitive task flowcharts (CTFs). We present case studies comparing LLM-generated flowcharts of students’ code with expert problem-solving steps to evaluate the accuracy and pedagogical alignment of students’ solutions. Results indicate that LLM-based CTFs effectively capture students’ problem-solving steps. Notably, students who failed to complete assignments were often hindered by factors unrelated to the stated learning objectives, despite demonstrating cognitive steps aligned with those objectives. These findings highlight the potential of automated CTFs as a scalable learning analytics approach for modeling, interpreting, and enhancing both assignment design and students’ programming performance. |
| Jeanne Bagnoud and Terry Tin-Yau Wong | Influence of Level Choice on Performance Improvement Through Multi-Channel Sequence Analysis with Mixture Hidden Markov Models | One of the critical challenges in educational apps is identifying how to provide exercises at appropriate levels to enhance learning. For primary school students, integrating how children interact with the app is central in this process. This poster explores the influence of children's level choice on improvement in a math app using Mixture Hidden Markov Models on multi-channel sequences, with one channel encoding the level choice and the other whether the child passes the level. Improvement between pre- and post-test was used as a covariate. Two clusters were identified. Interestingly, children in one of the clusters, who showed less improvement, did only the easiest and hardest exercises, while the other focused more on the easier and moderate levels, transitioning to the hardest level mainly after mastering the previous ones. To triangulate the results, the process map for the highest improver was visually compared with that of children who showed no improvement. Here again, a difference in the pattern of level choice was observed. These results underline the importance of considering children's behaviour alongside the app's features to enhance learning. |
| Luis P. Prieto, Paula Odriozola-González, Cristina Villa-Torrano, Victor Alonso-Prieto, María Jesús Rodríguez-Triana and Yannis Dimitriadis | Actionable socio-emotional learning analytics: Towards an interpretation framework | Transforming learning analytics (LA) models and visualizations into concrete actions a learner can take to improve learning (i.e., LA "actionability") remains a challenge, especially in learner-facing analytics. Generative artificial intelligence (GenAI) technologies offer new opportunities to scaffold sense-making but may also undermine learners' long-term meta-cognitive abilities and agency, key to learning and well-being. This actionability gap is even more acute for the uncommon use of LA for non-cognitive (e.g., socio-emotional) skills. This poster proposes a conceptual framework to support sense-making and actionability of learning analytics models, drawing from evidence-based clinical psychology, actionable analytics, theories of agency, and human communication frameworks. The framework offers guidance on modeling and on how to frame data conversations to foster learner action. We illustrate its use and added value through an idiographic LA analysis of authentic doctoral learner data. |
| Mitsushiro Ezoe and Masanori Takagi | Learner Acceptance Structure of Educationally-Grounded AI Feedback: An Exploratory Factor Analysis | AI-powered Feedback Systems (AI-FS) are expected to narrow the "feedback gap" in education. However, existing AI-FS often lack deep integration with educational principles. This study developed and implemented an AI feedback system (PRSS-GA) based on the theoretical framework of Intelligent Tutoring Systems (ITS), integrating a Domain Model (national curriculum) and a Pedagogical Model (Hattie & Timperley, 2007). The objective was to uncover the acceptance structure of learners (N=71) who received feedback from this educationally-grounded system, using Learning Analytics (LA) as an interpretive lens. Exploratory Factor Analysis (EFA) revealed that learner perceptions are explained by two distinct yet strongly correlated (r = .759) factors: (1) Cognitive Quality & Trust (F1) and (2) Affective & Practical Acceptance (F2), accounting for 59.1% of the total variance. This result provides a novel framework for evaluating AI feedback efficacy in the "interpretation" phase of the LA cycle and demonstrates the synergy between AI (generation) and LA (interpretation/evaluation). Based on this dual-factor model, we conclude that to bridge the "uptake gap," AI feedback must satisfy both cognitive (F1) and affective (F2) dimensions of learner acceptance. |
| Ecenaz Alemdag | Differences in Students’ Processing of Peer Feedback: An Exploratory Epistemic Network Analysis of Response Messages | Although theoretical models describe how learners process feedback, empirical research using interaction data to examine peer feedback processing remains limited. This exploratory study addresses this gap by analyzing students’ messages to peer assessors and uncovering differences in their response patterns through epistemic network analysis. The sample consisted of 34 ninth-grade students, who were instructed to complete two writing tasks (story and poem writing) in an online formative peer assessment environment. A total of 112 student responses to assessors, comprising 177 segmented comments, were analyzed using a coding scheme. The results revealed that the structures of peer feedback messages differed significantly by writing task, revision status of the writing, and the self-regulation level of the students. More specifically, the story-writing task produced stronger connections among feedback-processing elements related to affective, cognitive, and metacognitive engagement. Also, revised texts were associated with stronger links between positive evaluation and appreciation of feedback. Furthermore, high self-regulators more often combined negative evaluations with elaboration requests. These findings highlight the complex and dynamic nature of peer feedback processing, which can inform the development of adaptive, AI-based peer assessment systems that foster deeper engagement with feedback. |
| Hitoshi Inoue and Koichi Yasutake | Temporally-Embedded Frequency Transition Network Analysis: A Proof-of-Concept for Capturing Non-Stationary Dynamics in Asynchronous Learning | Asynchronous online learning behavior is fundamentally non-stationary, yet most analytics employ static models. We propose Temporally-Embedded Frequency Transition Network Analysis (TE-FTNA), extending Transition Network Analysis into time-varying Markov processes. A proof-of-concept analysis (N=1 course, 1,039 active learners) demonstrates methodological feasibility, revealing deadline-dependent dynamics: +66% High→Low post-deadline fragmentation, +11% Low→Low deadline persistence. While this single-course study establishes temporal patterns, validation requires cross-course replication, dropout transition modeling, and predictive utility assessment for early warning systems. |
| Caroline Lu | Democratising Learning Analytics: A Practitioner’s Experience Using Power BI and ChatGPT | This poster shares a practitioner’s journey into learning analytics (LA) through the use of ChatGPT and Power BI. The perceived barriers to LA—complex infrastructure and the need for technical expertise—often deter educators from exploring its potential. However, the availability of free tools such as Power BI and ChatGPT has begun to democratize LA, bringing it within reach of everyday teachers. Using ChatGPT as a personal tutor, we developed a contextualised Power BI dashboard that allows us to view assessment performance at multiple levels, identify possible grading inconsistencies across teaching teams, and uncover insights that flag at-risk students. By positioning Power BI and ChatGPT as accessible entry points, this work illustrates how LA can be adopted more widely by educators, generating insights that are both relevant and immediately actionable. |
| Alina Deriyeva and Benjamin Paaßen | Evaluation of LLM-based Explanations for a Learning Analytics Dashboard | Learning Analytics Dashboards can be a powerful tool to support self-regulated learning in Digital Learning Environments and promote the development of meta-cognitive skills, such as reflection. However, their effectiveness can be affected by the interpretability of the data they provide. To assist in the interpretation, we employ a large language model to generate verbal explanations of the data in the dashboard and evaluate this approach against a standalone dashboard and against explanations provided by human teachers in an expert study with university-level educators (N=12). We find that the LLM-based explanations of the skill state presented in the dashboard, as well as general recommendations on how to proceed with learning within the course, are significantly preferred over the other conditions. This indicates that using LLMs for interpretation purposes can enhance the learning experience for learners while maintaining the pedagogical standards approved by teachers. |
| Anouschka van Leeuwen | Learning Analytics Offer a Form of Power – Do Teachers Also Experience it as Such? | Teachers in Higher Education play a central role in shaping students’ learning experiences, yet the adoption of teacher-facing learning analytics (LA) dashboards remains limited. This poster submission introduces a novel power perspective to understand teachers’ engagement with LA. From this perspective, access to student data and the insights derived from it can be seen as a form of power, in which power is defined as the capacity to influence others through control of valuable resources. While LA can empower teachers to provide adaptive and differentiated support, it simultaneously raises ethical concerns related to privacy, data use, and responsibility. We propose that teachers’ willingness to implement LA may depend on how they construe this power: as an opportunity to enhance teaching or as a responsibility carrying ethical and emotional costs. The planned study will test this proposition by examining scenarios involving different types of LA dashboards (mirroring, alerting, and advising) and assessing teachers’ subjective sense of power and construal thereof. By reframing LA adoption as a matter of perceived power, this research aims to provide a psychological lens on the persistent research-to-practice gap in educational technology. Initial results will be presented at LAK26. |
| Theo Nelissen, Esther van der Stappen, Milou van Harsel and Anique de Bruin | An in-depth account of students’ use, and lack of use, of student-facing learning analytics for self-regulated learning | Poster. Student-facing learning analytics (SFLA) offer possibilities for formulating and delivering information directly to students to support self-regulated learning (SRL). However, in authentic higher education settings, many students either do not engage with SFLA or use them ineffectively, highlighting a gap between intended outcomes and actual behavior. Students’ individual sensemaking needs to be explored further to understand how SFLA supports SRL. This qualitative study explores how and why students engage, or fail to engage, with SFLA in authentic educational settings. Sixteen first-year students at a Dutch university of applied sciences were interviewed about their use of SFLA for SRL based on formative quiz results, in a flipped classroom course. While some students used quiz results to guide their learning, others ignored the quiz and quiz results, used alternative strategies or consulted different indicators of being on track. Preliminary findings revealed diverse mechanisms influencing whether and how students used SFLA to support SRL, including perceived relevance, SRL skills and goal orientation. These insights underscore the importance of an in-depth exploration of HE students’ individual sensemaking in using (and not using) SFLA for SRL. |
| Jamilla Lobo, Andreza Falcão, Luiz Rodrigues and Rafael Mello | An Open-Source LLM Benchmarking for Transcribing and Scoring Portuguese Handwritten Essays | The growing advancement of Large Language Models (LLMs), with increasingly generalizable models, offers new possibilities for the Learning Analytics (LA) community. For instance, LLMs enable the capture of students' handwritten work, expanding the benefits of LA to settings with limited technological resources. While most research establishes closed-source LLMs as the standard for automated essay scoring (AES), the adoption of open-source models offers a more affordable and customizable approach. This study conducts a comparative analysis of open-source LLMs, evaluating their effectiveness in transcribing and scoring handwritten essays from high school students in Brazil. This study highlights the cost-effective capabilities of open-source technology for developing tailored educational assessment tools. |
| Max Norris, Kobi Gal and Sahan Bulathwela | Next Token Knowledge Tracing: Towards Interpretable, Language-Grounded Models of Student Learning | Understanding how students learn requires models that not only predict behavior but are also interpretable. We introduce Next Token Knowledge Tracing (NTKT) — a language-grounded approach that reframes knowledge tracing as a next-token prediction task using pretrained large language models. Unlike traditional systems that rely on categorical IDs or opaque embeddings, NTKT directly leverages question text and student histories to model learning trajectories through natural language. This grounds predictions in interpretable linguistic units, offering a pathway toward explainable student models. On the Eedi dataset, NTKT outperforms deep learning baselines (F1 = 90.2%) and generalizes well to unseen students and questions. Grounding student models in language enables richer analytics, such as linking attention to pedagogical features and interpreting model behavior through text. NTKT thus offers both strong predictive performance and a step toward transparent, pedagogically meaningful learning analytics. |
| Ecenaz Alemdag and Guher Gorgun | AI Scoring of Peer Feedback Quality Under Different Prompting Strategies: How Close Is It to Expert Scoring? | Large language models (LLMs) have been widely used to automatically score students’ open-ended responses in educational assessments. However, little attention has been given to the use of LLMs for evaluating the quality of students’ peer feedback. In this study, we examined how an LLM (GPT-4o) scored peer feedback quality under different prompting strategies and compared its scoring alignment with that of a human expert. Feedback from 110 university students on peer writing was scored using eight LLM prompts that varied in zero-shot or few-shot design, inclusion of a rubric, and use of Chain-of-Thought (CoT). Few-shot prompting, utilizing assessment criteria and example scorings, achieved the highest agreement with the expert. A linear mixed effects model also revealed a significant interaction among shot, rubric, and CoT. The CoT approach alone in the zero-shot condition produced the largest discrepancy between the LLM and expert scores, whereas adding rubric descriptions narrowed this gap. Overall, few-shot prompting with a small set of human-scored examples was found to be the best approach for automatically scoring peer feedback quality. These findings suggest that educators may leverage LLMs to evaluate students’ feedback quality by providing a few human-scored examples, allowing them to make AI-supported decisions in peer assessment. |
| Wenting Sun | We Want More from GenAI Dashboards: Human-in-the-Loop Design for Learning Analytics–AI Synergy | Learning analytics dashboards (LADs) have evolved from static visualization tools to intelligent systems supporting human–AI collaboration. Yet how GenAI can be integrated into classroom analytics while preserving interpretability, agency, and ethical data use remains underexplored. This study presents the design and evaluation of a generative feedback dashboard that incorporates both teacher and student perspectives to support collaborative problem solving (CPS) in authentic face-to-face classrooms through human-in-the-loop, GenAI-driven feedback. Real classroom discussion recordings were transcribed and transformed into analytic QA pairs using Google NotebookLM with manual curation. Semi-structured interviews with ten teachers, students, and educational technologists revealed expectations for context-rich analytics, adaptive AI roles, and co-negotiated data privacy mechanisms. It advances the synergy between GenAI and learning analytics by embedding human-in-the-loop feedback, agentic AI roles, and co-constructed data governance into a classroom-facing dashboard prototype. |
| Xiaomeng Huang, Xavier Ochoa, Shiyu Zhang and Madhumitha Gopalakrishnan | Towards Automated Measurement of Active Listening Skills in Collaborative Problem Solving Using Multimodal Learning Analytics and Large Language Models | Active listening is a critical social–emotional skill that supports effective and inclusive communication in collaborative learning. Although well-studied in dyadic communication, it remains under-explored in small-group collaborative problem solving (CPS). However, measuring active listening at scale is difficult: teachers rarely have the capacity to observe each learner, in real time, across multiple groups. Existing measurement approaches rely on self-report or manual coding, both of which are labor-intensive and prone to bias. In this study, we propose an automated pipeline for measuring students’ active listening skills in CPS using multimodal learning analytics and large language models (LLMs). Guided by a theoretical model of active listening, we (1) extract layered collaboration contexts from transcripts using LLMs, (2) automatically detect behavioral features from multimodal data (e.g., audio, video), and (3) fuse contextual and behavioral features into a transformer-based predictive model to estimate active listening skill levels along the dimensions of the theoretical model. We outline the methodology and discuss implications. |
| Kyeong Bae Kwon, Insub Shin, Su Bhin Hwang and Yun Joo Yoo | Exploring the Complementary Roles of AI and Students in Formative Peer Assessment | Formative peer assessment is effective in fostering students' critical thinking and improving learning, but many students face challenges with it. As AI has rapidly developed and shown its potential for application across various pedagogical contexts, it is expected to help address these difficulties. This study proposes a formative peer assessment process in which students can refer to AI-driven feedback while composing their own feedback. The process aims to enhance students' engagement and feedback quality while addressing AI hallucination issues. Students’ responses, feedback revisions, AI-driven feedback, and preferences were systematically collected. Cosine similarity between vector embeddings was employed to measure the alignment between student and AI feedback. Results showed that when AI feedback was valid, students who revised their feedback demonstrated higher cosine similarity and preference scores, suggesting improved feedback quality and engagement. However, when AI feedback was invalid, students still followed it uncritically, indicating students’ limited ability to critically evaluate AI responses. For future research, strategies to foster students’ critical thinking in AI-assisted learning contexts are needed. |
| Yasuhisa Tamura | Mining Student–AI Dialogue Logs for Authentic Assessment: A Minimal xAPI Profile, Micro-Viva Triangulation, and Early Evidence | The rapid diffusion of generative AI challenges the validity of traditional outcome-based assessment. This paper proposes a framework for authentic educational assessment that leverages student–AI dialogue log analysis as process-oriented evidence of learning. Building on findings from Learning Analytics, process-data research, and conversational assessment, the framework integrates minimally intrusive logging, feature extraction, and oral verification to complement written products. Empirical studies indicate that dialogue behaviors—such as clarification, verification, and contextual prompting—correlate with deeper engagement and higher achievement. The paper argues that analyzing these interactions can reveal learners’ reasoning and agency, contributing to reliable and ethically grounded assessment in AI-mediated education. Beyond complementing written products and oral checks, we explicitly argue that analyzing student–AI dialogue logs enables estimating learners’ cognitive load during study time and detecting cognitive-load offloading (i.e., delegating substantive cognitive work to AI or receiving over-scaffolded support). |
| Fahima Djelil and Nadine Mandran | A Framework for Learning Analytics Data and Indicator Quality (LADIQ) | While existing quality indicators relate to educational performance and functioning and influence decision making, little research has assessed the quality of indicators in learning analytics (LA) tools. In this work, we define the Learning Analytics Data and Indicator Quality (LADIQ) framework, which emphasizes the importance of data quality and its influence on the quality of the indicator. It outlines the different steps of data processing for producing indicators and sets out the criteria by which the quality of the data and the indicators should be assessed. It ensures that the LA indicator remains accurate, relevant to educational goals, and supported by research frameworks. Accordingly, indicators derived from quality data and research support can contribute to improving educational outcomes. |
| Hong Pan | The AI Paradox: Using Generative AI to Validate Human Learning Through Process Over Product | What if integrating AI into courses could actually improve traditional paper-and-pencil test scores? In three Math/Statistics undergraduate courses (one each at the 100-, 200-, and 300-level) at a liberal arts institution, we are discovering something unexpected in our ongoing Fall 2025 implementation: when we redesign pedagogy to emphasize learning process over product—using AI as a catalyst—students demonstrate significantly stronger mastery of core concepts on conventional assessments. Early results from weekly quizzes and midterm exams show promising improvements compared to previous cohorts. This paradox challenges the prevalent narrative that AI threatens academic integrity and reveals a path forward for institutions committed to high-touch, inquiry-based learning. This poster presents a replicable three-part framework: (1) AI-augmented courseware enabling Just-In-Time pedagogical adjustments, (2) student co-created ethical guidelines (the "AI Pact"), and (3) assessment redesign creating a feedback loop where in-class work provides honest snapshots of learning, enabling rapid AI-generated interventions. Preliminary evidence suggests this approach creates a self-improving system where assessment, AI generation, and student learning reinforce each other. Implementation challenges including quality control demands, student metacognitive readiness, and significant time investment are discussed transparently. |
| Ji Hyun Yu, Hannah Wilhoit and Fengjiao Tu | Beyond Interaction: Validating New Pathways to Success in MOOC-based Microcredentials | Traditional engagement models overvalue visible interaction, often misinterpreting quieter learners in MOOC-based microcredentials. This study addresses this gap through a two-part analysis of a professional certification MOOC-based microcredential (N = 11,424). Study 1 applied K-Means clustering using novel metrics—Attempt Rate (initiative) and Success Rate (persistence)—across consumptive, constructive, and interactive activities. Study 2 linked the resulting profiles to post-course survey outcomes (n = 1,177). The analysis identified four distinct profiles: Holistic Learners, Universally Disengaged, Strategic Consumers (efficient/selective), and Active Lurkers (independent/non-interactive). A key finding was that Active Lurkers and Strategic Consumers reported skill development comparable to the highly interactive Holistic Learners yet demonstrated significantly higher course satisfaction and stronger intentions for professional application. This provides quantitative evidence that quiet, self-regulated learning is a valid pathway to meaningful outcomes, challenging the assumption that social interaction is the primary driver of success. |
| Iman Mohammadi, Ali Keramati, Jie Cao and Yang Shi | How Teachers Use AI Tools in Everyday Teaching: An Analysis from Reddit Discussions | Artificial Intelligence (AI) is rapidly entering classrooms, yet we still know little about how teachers use it in everyday teaching. To investigate this underexplored area, this poster analyzes Reddit discussions on r/Teachers to explore how teachers use AI tools in teaching, the most common applications, and the factors that enable or hinder their use. We compiled a corpus of 4.8 million posts from r/Teachers, combining threads and comments from November 30, 2022, onward, and qualitatively coded posts where teachers discuss the use of AI tools in the education domain. Using inductive coding and thematic analysis, we identify themes describing the classroom applications of AI. We identify these applications in three broad domains: (1) planning and assessment design, (2) teacher workload and communication, and (3) student-facing support. Teachers describe using LLMs to draft lesson and assessment materials, support grading and feedback, and provide differentiated and translated content, as well as organizational scaffolds for students. We also surface key enablers (learning opportunities, policies, and access to tools) and challenges (concerns about cheating and loss of student voice, bias and accuracy, privacy, and equity and sustainability), as well as mixed strategies such as process-based monitoring and constrained AI use in assignments. |
| Suijing Yang, Joni Lämsä, Márta Sobocinski, Justin Edwards and Sanna Järvelä | Designing a learner-AI co-evolutionary approach to supporting socially shared regulation in hybrid intelligence | Artificial intelligence agents have been leveraged to support group-level regulation, which is essential to the success of collaborative learning. Despite significant advances, current research rarely addresses how learners and AI can co-evolve to better understand and support learners’ regulatory processes. We propose a novel approach that incorporates human annotation as a shared opportunity for mutual adaptation between learners and AI. In this approach, learners are invited to annotate their learning process data detected by an AI agent, contributing to the refinement of the agent’s detection of cognitive, metacognitive, behavioral, and socio-emotional trigger contexts, while simultaneously enhancing learners’ metacognitive awareness. This poster presents the design decisions and workflow underlying the proposed approach. Future work and further development areas are also discussed. |
| Rémi Saurel, Franck Silvestre, Jean-Baptiste Raclet and Emmanuel Lescure | Mind the Gap: Benchmarking AI vs. Human in Automatic Short Answer Grading | Grading short open-ended responses remains a persistent challenge in educational settings, motivating ongoing research in Automatic Short Answer Grading (ASAG). The emergence of Large Language Models (LLMs) has renewed interest in their ability to act as evaluators—“LLM-as-a-judge.” However, existing studies indicate that even advanced models may not yet surpass traditional machine-learning approaches when faced with domain-specific or unseen questions. This research explores the configuration of AI agents through the interaction of three key components: the agent, the prompt, and the model. We propose a flexible benchmarking environment, MAESTRO-bench, designed to iteratively test and adapt configurations across different languages. The project seeks to determine which combinations most closely approximate human grading patterns and how discrepancies between human and AI graders compare to inter-human variance. This poster introduces the conceptual framework, initial configurations, and evaluation workflow of MAESTRO-bench, contributing toward explainable, multilingual, and pedagogically grounded AI assessment systems. |
| Arosh Heenkenda, Jessica Mark, Chhordapitou Nguon, Bhagya Maheshi and Yi-Shan Tsai | Developing an AI Agent to Support Students to Provide Peer Feedback like Domain Experts | As artificial intelligence (AI) becomes increasingly integrated into educational contexts, its potential to enhance feedback processes warrants deeper exploration. Peer feedback plays a critical role in developing evaluative judgement, yet students often struggle with feedback quality, confidence, and the social dynamics involved. This poster presents a study designing and evaluating an AI-driven chatbot within the TOOLNAME platform to scaffold effective peer feedback. Using a human-centred, iterative design process including a student survey, co-design workshop, and two prototype validation phases, we identified user needs and refined the system. Findings indicate the chatbot improved feedback structure, reflection, and confidence while supporting learner autonomy. Participants valued scaffolded prompting and reflective dialogue, though noted a need for more concise responses and smoother interface interactions. This work contributes to the growing field of AI-enhanced learning by demonstrating how conversational agents can foster feedback literacy in peer feedback practice. It bridges the gap between automated feedback tools and learner-centred design, offering insights into how AI can support, not replace, authentic peer interactions. Future directions include adaptive scaffolding and domain-specific educational language models to further personalise support. |
| Tom Penniston | Visualizing Learning Outcome Alignment with Network Analysis | This study analyzes learning behavior data from a university-level mathematics course to examine how students engage with materials linked to Learning Outcomes (LOs) within a Learning Management System (LMS). The dataset includes 23,498 records from 176 students (February–June 2024). Network analysis visualized engagement patterns across 163 behavior types, revealing that about one-third of interactions were LO-related, with Quizzes serving as primary anchors and Home as a central hub. The resulting network highlights both intentional course design and emergent student behaviors, illustrating how analytics can expose the relationship between intended and enacted learning. Grounded in empirical data and interpreted through the Quality Matters (QM) framework (2023), this approach demonstrates how evidence-based visualization can inform iterative course design and continuous quality improvement. |
| Yuan Sun, Megumi Shimada, Aki Shibukawa, Toshiko Hosaka and Hiroko Yabe | Integrating Cognitive Diagnostic Assessment with Learning Analytics: A Data-Informed Learning Support System for Japanese Listening | This study introduces a data-informed adaptive learning system that integrates Cognitive Diagnostic Assessment (CDA) with personalized feedback for Japanese listening learning. We developed the Cognitive Diagnostic Japanese Listening Test (CD-JAT)—an online platform that links Moodle with a Python-based diagnostic API using the DINA model. The system estimates mastery probabilities for seven listening subskills and provides immediate, visual feedback with links to targeted practice tasks. Learners can review their cognitive profiles, receive attribute-specific guidance, and re-engage in follow-up activities. Although the current implementation focuses on diagnostic feedback rather than full-scale Learning Analytics, it encourages self-regulated learning and lays the groundwork for future analytics integration. The framework illustrates how psychometric precision can be transformed into actionable, learner-centered feedback that supports adaptive and transparent learning environments. |
| Dongsim Kim, Meehyun Yoon and Jiyoung Lim | Trust Boundaries in Responsible Learning Analytics: Who Gets Access to What Data? | This study investigates how university students decide who can access what learning data in the context of AI-based learning analytics. Using survey data from 179 Korean students, differences in data disclosure were analyzed across three stakeholder groups—teaching-learning stakeholders, administrative staff, and external organizations—and four data categories: personal, background, learning, and sensitive information. Repeated measures ANOVA revealed significant effects for both dimensions. Students showed the highest willingness to disclose learning data but reported relatively low trust toward teaching-learning stakeholders who directly interact with them. This indicates that while students acknowledge the educational utility of learning data, they remain cautious about its use by those with evaluative authority. The findings suggest that learners distinguish between the value of data and trust in its stakeholders. Limited understanding of how data are used and a lack of transparent communication may reinforce this ambivalence. The study provides empirical evidence to guide the design of responsible learning analytics frameworks that enhance transparency, trust, and learner agency in data governance. |
| Federico Pardo García, Oscar Canovas and Felix J. Garcia Clemente | Enhancing Teacher Intervention Classification in SRS Classrooms Using Paralinguistic Features | In classrooms using Student Response Systems (SRS), providing teachers with feedback on their pedagogical interventions is crucial for improving their professional development. Artificial Intelligence systems that analyze classroom audio can provide this feedback, but they often rely only on text transcripts. We argue that, in a dynamic SRS environment, text can be complemented with additional data, as the way a teacher speaks—captured by paralinguistic cues—can provide an enriched context. This study tests this hypothesis by classifying 1,674 teacher interventions from 10 classroom audio recordings. We compared a strong text-only DeBERTa-based model, achieving an F1-score of 0.709, with late-fusion hybrid models that combined text predictions with paralinguistic features. Cross-validation showed that the hybrid models achieved significantly higher performance, with Random Forest and Gradient Boosting achieving F1-scores of 0.737 and 0.734, respectively. Our results demonstrate that paralinguistic features are a valuable and necessary addition for accurately classifying teacher talk, offering feedback for data-driven teacher reflection. |
| Huan Kuang, Okan Bulut, Jiawei Xiong, Anthony Botelho and Justin Kern | Seeing How Students Think: Enhancing Assessment Through Learning Analytics of Behavioral Log and Response Modeling | This study introduces an approach for combining students’ action sequences from log data with their item responses to improve measurement accuracy in digital problem-solving tasks. To achieve this goal, we first apply a Long Short-Term Memory (LSTM) model to learn representations of students’ action sequences and predict the likelihood of correct responses. These behavior-informed predictions are then incorporated into a bi-factor modeling framework that separates students’ overall performance from their response process patterns. Specifically, the general factor provides a refined estimate of problem-solving ability, while the specific factors capture behavioral differences across learners. Using data from the PISA 2012 computer-based mathematics assessment, our results show that the general factor is strongly correlated with original PISA scores, while the specific factors reveal students’ problem-solving strategies. Our study demonstrates how log data can enhance measurement accuracy and provide additional behavioral information. This approach can be particularly valuable in low-stakes learning environments, where the small number of items and limited response information can make traditional psychometric models less appropriate. |
| Elmar Junker, Silke Deschle-Prill, Andre Kajtar, Nicole Kraus, Christine Lux, Birgit Naumer, Anne Sanewski, Jannis Finn Schmidt and Ulrich Wellisch | Impact of Predictive Learning Analytics with Formative Learning Feedback on Exam Failure Rates | The poster presents the results of the project “Feedback based on Analytics of Teaching and Studying meets Individual Coaching” for 2024 and 2025. Using predictive learning analytics, students receive personalized insights into potential exam outcomes, fostering learning progress. In a physics course for industrial engineering students, AI models are trained on six years of historical data, integrating indicators from prior knowledge and Moodle activity into explainable logistic regression models. The feedback design, co-developed with students, combines motivational emails with peer coaching, improved learning objectives and quiz question feedback. The emails deliver the AI-generated predictions, individualized learning recommendations, benchmarks with peers, and guidance on support resources. About 95% of students engage with the service, which is conducted with consent and under a robust data protection framework. 80% of students indicate that the feedback emails are motivating and almost half changed their learning behavior. Failure rates show a downward trend (vs. 2014–2023, p < 0.05). The poster presents the overall framework and the outcomes. |
| Blaženka Divjak, Barbi Svetec, Petra Vondra and Darko Grabar | Enhancing Program-Level Learning Design through Curriculum Analytics | The poster presents the latest developments of the (BLIND) learning design (LD) concept and tool with a focus on curriculum analytics (CA). The well-established tool has, since 2021, undergone several design cycles, responding to the needs of educators and supporting 3000 users around the world in developing sound course LDs. The latest cycle resulted in extended functionalities for program-level LD, including extensive CA available in real time. The functionalities related to program-level planning and CA have been piloted in the context of a postgraduate level study program encompassing 15 courses, which is undergoing the revision process in line with the principles of quality assurance (QA) in higher education. The poster will present an overview of the CA functionalities and the findings of the pilot, discussing the benefits in the context of QA. |
| Luiz Rodrigues, Cleon Xavier, Newarney Torrezão da Costa and Rafael Ferreira Mello | Analyzing the Usability of ChatGPT and Google Search on Student Self-efficacy and Performance in Coding Assessments | The Learning Analytics (LA) community has explored how varied support tools influence learning and teaching. While prior research has examined traditional search engines, the emergence of AI chatbots, such as ChatGPT, raises questions about their impact on learning processes, particularly in relation to students’ self-efficacy and tool usability. This study compares ChatGPT and Google Search in terms of student performance in Data Structures assessments. Fourteen undergraduate students completed two HackerRank problem sets, each supported by one of the tools, with performance, perceived usability, and self-efficacy measured. Results revealed significantly higher performance with ChatGPT than with Google, but no significant difference in perceived usability and no significant correlations between performance, self-efficacy, and usability. Exploratory analyses suggested that for Google, higher self-efficacy and perceived usability were associated with better performance, whereas ChatGPT’s support led to near-perfect performance regardless of these factors. |
| Alina Sheppard-Bujtor, Martin Greisel, Julia Hornstein and Ingo Kollar | Does One Size Fit All? How Pre-Service-Teachers Adapt Their Peer Feedback Messages to Meet Recipients’ Skill Levels During Evidence-Informed Reasoning | This poster submission addresses peer feedback, a commonly used approach to learning, yet one from which not all learners benefit equally. We assume that differences in the extent to which feedback providers adapt their feedback messages to the feedback recipients’ skill levels explain this phenomenon. Therefore, we explore to what extent pre-service teachers construct feedback messages in an adaptive way, in essence, to what extent their feedback messages include features (for example, explanations and solutions) matching the quality of the recipient’s task performance. To this end, N = 297 pre-service teachers drafted an initial solution to a problematic classroom vignette and then provided peer feedback on each other’s performances. We coded the evidence-informed reasoning performance in these case analyses and the presence of features in the feedback messages. Analyses will pertain to the level of skill in task performance and feature constellations. Findings will have implications for the design of online tools to detect adaptivity in real time, thereby optimizing understanding and learning outcomes. |
| Kamran Mir, Geraldine Gray, Ana Schalk and Muhammad Zafar Iqbal | Exploring Learning Analytics (LA), Artificial Intelligence (AI) and User Feedback to Evaluate the Impact of WebXR in Distance Education | This study explores the use of Learning Analytics (LA), artificial intelligence (AI), and user feedback to evaluate the impact of immersive WebXR content in distance education. It examines how the integration of WebXR with generative AI agents into Moodle, a widely used virtual learning environment (VLE), can enhance student engagement, learning outcomes, and satisfaction. The research design includes two learner groups: an experimental group interacting with AI-enhanced WebXR environments, and a control group receiving the same ICT course content through conventional online learning design methods. Using LA combined with the DeLone and McLean Information Systems Success Model, the study assesses system quality, information quality, service quality, user satisfaction, intention to use, net benefit, academic engagement, and learning gains. A mixed methods approach collects log data from both the Moodle LMS and WebXR platform, alongside self-reported learner feedback, to provide a comprehensive evaluation framework. The findings aim to offer empirical evidence on the effectiveness of immersive technologies enhanced by LA and AI, providing actionable insights and practical recommendations for integrating WebXR tools into existing LMS platforms to foster innovative, engaging, and inclusive distance learning experiences. |
| Armin Martin and Sarah Bichler | AI Agent Maths Tutors in Elementary Aged Student Learning | This study investigates the challenges and impact of AI tutor agents in contrast to more traditional adaptive technologies for differentiated instruction. The recent introduction of AI tutor agents in elementary school contexts extends adaptive teaching and learning opportunities to be more personalized and interactive. In this learning pilot, several different adaptive learning technologies are currently being trialed for personalized mathematics learning in nine classrooms with 180 students in third and fourth grade. One of the learning technology platforms is an AI tutor agent and is being compared against more traditional adaptive technology platforms. This four-month trial is based on scores from a norm-referenced, growth-referenced, and criterion-based assessment, which will be repeated after the trial to assess student progress in domain learning. In addition, we will report on effects of the trial implementation on student engagement and teacher ease of implementation. |
| Niels Seidel and Slavisa Radovic | Measuring and Promoting Self-Regulation with Agentic AI and Learning Analytics | Traditional Self-Regulated Learning (SRL) measurement methods, such as surveys and digital trace analyses, face issues of interpretability, restricting the development of effective interventions. Interviews (e.g., the Self-Regulated Learning Structured Interview protocol), another type of instrument, have demonstrated validity and reliability but have not been used widely due to their labor-intensive and time-consuming nature for instructors and researchers. To address these challenges, this study presents AI-SRLSI, a generative AI-based chat agent designed to conduct Zimmerman and Martinez-Pons’s Self-Regulated Learning Structured Interview (SRLSI). The multi-agent system leverages natural language understanding to elicit learners’ reflections on goal setting, strategy use, monitoring, and self-evaluation while automating SRL assessment. In pilot testing, AI-SRLSI demonstrated strong performance in conducting SRL interviews, showing high conversational coherence and assessment efficiency. Participants found the tool intuitive and accurate, though feedback on the usefulness of its personalized recommendations was mixed. These results indicate the feasibility of using generative AI for SRL measurement and highlight its potential to enhance scalable, data-informed, and personalized learning support in future educational applications. |
| Yeyu Wang, Kapotaksha Das and Vitaliy Popov | Co-Attentive Epistemic Network Analysis | Co-Attentive Epistemic Network Analysis (CAENA) models how semantic meaning connections emerge through interaction in collaborative discourse. Building on Epistemic Network Analysis (ENA), we propose CAENA, which redefines connections between coded utterances based on semantic relatedness and temporal proximity. Rather than using a fixed sliding window, it learns adaptive context ranges from cosine similarity across utterance embeddings. Conceptually, CAENA retains ENA’s interpretive strength in modeling individuals-in-groups while aligning temporality with semantic coherence. Methodologically, it replaces the fixed-length window with a semantic-decay function that adapts to discourse dynamics, increasing model validity. Practically, it removes manual parameter tuning and produces more parsimonious, interpretable networks. Preliminary findings show that CAENA preserves critical distinctions between expert and non-expert teams in clinical simulations while offering a more compact, semantically meaningful representation of connection-making patterns. |
| Huda Alrashidi | Learning Analytics of Reflective Writing in STEM: Machine Learning and Large Language Models for Adaptive Feedback | Reflective writing is increasingly used in STEM education to help students monitor their learning, yet providing timely, consistent feedback at scale remains difficult. Building on a Computer Science–specific Reflective Writing Framework (RWF), this study combines learning analytics, feature-based machine learning, and large language models (LLMs) to estimate reflection depth and generate personalised feedback. A corpus of sentence-level excerpts from Computer Science reflective assignments was manually coded for seven breadth indicators and three depth levels, with fair-to-substantial inter-rater reliability. We trained models for indicator detection and depth classification, comparing a direct depth classifier with a two-stage pipeline that first predicts indicators and then uses them as inputs for depth. The two-stage approach achieved higher accuracy and Cohen’s κ, particularly for future action and new learning. These analytics then condition an LLM (ChatGPT), which receives each sentence plus the predicted indicators and depth label and produces short, criterion-referenced feedback messages. The proposed RWF–LA–LLM model illustrates how conceptually grounded analytics and LLM-based feedback can be integrated to support scalable, personalised, and adaptive feedback on reflective writing in STEM courses. |
| Claudia Ruhland | Transformation of the Consent Process: From Information Delivery to Chatbot Communication Along the AIDA Model | Digital consent requests for the processing of personal data for Learning Analytics are often overlooked or routinely confirmed without any actual consideration of data protection issues. This contradicts the GDPR’s requirement for an informed, voluntary decision. This poster presents how we used AIDA-based information design (Attention–Interest–Desire–Action) to transform consent processes from one-sidedly informative to dialogue-promoting. In four independent study phases, we tested each AIDA component with separate groups of students. The results show that AIDA elements can generate attention, promote interest, trigger reflection, and lead to conscious decisions to act. The key finding is that the AIDA model can transform the classic consent declaration into a communicative decision-making process. |
| Jae-Eun Russell, Anna-Marie Smith, Jonah Pratt, Rhiannon Davids, Salim George, Adam Brummett and Renee Cole | AI Tutor Student Interactions: Problem Context Matters | This study investigates student–AI tutor interactions in an introductory chemistry course, focusing on how varying levels of cognitive engagement in dialogues relate to learning outcomes. A sample of student-AI conversations were coded by subject-matter experts and used to fine-tune a BERT-based machine learning model to classify the full dataset. Results showed that basic replies dominated student responses, while process- and reasoning-oriented messages were relatively rare. However, process- and reasoning-oriented responses appeared more frequently during computation problems than multi-select problems. Cluster analysis revealed four distinct student groups based on their AI interaction patterns. Although the Overreliant group had, on average, the highest homework scores and the lowest exam scores, regression analyses revealed no significant differences in exam performance between groups after controlling for incoming GPA. These findings highlight the importance of problem context in shaping the types of interactions students may have with LLM-based AI tutors. |
| Haochen Song, Ilya Musabirov, Ananya Bhattacharjee, Audrey Durand, Meredith Franklin, Anna Rafferty and Joseph Jay Williams | Adaptive Experimentation for Learnersourcing Analytics: A Weighted Allocation Probability Thompson Sampling (WAPTS) Approach | In this poster, we introduce Weighted Allocation Probability Thompson Sampling (WAPTS), a response-adaptive policy designed to improve evidence-based decision-making under data-sparse conditions. Within the learning analytics cycle, WAPTS functions as a mechanism for balancing exploration and exploitation, enabling faster and more interpretable insight into learner engagement and content effectiveness. Through simulation studies across multiple effect sizes, we show that WAPTS identifies near-optimal learning materials earlier and more reliably than Thompson Sampling allocation. We showcase its potential in a simulated learnersourcing scenario where students evaluate peer-generated examples, demonstrating how adaptive experimentation can be integrated into classroom practice. This poster highlights how WAPTS can enhance the analysis and decision phases of learning analytics, enabling better personalization under resource constraints. |
| Daniel Jerez, Elisabetta Mazzullo and Okan Bulut | Using Sequence Analysis to Explore Slow Responses in Complex Problem-Solving | Sequential process data from interactive, digital problem-solving tasks can be used to extract meaningful insights about cognitive processes underlying problem-solving and test-taking behaviors (e.g., disengagement). Sequential data mining and clustering have emerged as promising techniques in learning analytics for identifying distinct behavioral patterns in sequential data. In this study, we applied sequential data mining to investigate whether an underexplored subset of students (i.e. those exhibiting prolonged response times during problem-solving) could be classified into distinct, homogeneous subgroups. We applied full-path sequence analysis and hierarchical clustering to process data from the PISA 2012 problem-solving assessment, introducing a novel approach that incorporates pause information into sequential analyses. Our results reveal two distinct, homogeneous subgroups of prolonged responses within a complex problem-solving task. As these groups differ meaningfully in their test-taking behaviors, examining their interaction patterns can provide further insights into the cognitive processes that may lead to extended response times. This work demonstrates how sequence analysis and clustering can generate actionable insights from process data and contributes to the learning analytics community by advancing methodological approaches for studying slow, complex problem-solving behaviors. |
| Vitaliy Popov, Steve Nguyen and Xavier Ochoa | Toward Automated Feedback Generation: A Scalable Multimodal AI Approach to Clinical Simulations | Recent advances in multimodal learning analytics (MmLA) offer new opportunities to automate analysis of educational audiovisual recordings. This Poster presents the first MmLA pipeline for naturalistic standardized patient (SP) simulations in medical education, focusing on “breaking bad news” scenarios. Our modular pipeline integrates state-of-the-art speech recognition, gaze tracking, and large language models to extract behavioral, verbal, and semantic features from 122 naturalistic simulation sessions involving 244 medical and social work students. Technical evaluation shows effective feature extraction and fusion (including diarized transcripts; visual attention and gaze analysis; LLM-based assessment). Processing efficiency, low configuration complexity (<1 minute per ~15 min session), and modular architecture enable adaptation to diverse team-based educational settings. |
| Libor Juhaňák, Monika Řičicová and Veronika Štolfová | Who Graded Me? Exploring Students' Perceptions of AI vs. Human Assessment | This poster presents the findings of an experimental study which explored how university students perceived feedback generated by artificial intelligence (AI) compared to feedback provided by human evaluators. Conducted within a bachelor’s course on academic writing in Educational Sciences, the study examines students' perceptions of the usefulness of feedback and their ability to distinguish between AI- and human-generated assessments. Throughout the course, students submit weekly writing tasks that are randomly evaluated by either AI or human evaluators, unaware of the source of their feedback. Alongside these assignments, students are asked to reflect on their assumptions about the source of the feedback, and how useful they perceive the feedback to be. The preliminary analysis aimed to clarify whether students could tell whether the feedback they received was generated by AI or humans, and whether the perceived usefulness differed by source. Mixed-effects models on repeated ratings showed that students demonstrated an above-chance ability to discriminate between AI- and human-generated feedback, selecting higher 'AI' categories when the true source of the feedback was AI. In contrast, perceived usefulness did not differ between AI- and human-generated feedback. |
| Daire O Broin, Libor Zachoval, Djuddah Leijen and Ken Power | Why Don’t Students Use Self-Regulated Learning Strategies? Designing Analytics-Driven Support for Goal Setting and Monitoring | Self-regulated learning (SRL) describes how learners plan, monitor, and reflect on their own learning. Although SRL strategies are strongly linked to academic performance, many students rarely use them, citing barriers such as lack of time, low self-efficacy, and perceived effort. This work-in-progress explores how learning analytics can help learners identify and overcome these barriers. Using an action research framework, a digital tool was designed and piloted with first-year university students in a software engineering module (N = 36). The tool supported goal setting and self-monitoring of two key skills. Although 67% of participants achieved mastery thresholds, engagement was a significant issue. Next steps include qualitative analysis of barriers and the design of approaches to overcome them including training that motivates the methods and ongoing guidance. |
| Peizhen Li, Matthew L Bernacki and Robert Pulmley | When Students Start Studying: Tracing Self-Regulated Learning and Exam Performance in Undergraduate Biology | Self-regulated learning (SRL) theorists such as Zimmerman (1990) and Winne (1995) emphasize task definition and planning as crucial initial steps in the SRL cycle. These stages enable learners to identify what is expected of them, set goals, and allocate time and effort strategically to optimize performance. Within this framework, exam study guides function as essential resources that describe task demands, affordances, and constraints. They contain explicit learning objectives that operate as goals, guiding students to plan how and when to engage with exam preparation materials such as blueprints, practice tests, or lecture outlines. In this poster, we conceptualize exam preparation in high-structure active learning STEM courses as a self-regulated learning task, in which learners’ timing and frequency of study behaviors represent dimensions of planning and engagement. We investigated undergraduate students’ exam preparation behaviors in an introductory biology course to understand and to predict how different levels of planning relate to academic achievement. Some students delayed the onset of preparation or avoided it altogether, whereas others planned in a timely or advanced manner. Increasingly advanced planners engaged in more and more diverse strategic learning behaviors and performed better on exams than those who planned later or not at all. |
| Rosalyn Shin, Fengfeng Ke, Nuodi Zhang and Xiaoxue Zhou | Capabilities and Limitations of LLM as a Human Collaborator in Computational Thinking Behavior Analysis and Labeling | This study investigates two prompt designs for integrating AI in learning analytics to analyze computational thinking (CT) behaviors in adolescents with autism. Using multimodal data, the large language model (LLM) with a general instruction demonstrated limited overall accuracy (31.95%), whereas the LLM with rule-based instruction and construct definitions achieved an overall accuracy of 44.47% in replicating human-coded labels. Both LLM-generated annotations struggled with abstract constructs like Abstraction and Persistence. These preliminary results indicate the potential of LLMs to assist human annotators in behavior analysis. We explored different approaches to using LLMs in collaboration between AI and human contributors. |
| Heder Silva Santos, Luiz Rodrigues, Newarney Torrezão Costa, Rafael Ferreira Mello and Cleon Pereira Junior | Context Engineering in Automatic Short Answer Assessment: A Preliminary Study with LLM | This work investigates the impact of context engineering on large language models (LLMs) applied to Automatic Short Answer Grading (ASAG) in Portuguese. Two experiments were conducted using the GPT-4o-mini model and a dataset composed of Biology questions. In the first experiment, an agent generated a theoretical context for each question prior to grading. In the second experiment, the model performed an evaluation after retrieving additional context. Performance was measured using Mean Absolute Error and Root Mean Squared Error. The results showed that direct evaluation, without any contextualization, achieved the best performance, even surpassing average values reported in the literature. On the other hand, the context-based strategies presented greater variation and instability, with only 11.2% of revisions being successful. |
| Fanjie Li, Madison Lee Mason, Daniel T. Levin and Alyssa Friend Wise | When AI Questions, Humans CLARIFY: Using RE-LLM Coding Uncertainty to Resolve Codebook Ambiguities | Reasoning-Enhanced Large Language Models (RE-LLMs) enable new forms of human-AI collaboration in the coding of student text that allow us to code better, not just faster. This paper introduces CLARIFY, a toolset and workflow to systematically capture, quantify, and interpret LLM uncertainty during coding as a diagnostic resource for identifying and resolving codebook ambiguities. CLARIFY employs: (1) guided chain-of-thought for uncertainty flagging, (2) RE-LLM ensemble for committee-based uncertainty estimation, (3) consensus entropy calculation to prioritize high-uncertainty cases for expert review, and (4) a formalized process for iterative codebook and ground truth refinement. Initial application of CLARIFY is demonstrated for analysis of nursing students' written reflections. |
| Zhiqing Zheng, Marek Hatala and Reyhaneh Ahmadi Nokabadi | Triangulating Immediate and Recalled Emotions Across Goal Orientations in Peer-Comparison Dashboards | As blended learning expands through digital platforms like Canvas, Learning Analytics Dashboards (LADs) provide students with feedback by visualizing performance data. These dashboards aim to enhance reflection, motivation, and self-regulation, yet their actual motivational effects remain unclear despite widespread use. Some adopt peer-comparison designs, which can both motivate and discourage students, with emotions playing a crucial role in shaping engagement, motivation, and learning outcomes. The impact may vary by framing and individual traits, such as goal orientation. Building on these findings, this study experimentally examines how two dashboard framings, class-performance and individual-comparison, affect students' achievement emotions immediately and retrospectively across different goal orientations, controlling for competence, performance, and motivation prior to the learning activity. The study was conducted in two programming courses over two assignment cycles. Results show that, among mastery-oriented learners, the Individual-Comparison Dashboard fostered more stable and enduring positive emotions, especially for lower-performing students, while the Class-Performance Dashboard amplified emotional extremes, highlighting the importance of framing in designing emotionally supportive feedback systems. |
| Ryota Tachi, Taiga Yamamoto, Tsubasa Hirakawa, Takayoshi Yamashita and Hironobu Fujiyoshi | Capturing Rare Learning Behaviors: Weighted E2Vec based on BM25 for At-Risk Student Prediction | With the digitalization of education, research on grade prediction using learning behavior logs has become increasingly active. However, conventional approaches relying on operation-count histograms fail to capture rare yet critical behavioral patterns that distinguish high-achieving students. To address this limitation, we introduce BM25-based features that account for both the frequency and rarity of learning actions, and apply effect-size weighting to emphasize behavioral differences between student performance groups. Experiments show that the proposed method improves grade prediction accuracy, confirming the effectiveness of combining BM25 and effect-size weighting for learning behavior analysis. |
| Jungchan Lee, Minsun Cho, Jin-ju Pyo and Younghun Shin | Shared ChatGPT Dialogues as Cognitive Artifacts for Enhancing Human–Human Collaboration | In collaborative problem-solving contexts, effective teamwork requires that members develop a shared understanding of the task and of each other’s thinking. This poster presents a learning analytics approach for supporting and analyzing human–human collaborative problem-solving by leveraging the outputs of human–AI collaboration. The proposed platform enables teammates to share, review, summarize, and annotate one another’s ChatGPT dialogues, transforming them into cognitive artifacts that surface individual reasoning, reveal divergent perspectives, and trigger negotiation of meaning. The approach is grounded in shared mental models (SMM) and socially shared regulation of learning (SSRL) theories, as well as emerging insights into human-AI teaming. We outline a learning analytics method for assessing semantic alignment between team members using embedding-based similarity measures and visualizing convergence over time. An experimental design compares teams using the platform with teams using standard tools, measuring SMM convergence, task performance, and reflection richness, and assessing SSRL episodes. This research aligns with the theme of the conference by integrating human–AI dialogue with learning analytics to foster deeper human–human collaboration and advance feedback-driven collective learning processes. |
| Sean Jackson and Qiao Jin | Where Students Do Well (and Don’t) with AI-Supported Tutoring: A Session-Based Analysis of Contexts and Engagement | K‑12 students use intelligent tutoring systems (ITS) in various contexts (during class, at school independently, or at home), but it remains unclear how context influences moment-to-moment engagement. We analyzed ~1.6 million logged student actions in MathPalace (a mathematics ITS for grades 6–12) to compare the proportion of time students spent in four engagement states: Idle, Struggle, Doing Well, and System Misuse. We first inferred session boundaries from inactivity gaps and used simple heuristics (based on time and peer concurrency) to classify context. We then modeled each session’s engagement-state proportions using mixed-effects beta regression, controlling for student and class clustering. Our results show that context has a clear impact on engagement. Classwork sessions were associated with more Idle and Struggle than both homework and in-school independent sessions, and less Doing Well than homework; this suggests lower productive engagement during class. Independent sessions, while showing less Struggle than homework, had more Idle time and less Doing Well time, indicating a more passive learning pattern. Differences in system misuse were minimal. These findings underscore the importance of interpreting engagement with contextual nuance and suggest opportunities for context-aware learning analytics tools to help teachers better support students with ITSs. |
| Naiquan Li, Dancheng Liu, Yuting Hu, Chenhui Xu, Shenghao Chao, Zhaohui Li and Jinjun Xiong | LLM4Art: Using Large Language Models for Art Interpretation | This study introduces LLM4Art, a framework that leverages multimodal large language models (LLMs) to interpret visual artworks and support public art education. With recent advances in LLMs, we explore how these models can analyze visual art to enhance public art understanding and educational engagement. We propose a system that allows users to upload an artwork and receive interpretations along three expert-validated dimensions: Visual Analysis, Historical Context, and Cultural Significance. We co-designed prompts and rubrics with art scholars to capture interpretive depth beyond surface-level description. To evaluate model performance, we tested five mainstream multimodal models from the GPT and Gemini families on a curated dataset of 50 artworks. Model outputs were rated by GPT-5, used as an automated judge due to its superior performance, and validated against human agreement. Results show that Gemini models have strong interpretive accuracy, particularly in analyzing well-known artworks. Future work will focus on expanding the dataset, incorporating comprehensive human evaluations, and developing more reliable and educationally aligned LLMs for art understanding. |
| Zheyu Tan, Weiwei Jiang, Xin He and Shinobu Hasegawa | Aligning Reliable Learning Support for Self-Regulated Learning with Retrieval-Augmented Generation | Self-Regulated Learning (SRL) requires learners to actively plan, monitor, and reflect on their progress, usually supported by tools that provide accurate, transparent, and context-aware feedback. While Large Language Models (LLMs) offer unprecedented access to explanations and feedback, their responses can be unreliable due to misaligned knowledge, spurious reasoning, or shifting context. To address this issue, we propose the SRL-RAG System, which integrates Retrieval-Augmented Generation (RAG) with SRL processes to provide evidence-based AI-driven learning support. The system enhances content reliability through grounded responses and process reliability through trace-based analytics. Guided by established models of SRL, the system connects learners’ planning, monitoring, and reflection into an accountable and interpretable feedback loop. A preliminary experiment with 34 learners demonstrated complete trace capture and coherent SRL patterns. This work aims to contribute a practical infrastructure and a reliability-oriented framework that align generative AI with learning analytics for trustworthy, adaptive learning support. |
| Hanyu Su and Shihui Feng | A Pedagogy-Informed Science Chatbot to Support Secondary Students’ Cognitive Process | The emergence of Generative Artificial Intelligence (GAI) has transformed students’ access to information, influencing their cognitive development. Understanding secondary school students’ cognitive engagement in human-AI collaboration is essential for effectively using AI to support their thinking and learning. However, empirical research on this topic remains limited. Informed by Socratic questioning and self-regulated learning theory, in this study we designed a novel custom GAI chatbot for secondary school science learning that guides students’ problem-solving step by step. We conducted a within-subjects experiment with 48 students to examine the effect of the custom GAI chatbot compared to the standard off-the-shelf GAI chatbot on students’ science learning processes. The preliminary results indicated that the custom GAI chatbot promoted more frequent cognitive engagement and supported progressively deeper thinking, compared to the general-purpose GAI chatbot. This study contributes new empirical insights into the effects of GAI on secondary school students’ science learning processes and underscores the need for, and importance of, custom GAI for scaffolding students’ learning processes. Our future work will validate and extend these promising process-level findings by examining their connection to learning outcomes with a larger sample. |
| Mladen Rakovic, Maksim Rakovic, Rui Guan, Gloria Milena Fernandez-Nieto, Tongguang Li, Xinyu Li and Dragan Gasevic | Analytics of Prior Knowledge Activation in Multi-Text Writing | Many students struggle to comprehend and effectively use reading content in text-based writing tasks. Previous research has indicated that activating prior knowledge supports deeper processing and meaning-making, which in turn benefits performance in text-based writing. In this pilot study, we examined secondary students’ prior knowledge activation during a multi-text essay writing task on artificial intelligence and medicine. We analyzed 27 students’ interactions with the reading content within the digital learning environment, comparing their pretest responses with the content of their highlights, tags, notes, and essay drafts. Students correctly answered 54.81% of pretest questions and activated their prior knowledge in 31.98% of those cases, mainly through highlighting, tagging, and incorporating relevant information into essays. Correlational analyses indicated that greater use of knowledge-activating highlights and tags was positively associated with essay performance, whereas activation expressed in essays themselves was not. The study demonstrates the potential of trace-based analytics methods to unobtrusively capture cognitive processes such as prior knowledge activation that are difficult to observe directly. Theoretically, integrating trace data with models of knowledge activation can deepen understanding of learning during writing, while practically informing scaffolds that support more strategic engagement with prior knowledge. |
| Zeynab Mohseni, Juan I. Asensio-Pérez, Miguel L. Bote-Lorenzo, Eduardo Gómez-Sánchez and Yannis Dimitriadis | Aligning Learning Design and Learning Analytics in Smart Learning Environments through Pedagogically Guided Large Language Models | Smart Learning Environments (SLEs) aim to provide personalized, context-aware learning across hybrid settings. However, a persistent limitation remains: Learning Analytics (LA) often fail to reflect educators’ pedagogical intentions as represented in Learning Design (LD). This misalignment restricts the interpretability and educational value of analytics and weakens evidence-based improvement. This work proposes a pedagogically guided integration of Large Language Models (LLMs) into SLEs, using Knowledge Components (KCs) and the Structure of Observed Learning Outcome (SOLO) taxonomy as structuring principles. KCs define assessable skills or concepts, while SOLO specifies reasoning depth, together enabling LLMs to interpret unstructured learner inputs in alignment with pedagogical goals reflected in the learning design. This poster paper illustrates the approach for the pedagogical alignment of LLMs and provides preliminary empirical evidence from an authentic hybrid learning scenario. |
| Zhichun Liu, Dan He, Jionghao Lin, Xianghui Meng and Shiyao Wei | Coding with Human or Machine? The Meaning Construction in Qualitative Coding | This study provides a comparative analysis of human interpretivist and LLM positivist approaches to qualitative coding. By analyzing a human inter-rater deliberation meeting and a set of machine-coded outputs, we contrast their methods. Human deliberation proved to be a dynamic, consensus-driven process of meaning-making, where explanations served to build and refine the codebook itself. Conversely, the machine applied a static codebook with quantitative confidence. The machine confidently resolved the deep ambiguities that the human team had identified as conceptual problems in coding. This highlights the complementary roles of humans and machines and calls for future research into hybrid models for LLMs in qualitative coding. |
| Pengfei Li, Qinyi Liu and Mohammad Khalil | Training without Data Sharing: Towards Privacy-Preserving Synthetic Student Data for Cross-Institutional Learning Analytics | Large-scale, cross-institutional studies in learning analytics are often hindered by data governance and privacy constraints. We present a work-in-progress vision: use privacy-preserving, collaboratively learned synthetic tabular data to enable reproducible analytics without exchanging raw records. This poster outlines the problem framing, a high-level approach, early feasibility signals, and the study plan. Specifically, we propose a cross-institutional workflow for synthetic data algorithms used in learning analytics that supports exploratory analysis, low-risk data sharing, and methodological benchmarking without requiring real-time access to data. We will use a privacy-preserving deep learning algorithm as a generator and conduct research in a federated learning scenario. |
| Boxuan Ma | Designing a Meta-Reflective Dashboard for Understanding Students’ Use of AI | This work proposes a meta-reflective dashboard that makes individual student–AI dialogues interpretable without exposing full chat logs. After each help-seeking session, a reflection agent generates a structured summary of the interaction, AI usage patterns, and potential risks, along with suggestions for instructors and students. By including the AI’s reflection, the dashboard provides a privacy-aware, low-overhead lens on AI-supported learning processes and lays the groundwork for future class-level analytics and interventions. |
| Jeongwon Lee | Exploring Learners’ Engagement Experiences in Generative AI-Assisted Collaborative Learning Activities | The rapid advancement of GenAI tools, such as ChatGPT, has generated growing interest in their potential to enhance students’ engagement in collaborative learning. However, concerns remain that such tools may also lead to disengagement. This ongoing study explores university students’ multifaceted engagement experiences in AI-assisted collaborative problem solving. Students’ reflective essays were analyzed using sentiment analysis and epistemic network analysis (ENA). Results indicated that students’ overall sentiment scores were slightly skewed toward the positive side, suggesting generally favorable perceptions of their learning experiences. ENA revealed that the more positive than average perception group demonstrated stronger co-occurrences among behavioral, emotional, and cognitive engagement codes, while the more negative than average perception group exhibited denser connections between cognitive and emotional disengagement codes. These findings suggest that GenAI influences not only the degree but also the patterns of engagement perceptions. |
| Matthias Ehlenz, Adrian Dammers and Birte Heinemann | Learning Analytics for Immersive Educational 360° Environments | Immersive 360° environments are increasingly integrated into educational contexts, yet systematic learning analytics approaches for such settings remain rare. This work presents a modular and privacy-aware framework for integrating learning analytics into an open-source platform for creating interactive 360° learning environments. The system introduces role-based dashboards for teachers, learners, and researchers, alongside innovative visualization and export mechanisms. It extends analytics with Saliency Maps that visualize visual attention within 360° scenes and with statistical indicators capturing navigation patterns, engagement, and dwell times. The resulting hybrid architecture combines xAPI-based tracking with SQL-based progress metrics to balance flexibility and computational efficiency. By addressing data privacy, visual attention, and user-specific feedback, the work contributes to a responsible and research-driven integration of learning analytics into immersive media. |
| Almed Hamzah, Sergey Sosnovsky, Hari Setiaji and Feri Wijayanto | Increasing Student Engagement in a Large Blended Course | Students in large university courses often require additional support to stay engaged with course material and self-regulate their learning activities. In a blended learning scenario, such support can be fueled by data coming from both components of the course. This paper presents a study investigating the effect on students’ engagement of multiple nudges that integrate responses from in-class and online assessment. The obtained results show a significant overall effect and shed some light on the individual impact of different types of nudges. |
| Jingyi Zhao, Deborah Pelacani Cruz, Matthew D Piggot and Rhodri B Nelson | Automating Curriculum Pathway Planning through LLM-Derived Course-Skill Knowledge Graphs | University students must navigate complex curricula while managing workload and career-related uncertainty, yet providing personalised academic guidance at scale remains a persistent challenge for educational leaders. To address this, the poster presents an automated curriculum pathway planning system built upon a course–skill knowledge graph constructed for an undergraduate Earth Science and Engineering programme. Using an institutional module catalogue and a controlled skill lexicon, a large language model (LLM) extracts prerequisite modules and associated skills to construct a rule-validated course–skill knowledge graph. This graph underpins a recommendation agent, accessed through an interactive planner, that integrates structured student profiles with free-text goals to generate personalised pathways that are year-appropriate, catalogue-verified, and pedagogically coherent. Visualisations of course progression and skill development provide students with a clear overview of their learning trajectory while offering educational leaders a scalable, data-driven overview of curriculum coherence. The prototype is illustrated through persona-based simulated student agents, and next steps to include user studies and broader institutional deployment are discussed. |
| Gabriel Baruta, Adolfo Mendonça and Luiz Rodrigues | Responsible Use of Generative AI in Computing Education: Design and Preliminary Results from a Practical Course | The fast integration of Generative Artificial Intelligence (GAI) in higher education intensifies the debate about the synergy between GAI and Learning Analytics, while introducing challenges like the risk of plagiarism and critical thinking erosion. Although the literature discusses theoretical and conceptual guidelines, practical pedagogical interventions are missing, especially in technical fields like Computing Education. To start addressing this gap, this poster describes a short-term course on the responsible use of GAI for coding, currently under way with computing students at a Brazilian university. The intervention, grounded in GAI usage guidelines and self-regulated learning, combines in-person lessons with an interactive script, teaching participants topics like Prompt Engineering, Critical Analysis, and Content Validation. As preliminary results, observational data suggest students’ evolution from passive consumers to critical analysts who actively validate the AI's answers, whereas student feedback demonstrates they are successfully grasping the lessons’ learning objectives. These preliminary findings suggest our intervention holds promising potential to mitigate documented risks of GAI, such as metacognitive laziness and overconfidence, although we contend that further data collection and replication are needed to validate our intervention and inform the development of similar approaches. |
| Linda Corrin and Aneesha Bakharia | Everyday Learning Analytics: Enabling evidence-informed approaches for enhancing learning experiences and outcomes | The Everyday Learning Analytics project is a work-in-progress, equity-focused initiative that is designed to provide open access to resources to support educators to engage with learning analytics within their teaching practice. Currently, it is common for educational institutions to invest heavily in large-scale data analytics systems to improve student success. However, the generic nature of these systems can often obscure direct connection with indicators of learning and/or learning activity design. At the same time, some educational institutions do not have the financial and/or technical resources to support learning analytics initiatives at an institutional scale, resulting in inequity of access to the use of data to inform enhancements to educational practice and student support. The Everyday Learning Analytics project empowers educators to connect their learning design with data from their own systems and classrooms using the technologies they have available to hand, including emerging AI tools. An outcome of the project will be an evidence-informed, shared repository of learning analytics case studies and analysis approaches, enabling educators, especially those with limited access to specialised analytics tools, to use data to enhance educational outcomes and apply analytics in diverse learning environments. |
| Birte Heinemann, Jasmin Hartanto and Matthias Ehlenz | Bridging Software Visualization and Human-Centered Learning Analytics | Version control systems like GitLab are widely used in programming courses. They provide rich data, e.g., about collaborative learning, but often remain underexplored for learning analytics. This poster investigates how data from Git repositories can be used to support learning analytics in programming education. Building on prior work in software visualization and xAPI-based data collection, we investigate how insights from professional software development can inform the design of dashboards for educators and learners. These foundations highlight gaps in existing analytics approaches and point toward new educational perspectives. To address these gaps, we will present first prototypical visualizations on the poster, combining GitLab and learning management system data through xAPI, enabling cross-tool analyses of programming and reflection activities in the future. |
| Junbo Koh, Cheolil Lim, Woohyeok Kang and Yohan Jo | Agent for Agency: Developing an AI Agent to Support Student Agency in Generative AI–Supported Learning Environments | This study examines how large language model (LLM)-based agents can enhance student agency in AI-supported learning. Grounded in social cognitive and self-regulation theories, we developed and tested an LLM-powered agent that scaffolds planning, monitoring, and reflection during data analysis projects. Using a design-based research approach, 17 high school students in South Korea interacted with the prototype, which provided metacognitive support across six dimensions of agency. Findings showed high motivation and self-efficacy but limited planning and prompt literacy. Students appreciated non-directive guidance yet desired clearer structure and knowledge support. The second design cycle will introduce explicit learning phases, adaptive scaffolding, and prompt generation to balance autonomy with guidance. Future work aims to position generative AI as a social partner that cultivates, rather than replaces, self-regulated learning, and offers design insights for agency-oriented AI systems. |
Demos
| Authors | Title | Abstract |
| Fang Wang | ChatFang DesignEdu: A Teacher-in-the-loop Co-assessment System for Formative Feedback in Design-based Learning | This study presents ChatFang DesignEdu, an AI-assisted co-assessment prototype designed to support teacher-in-the-loop formative feedback in secondary design-based classrooms. Built with Streamlit and OpenAI API, the system enables teachers to select assessment criteria, define feedback depth (evidence-, policy-, or tactic-oriented), and generate adaptive AI dialogue aligned with classroom pacing. Using Design-Based Research (DBR; Barab & Squire, 2004), this study investigates how such workflows operate at classroom speed, iteratively refining system rules with teachers to ensure pedagogical alignment and practical usability. The rubric and detectors are grounded in Assessment for Learning (AfL), Self-Regulated Learning (SRL; Zimmerman, 2008), and the Knowledge-Learning-Instruction (KLI; Koedinger, Corbett, & Perfetti, 2012) frameworks, and guided by human-AI complementarity principles (Holstein & Aleven, 2020). Two classroom cycles in UX/app-prototyping classes (ages 14-17) examine adoption, pacing and learner agency through authentic make-critique-iterate pedagogy (Kolodner et al., 2003). Preliminary pilots demonstrate the system’s ability to scaffold reflective teaching and co-generate formative feedback using real rubric data. By operationalising evidence-policy-tactic mappings, ChatFang DesignEdu advances explainable, teacher-controllable feedback generation. The findings extend emerging work on explainable AIED and human-AI co-assessment design in authentic classroom contexts, aligning with recent multimodal feedback research. |
| Jamilla Lobo, Lenon Anthony, Silas Augusto, Gabriel Augusto, Andreza Falcão, Moesio Filho, Fabiola Ribeiro, Luiz Rodrigues and Rafael Ferreira Mello | Saving the Teacher's Time: Escreva Mais and the Future of Handwritten Essay Assessment | Escreva Mais is an innovative tool that integrates Artificial Intelligence (AI)-powered Learning Analytics (LA) into low-resource educational settings, addressing the significant challenges of high teacher workload and the student need for timely feedback. By addressing LA’s topics of Automated Essay Assessment and Personalized Feedback, while adhering to AIED Unplugged principles that foster more accessible AI-powered LA solutions, Escreva Mais enables Portuguese Language teachers to transform handwritten student essays into actionable learning analytics using only a smartphone camera. The system, powered by Gemini 2.0 Flash, performs automated transcription and competency-based scoring. Then, the app delivers an overview of the classroom’s scores to teachers with aggregate and individual performance insights to improve students' writing skills, as well as personalized feedback messages grounded in the literature. Escreva Mais demonstrates that LA tools can be effectively adapted to diverse realities. By prioritizing ease of use, solid pedagogical principles, and local context, it transforms manual, high-effort grading processes into personalized insights that might actively enhance learning and teaching experiences. |
| Yimei Zhang, Yajie Song, Linlin Li, Yiwen Yan and Maria Cutumisu | RefleCT: Learning Computational Thinking Through Reflection - Leveraging LLMs to Make Errors Visible | Studies have explored how Large Language Models (LLMs) support content generation, but not how they can help students recognize errors when solving computational thinking (CT) tasks. This pilot study developed and evaluated RefleCT, an LLM-powered system (GPT-4o-mini) that promotes metacognitive behaviors by facilitating students’ reflection on their CT errors using a sample of 33 undergraduate students from a North American University. RefleCT scaffolds learners’ reasoning through three stages: assessment, diagnosis, and sharing. The assessment stage prompts students to complete 19 validated CT tasks (Lafuente Martínez et al., 2022) and select mistakes to reflect on. The LLM-powered diagnostic stage, grounded in metacognition theory (Pintrich et al., 2000), guides learners in developing metacognitive knowledge (understand what, how, and when to apply CT), monitoring (examine their learning progress and task difficulty), and self-regulatory control (plan, adjust strategies, regulate efforts) through multi-turn reflective dialogues. Each dialogue follows a context–action–rule cycle inspired by Chain-of-Thought (Wei et al., 2022) and reinforcement learning (Kaelbling et al., 1996), producing personalized summaries that students can choose to revise or not. Findings showed that LLM-guided reflection promoted metacognitive behaviors as students reflected on CT errors with peers. The sharing stage provides a collaborative space for community-based error analysis. |
| Bhagya Maheshi, Hiruni Palihena, Flora Yi-Joon Jin, Ahmad Aldino, Thomas Ng, Ziyi Wang, Sachini Samaraweera, Danijela Gasevic, Nicola Charwat, Philip Wing Keung Chan, Guanliang Chen, Sadia Nawaz, Roberto Martinez-Maldonado, Dragan Gasevic and Yi-Shan Tsai | PolyFeed for Educators: A Feedback Analytics Tool to Facilitate Effective Feedback Processes | PolyFeed captures students' interactions with feedback across assessments and courses and visualises insights from those interactions to support more effective feedback processes. Its design is grounded in feedback theories and learning analytics, helping educators understand students' interactions with feedback while also supporting them in improving feedback quality. PolyFeed consists of a browser extension and a dashboard. The student-facing extension helps learners highlight and label their feedback, create reflective notes or action plans, seek feedback for upcoming assessments, and rate feedback. Educators can then use PolyFeed to view these interactions within the extension or through the dashboard. They can also use an inbuilt machine learning model to check the quality of their feedback in relation to the learner-centred feedback framework. A pilot study was conducted with 21 educators in a controlled setting and 10 educators in an authentic setting. These educators highly valued PolyFeed's potential to strengthen their feedback processes by providing insights into learner interactions with feedback, enhancing feedback quality with learner-centred feedback assessment, and informing feedback with inputs from learners, including feedback requests, comments, and ratings. This demo showcases key features of PolyFeed Teacher and testimonials from users. |
| Stergios Tegos, Apostolos Mavridis, Konstantinos Manousaridis, Viktoria Pammer-Schindler, Nikolaus Obwegeser and Pantelis Papadopoulos | Gaia - a multilingual conversational AI tutor and its analytics | We demo Gaia, an AI conversational tutor with built-in multilingual and analytics capabilities. Gaia guides learners through case-based activities in clearly staged conversational phases aligned with Bloom’s taxonomy: case introduction, knowledge checks with Socratic hints, application of concepts, and structured argumentation feedback. Quality is stabilized via Chain-of-Thought reasoning and an Evaluator–Optimizer loop that self-critiques feedback before delivery, while a moderation layer safeguards interactions. An Educator & Administration Platform lets instructors author activities (from scratch or from templates), manage teams and roles, and deploy scenarios into Moodle via LTI 1.3. An analytics dashboard aggregates pseudonymized chat logs and LMS data to present actionable metrics to educators and administrators, supporting formative teaching decisions. Multilingual delivery follows a hybrid approach: professionally localized static content and prompts, plus a translate-process-translate pipeline for live conversations, currently supporting six project languages (EN, DE, NL, EL, SL, ET). We will demo a complete workflow, from authoring a case (“Nimbos”) and sharing it via embed or link to a live learner interaction, alongside real activity analytics (e.g., completion and message distributions) from pilot runs in the field. The demo illustrates an end-to-end, pedagogy-first design that pairs an innovative UI with a robust backend architecture to address real classroom needs in feedback, transparency, and scalability. |
| Isabel Hilliger, Carolina Lopez, Ema Huerta and Marietta Castro | Beyond Clickstreams: A Web-based Tool for Capturing Self-Reported Student Workload in Higher Education | Learning Analytics has increasingly examined student workload as a key indicator for decision-making in higher education. While many studies rely on LMS interaction data to estimate workload, these approaches face limitations due to outliers and the assumptions required to infer time-on-task from clickstreams. Recent research suggests that course credits alone do not fully capture workload, and that self-reported indicators can support course enrollment decisions and time management. In this context, we present a web-based application integrated with Canvas LMS that collects weekly timesheets of self-reported time and perceived difficulty across learning activities. This tool enables students to reflect on their time management and provides instructors with weekly visualizations of average time-on-task. Developed as part of a national research project on student workload, the tool was examined in a preliminary study that found a strong correlation between self-reported and LMS-estimated time-on-task, although LMS data represented only a small fraction of self-reported workload. To assess the tool’s value, we conducted seven cognitive walkthroughs with 21 faculty and 19 students from three universities. Participants emphasized the usefulness of the visualizations for managing workload during the term and planning future academic periods, and they also valued the weekly feedback to prevent overload and promote self-regulated learning. |
| Merve Ekin, Chris J Hughes, Krzysztof Krejtz and Izabela Krejtz | Seeing the Invisible: Real Time Cognitive Load Dashboard | Assessing cognitive load in practical learning environments remains a methodological challenge because the underlying mental processes cannot be observed directly. Existing approaches rely heavily on post-hoc surveys and cannot capture cognitive dynamics as they unfold. This eye-tracking study investigated real-time indicators of cognitive load in 60 students (36 female, aged 16.6 ± 0.6) during complex activities involving quantum physics. A mixed-methods approach combined subjective and objective measures. Perceived task difficulty was evaluated using the NASA-TLX, while continuous data on gaze behaviour and pupil dilation, as physiological indicators of mental effort, were captured using mobile eye tracking. A key outcome of this study is the Real-Time Cognitive Load Dashboard, which uses real-time signal processing and feature extraction to stream and visualise eye-tracking metrics, such as pupil diameter, ambient-focal attention (K-coefficient), and gaze transition entropy, during learning. By dynamically visualising cognitive effort, the system identifies moments of high demand and provides insights into learners' engagement with complex material. This approach advances multimodal learning analytics by linking physiological data with cognitive processes in a real-world educational setting, which supports adaptive teaching, real-time feedback for instructors, and future XR/immersive learning experiments. A demonstration is available at https://shorturl.at/vW0uy. |
| Justin Student and Graham Douglas | Mr. Kato: Instrumenting Conversational AI for Feedback-Rich, Real-Time Clinical Simulation | This interactive demo showcases Mr. Kato, a conversational AI training tool enabling medical students to practice patient-centered interviewing and diagnostic reasoning through natural text and voice interaction. Grounded in the CanMEDS Communicator role, the system enables learners to interview an AI “patient” while an AI “preceptor” provides formative feedback aligned with communication competencies such as active listening, information-gathering, and summarizing findings, all within a psychologically safe, simulation-based environment. The platform integrates AI services for speech and text to simulate authentic clinical encounters, with all interactions securely logged in an institutional database compliant with UBC’s privacy and AI guardrails. Analyses of event data and qualitative feedback reveal valued but selective use of feedback features, guiding iterative design changes with student and faculty input. Learners progress through a two-stage encounter, from patient history to differential diagnosis, allowing comparison of engagement and time-on-task with prior cohorts lacking the AI component. Piloted with third-year medical students, Mr. Kato demonstrates how conversational agents can extend feedback-rich, competency-aligned clinical training. Attendees will interact with the live interface, explore engagement dashboards, and discuss how these findings inform a planned virtual OSCE station and institutional analytics infrastructure. A demonstration video is available at https://edtech-ai.med.ubc.ca/videos/2025-11-10_lak26_kato-demo.mp4. |
| Yuheng Li, Zhiping Liang, Dragan Gašević and Guanliang Chen | Edvance: An AI-Powered Tool for Delivering Actionable, Personalised, and Adaptive Feedback at Scale | AI-powered learning analytics are increasingly used to guide students toward academic success, particularly those who struggle. However, these analytics often require experts with technical and pedagogical knowledge to translate insights into understandable, actionable feedback, limiting scalability in real-world settings. Advances in Generative AI (GenAI) offer opportunities to address this challenge. Edvance combines predictive, prescriptive, and generative AI techniques to convert students’ learning data into natural-language, actionable learning feedback, and integrates features that promote self-regulated learning. Implemented as a Chrome extension, Edvance enables students to: 1) track weekly learning progress through visualisations and optionally compare performance with peers or high-achieving students; 2) receive both remedial and forward-looking learning actions generated by GenAI; 3) set learning goals; and 4) reflect on and update goal progress. An integrated retrieval-augmented chatbot also supports course-related queries, while a teacher-facing dashboard allows instructors to monitor student engagement and provide timely interventions. Pilot interviews with three students showed positive perceptions; participants especially valued the clear, actionable feedback while suggesting improvements to personalisation and interface design. A lab study with thirteen students found GenAI feedback comparable to, and sometimes better than, feedback from human tutors in affirmation and helpfulness, supporting effective goal-setting. This demo will present Edvance’s functionalities, benefits, challenges, and ethical considerations. |