LASI 2014 Workshop (Tuesday)

Text mining and educational discourse

Carolyn Penstein Rosé, Carnegie Mellon University

Workshop Overview: The key insight of this tutorial is that if we can understand the connection between socio-psychological processes and language by means of the social signals encoded in language, we can structure computational models of language interactions more effectively. The tutorial has a theoretical component and a hands-on component. In the theoretical component, I will give an overview of work on the connection between discourse and learning. In service of that, I will discuss my group's work on a computational reinterpretation of qualitative frameworks from sociolinguistics and discourse analysis for studying conversational interactions. The focus of the work I will present is to take these rich and expansive but informally specified constructs and distil them into precise operationalizations that capture their essence and can be learned by machine learning algorithms. The challenge is to identify that essence in a way that generalizes across contexts and has predictive validity for important learning and interaction outcomes. In the hands-on component, I will offer instruction on LightSIDE, a freely downloadable tool for applying machine learning to natural language data. LightSIDE provides a convenient GUI environment in which novice users of text classification technology can easily run text extraction and classification experiments. LightSIDE also serves as a vehicle for disseminating new techniques for applying machine learning to text mining effectively, including novel feature extraction techniques. The newest version (LightSIDE 2.0) includes a model specification panel that enables easy use of multi-level modeling techniques from applied statistics as domain adaptation and multi-domain learning approaches. One of its most distinctive capabilities is its sophisticated support for error analysis.

To prepare for this tutorial, please download and install LightSIDE on your laptop and bring it with you to the tutorial. You may wish to read the following articles in preparation (available on request):

  1. Rosé, C. P., Wang, Y. C., Cui, Y., Arguello, J., Stegmann, K., Weinberger, A., Fischer, F. (2008). Analyzing Collaborative Learning Processes Automatically: Exploiting the Advances of Computational Linguistics in Computer-Supported Collaborative Learning, International Journal of Computer Supported Collaborative Learning 3(3), pp. 237-271.
  2. Gweon, G., Jain, M., McDonough, J., Raj, B., Rosé, C. P. (accepted). Measuring Prevalence of Other-Oriented Transactive Contributions Using an Automated Measure of Speech Style Accommodation, International Journal of Computer Supported Collaborative Learning.
  3. Howley, I., Mayfield, E., Rosé, C. P. (2013). Linguistic Analysis Methods for Studying Small Groups, in Cindy Hmelo-Silver, Angela O'Donnell, Carol Chan, & Clark Chinn (Eds.), International Handbook of Collaborative Learning, Taylor and Francis, Inc.


Bio: Carolyn Rosé is an Associate Professor of Language Technologies and Human-Computer Interaction in the School of Computer Science at Carnegie Mellon University. She serves as President-Elect of the International Society of the Learning Sciences, Executive Board member of the International AI in Education Society, and Associate Editor of the International Journal of Computer Supported Collaborative Learning and of the IEEE Transactions on Learning Technologies. Her research program focuses on better understanding the social and pragmatic nature of conversation, and on using this understanding to build computational systems that improve the efficacy of conversation between people, and between people and computers, with the goal of improving human learning. To pursue these goals, she draws on approaches from computational discourse analysis and text mining, conversational agents, and collaborative learning.

