Chapter 18: Diverse Big Data and Randomized Field Experiments in MOOCs

have generated unprecedented amounts of educational ic communities and sparked interests in disciplines that were historically less involved in the learning terdisciplinary communities and given rise to entirely new communities at the intersections of education, computer science, human factors, and statistics. For research: the availability of not just big but also diverse learner-level educational data, and the opportunity to

The second feature of MOOCs that promises to increase the pace and impact of educational research is the ability industry, this is referred to as A/B testing to convey that individuals are randomly assigned to one of two enables rapid iteration to test theories and practices, in real time and at low cost. For instance, a researcher might compare different versions of lecture videos in a course (e.g., varying the introduction to a topic and how concepts are presented) and observe performance on subsequent assessments. Once enough data has been collected, the researcher may drop those lecture versions associated with the lowest scores and add new versions, and continue iterating. In this process, the educated learners. This may provide novel theoretical insights and it calls for adaptive presentation of condifferences and responsive adaptation of content is feasible in digital learning environments with large heterogeneous samples of learners, such as in MOOCs.
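The iteration loop described above (randomly assign learners to lecture versions, collect assessment scores, drop the weakest version, add new ones) can be sketched in a few lines. This is a minimal illustration, not any platform's actual mechanism: the function names, the `salt` string, and the `min_n` threshold are all hypothetical, and dropping by raw mean score ignores statistical uncertainty, which a real study would need to account for.

```python
import hashlib
from statistics import mean


def assign_version(learner_id: str, versions: list[str],
                   salt: str = "lecture-intro-experiment") -> str:
    """Assign a learner to one version, stably and roughly uniformly.

    Hashing the (hypothetical) learner ID with an experiment-specific salt
    means the same learner always sees the same version on every visit.
    """
    digest = hashlib.sha256(f"{salt}:{learner_id}".encode()).hexdigest()
    return versions[int(digest, 16) % len(versions)]


def drop_lowest(scores_by_version: dict[str, list[float]],
                min_n: int = 30) -> list[str]:
    """Return the versions to keep for the next iteration.

    Nothing is dropped until every version has at least `min_n` scores
    (an assumed threshold); after that, the version with the lowest mean
    assessment score is removed.
    """
    if any(len(scores) < min_n for scores in scores_by_version.values()):
        return list(scores_by_version)
    worst = min(scores_by_version, key=lambda v: mean(scores_by_version[v]))
    return [v for v in scores_by_version if v != worst]
```

A researcher would call `assign_version` when a learner first opens the lecture, log assessment scores per version, and periodically call `drop_lowest` before introducing a new version into the pool.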
The goal of this chapter is to map out these two features of MOOCs, in light of the development of the features might advance the theory and practice of learning and instruction. We begin this chapter with a brief historical overview of the genesis and development of MOOC initiatives. We discuss the advantages of big data and diverse learner samples for research and we review work that has begun to leverage these affordances. Then we turn to the opportunities eration, and discuss how these have been used thus far in MOOC platforms. We close the chapter with a discussion of current limitations and ways to leverage the opportunities of large-scale digital learning environments more effectively.
a long tradition of efforts to increase access to education, including distance learning (e.g., correspondence schools, radio instruction), open access universities, and open educational resources (Simonson, Smaldino, constitutes a MOOC fundamentally shifted between when the New York Times to distinguish the original cMOOCs, which are based on the connectivist pedagogical model that empha-MOOCs of 2012 and onwards), which are mostly based on the instructionist model of lecture-based courses with assessments and a rigid course structure. Stan-Koller, and Andrew Ng, who re-envisioned MOOCs as a broader audience, sparked this ideological shift. This vision led to the creation of several corporate notably Coursera, Udacity, EdX, and FutureLearn. Institutions of higher education worldwide rushed to contribute to the growing number of courses, with each course attracting tens of thousands of learners (Waldrop, 2013).
once it became apparent that MOOCs fell short of delivering on the promise of providing universal lowevidence was that only a small percentage of learners who start a course go on to complete it (Clow, 2013; Breslow et al., 2013), and although completion is not critical barriers have remained unaddressed. The of advancing access for historically underserved populations. Many MOOC learners are already highly educated (Emanuel, 2013). Moreover, learners in the individuals with greater socioeconomic resources 2015). Further evidence shows that socioeconomic achievement gaps in MOOCs occur worldwide in terms of education levels and levels of national deand, moreover, that women underperform relative to be partly due to structural, cultural, and educational barriers (e.g., Internet access, prior knowledge, Additionally, learners may face social psychological barriers, such as the fear of being seen as less capable because of their social group (i.e., social identity

THE PAST AND PRESENT OF MOOCS
threat) and feeling uncertain about their belonging in 2007). At least in terms of completion rates, North more privileged learners, posing a critical challenge for a technology designed to promote educational equity. This highlights the amplifying power of techaddress them. In fact, as evidence of a lack of support for underserved populations emerged, platform providers broadened their initial focus on prominent US partners and started pursuing international university MOOC platforms focused on providing desktop-based devices as a way to increase access in developing countries, where mobile Internet is pervasive. learning activities has been under constant consideration as the movement has matured. While content was originally provided free of charge, the value of a vidual's identity is linked more closely to their course activity) has attracted learners willing to pay for access A number of institutions offer degrees independent of academic institutions (e.g., Udacity Nanodegrees); use MOOCs as a gateway opportunity to complete liberal EdX Freshman Academy partnership); create compact online postgraduate programs (e.g., MIT Microdegrees) with the option to use this credential as a pathway to a traditional graduate degree; and offer full graduate programs online (e.g., University of Illinois' iMBA and There is an increasing supply of courses and short programs from various institutions, especially on popto see increased competition between educational institutions both to attract learners to their courses and to offer courses that can demonstrate superior workplace performance and career opportunities.
Theories of learning and instruction describe parts to predict and infeasible to test. We therefore rely identify the variables that matter (Koedinger, Booth, & Klahr, 2013). Nonetheless, educational theory is never conclusive or all encompassing. Empirical research in education, and the social sciences at large, tends to research in the social sciences is based on studies of who participate in psychology lab studies (Henrich, technology-enhanced education (Ocumpaugh, Baker, To address this challenge, researchers require access to learner samples that are larger and more diverse than traditionally available. Such diverse learner samples are commonplace in MOOCs. The supply and range of courses available on MOOC platforms has steadily increased since their initial offerings (Shah, 2015). These courses have been created by institutions around the world, including universities, museums, and national institutes. In early million learners worldwide.1 Most learners are located on average, but the gender ratio ranges from 22% in various course topics varies by gender and location: business courses are most popular in France, while area with (globally) the least gender balance. Coursera learners tend to be well educated: around 80% had already earned a bachelor's degree, according to resembles that of FutureLearn, a MOOC platform based in the United Kingdom, used by 3 million learners in 2 : 73% hold degrees and, in contrast to Coursera, Chandler, & Sweller, 2003). Taken together, diverse big data can advance a more inclusive science that moves beyond tailoring to averages.
dividual differences, there is a scarcity of replication for only 0.13% of published papers in the 100 major studies rely on relatively homogenous student popwere more diverse, the meta-analysis of individual complicated by variation in instructional conditions, much of which remains unobserved and therefore generally can begin to address this pressing issue. These environments are particularly well suited for conducting large-scale studies with diverse samples study in a MOOC by rerunning the same course or by embedding the same study in another course. In fact, MOOCs are especially suitable for conducting disciplinary research into what instructional approaches are most effective for different groups of learners. For teaching the second law of thermodynamics have been but it is unclear which approach is most effective for learners from different parts of the world.
A critical dimension of diversity in MOOCs is geography, and thus culture. MOOCs assemble learners from Western and Eastern countries with their distinct cultural foundations of learning (Li, 2012). In Eastern viewed as a virtuous, life-long process of self-perfection, countries (e.g., US, Canada), learning is seen as a form of inquiry that serves the goal of understanding the world around us, based on Socratic and Baconian inattend Western universities go through a period of academic adjustment (Rienties & Tempelaar, 2013) and this culture shock can hinder learning and raise of cultural differences, and individual differences more ing theories by evaluating understudied dimensions of learner characteristics, both at an individual and group level. New insights into individual differences can inform current practices that support academic in-person and online environments. Their research demonstrates a promising avenue for leveraging diversity as an educational testing for gender effects and evaluating alternative optimal presentation of the instructor in lecture videos. Seeing a human face can make it easier to pay attention, but it can also be distracting. The image principle posits that showing the instructor in a video does not affect learning outcomes, because the -tion is a critical anteced-MOOC learners in a course preferred watching videos without the face when given a choice, mainly because they constantly showed the instructor was compared to a strategic version that omitted the instructor when it was distracting. The strategic presentation raised perceived cognitive load and social presence, but it had no overall effect on persistence or course grades. However, accounting for learning preference (i.e., whether individuals preferred learning from pictures and diagrams or from written and verbal information), there was a substantial individual difference in percourse with the strategic than the constant presentation. This demonstrates the importance of both accounting for individual differences in practice and distracting or motivating for different people, this insight is worth incorporating in learner models for targeted instructional design.

Current Research that Leverages Diverse Big Data in MOOCs
Compared to traditional learning management systems used in higher education, MOOCs offer a limited set of options to course designers and hardly any new features. However, the large and diverse learning community behind MOOCs provides a remarkable opportunity to learn more about learning and teaching research with MOOCs focused on the analysis of course log data collected by default (e.g., clickstreams)

TESTING THEORY AND EVALUATING EDUCATIONAL PRACTICES USING ONLINE FIELD EXPERIMENTS
and self-report measures from course surveys with relatively low response rates. The recent availability of has enabled researchers to conduct simple randomencouragements to promote engagement and learning, another concerned with changes to the course content and structure, and a third where MOOCs serve as a laboratory for studying general phenomena, and discuss methodological considerations going forward.

Three Streams of Published Experimental Research in MOOCs
small encouragements or nudges to improve course outcomes. These types of interventions can be conassigning learners to receive different messages. A number of studies employed A/B tests to increase participation in discussion forums. Lamb, Smilack, Ho, and Reich (2015) tested three treatments (a self-test participation check, discussion priming with summaries of prior discussions, and discussion preview emails about upcoming discussion topics) and found that the participation check increased forum activity Cohen, and McFarland (2014) tested framing effects in email encouragements for forum participation in participation relative to an individualistic or neutral social comparison paradigm (Festinger, 1954). Learners received an email with either an upward social comparison (describing how many learners outperform you), a downward social comparison (describing how many perform worse), or a control message omitting any social comparison. While the downward comparison motivated high-performing learners, struggling learners benefited from the upward comparison. and unanswered questions increased forum activity, and that reminder emails about unseen lecture videos increased course activity (i.e., lecture views), compared to other reminders. However, a downside of email interventions is that researchers typically cannot the treatment, which raises an analytic challenge for estimating treatment effects (cf. Lamb et al., 2015).
inside a survey, offer an alternative. One study had survey-takers randomly assigned to receive either tips about self-regulated learning or a control message about course topics, but found no improvement in optional surveys is self-selection into the study, which tends to skew the sample towards more committed learners who may respond to the treatment differently from those who opted against taking the survey. In general, although small nudges can have surprisingly large effects on human behaviour (Thaler & Sunstein, ined theory-based changes to course content and they found no improvements in course engagement. and Reich (2015) compared two consecutive instances of the same course with different content release models, staggered versus all-at-once presentation of in persistence and completion rates. To facilitate two established learning strategies (retrieval practice and once again, no improvements in course persistence and completion were detected. Building on multimedia how the presentation of the instructor's face in video and they found heterogeneous effects on attrition, of instructor contact on various course outcomes. Their high-touch condition, which had instructors respond to forum questions and send weekly summaries, did not improve satisfaction, persistence, or completion rates, compared with a low-touch condition without Hartmann (2014) evaluated the impact of adopting a reputation system in the discussion forum and found that it increased response times and the number of responses per post, but it had no effect on grades or persistence. Another study evaluated different reputation systems and found that a forum badging badges increased forum activity (Anderson, Huttenlocher, Kleinberg, & Leskovec, 2014). Most studies in this stream of research, despite using stronger main learning outcomes.
MOOCs as a lab environment to test general the-for (potentially unconscious) bias in online classes, sages total, eight per course) and randomly assigned learner names to be evocative of different races and genders. They found evidence of discrimination: instructors wrote more replies for white male names than for white female, Indian, and Chinese names. In testing social-psychological barriers to achievement, theory-based intervention activities, designed to mitigate concerns about not belonging in the course, can effectively close the global achievement gap between learners in more versus less developed countries. On the role of geographical diversity in peer video discussion and found that being in a more diverse group improved subsequent test performance. Leveraging performance causes attrition, due to the upward social about the peer grading process (i.e., how grades are adjusted and computed) affects learners' trust in peer highlights the fairness of the procedure can promote resilience in trust for learners who received a lower of MOOCs and their results hold promise for enriching theory and practice.

Methodological Considerations for Randomized Field Experiments in MOOCs
From this review of published research, it stands out results. This may be surprising, given that MOOCs testing for heterogeneous treatment effects. In genin MOOCs, it is advisable to focus on the magnitude of treatment effects in addition to their statistical number of possible outcomes and covariate measures in MOOC data, there is a real danger of increasing the Type I error rate (false positives) as a result of multiple challenge, replication, pre-registration, and use of Bayesian alternatives to frequentist hypothesis testing evidence going forward.
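To make the Type I error concern concrete, the sketch below computes how quickly the chance of at least one false positive grows with the number of independent tests, and applies the Holm step-down correction to a set of p-values. The Holm procedure is a standard frequentist adjustment that the chapter itself does not name; it is shown here only as one illustrative safeguard alongside the replication, pre-registration, and Bayesian approaches mentioned above. The function names are hypothetical.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive across m independent
    tests, each run at significance level alpha, when all nulls are true."""
    return 1.0 - (1.0 - alpha) ** m


def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm step-down procedure.

    Returns, for each p-value in its original position, whether the
    corresponding null hypothesis is rejected while controlling the
    family-wise error rate at alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value is compared against alpha / (m - k).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject
```

With 20 outcome measures each tested at 0.05, `familywise_error_rate(0.05, 20)` is roughly 0.64, which illustrates why an uncorrected scan over many outcomes and covariates is likely to turn up a spurious effect.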
are poised to deliver significant contributions to theory in education and related disciplines. Yet the the availability of real-time data and the level of access mean that most researchers are still only testing one idea at a time at the pace of new courses going live. A perimentation to accomplish the dual goal of learning This would provide a special opportunity for advancing scholarship in disciplinary teaching and learning. For the concept of recursion and comparing test results sign), multiple approaches to recursion can be taught order. This would enable simultaneous tests of multiple process that currently requires a whole community of researchers and substantial resources. Williams and which are small pieces of content that adapt based instance using a multi-armed bandit algorithm (Bather researchers a novel opportunity to enrich theory and practice at a fast pace. rapidly to heterogeneous learner populations has the potential to disrupt the way educational research is carried out. While in the past learning theories might arise from the careful study of a small number of highly selective environments where control over of thousands of learners across the globe in a single Traditional higher education research has faced two digm of responsibility of instruction. In the higher education classroom, the faculty member teaching the course tends to be completely responsible for tal approach, and ensure that all students within the class have equal access to support and interventions. Innovation in these circumstances tends to be through en cohort or year of study are compared with those of other cohorts or years of study, introducing more confounding variables. In MOOCs, the paradigm is different, perhaps due in part to the broader constellation of actors, including institutional administration and vendor partners, who assume some responsibility for the success of learners. The culture of some of these actors around balancing risk and reward, especially within venture-capital funded enterprises where rapid prototyping and testing is the norm, has fostered more to advance learner success.
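The adaptive experimentation idea raised earlier in this section, where content adapts using a multi-armed bandit algorithm, can be illustrated with Thompson sampling over a pass/fail outcome. This is a minimal sketch under stated assumptions: the chapter does not specify which bandit algorithm is used, the class name is hypothetical, and the outcome is simplified to whether a learner passes a follow-up test item.

```python
import random


class BetaBernoulliBandit:
    """Thompson sampling for choosing among instructional variants.

    Each arm (e.g., one way of teaching a concept) keeps a Beta posterior
    over its pass rate, starting from a uniform Beta(1, 1) prior.
    """

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        self.successes = {arm: 1 for arm in arms}
        self.failures = {arm: 1 for arm in arms}

    def choose(self):
        """Sample a plausible pass rate per arm and pick the highest draw."""
        draws = {
            arm: self.rng.betavariate(self.successes[arm], self.failures[arm])
            for arm in self.successes
        }
        return max(draws, key=draws.get)

    def update(self, arm, passed: bool):
        """Record whether the learner shown `arm` passed the assessment."""
        if passed:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1
```

Unlike a fixed A/B split, this allocation gradually sends more learners to the variants that appear to work better while still exploring the others. The trade-off is that the resulting data are no longer uniformly randomized, which complicates conventional effect estimation.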
A second constraint in traditional higher education research is the acceptable amount of risk and reward of years of higher education demonstrating value to society, and tens or hundreds of thousands of dollars on the line in tuition for a given student, it is harder for researchers to make the ethical argument to engage in high-risk research.Yet in MOOC environments, the majority of learners enrol at no cost, and few are in peril of losing their livelihood over the results of This difference is have strong protections for student records and Recent approaches to accepting MOOC credentials as credit in have begun to change the stakes for learners.
the United States. However, the same obligations do lifts some of the constraints from policies concerning researchers:
1. The population of MOOC learners is different and, in many ways, much more diverse than that of traditional educational research, in terms of learner demographics (age, race, cultural background, et cetera), prior knowledge, and motivations for taking courses. This broader representation, along with the vast numbers of learners, provides an opportunity for scholars to test the generaland to identify learning theories that are most enable quantitative approaches to problems that populations who use MOOCs for social mobility for the breadth of learners enrolled in MOOCs.
2. platform at no cost enables researchers to leverage the volume and variance of learner data for greater close the feedback loop in education by promoting the integration of research, theory, and pracpopulation in MOOCs, there is a real possibility to build environments that (semi-)automatically platform data.
In this chapter, we have called out two affordances of research with MOOCs - the availability of diverse enable a more inclusive and agile science of learning.
and investigation of computational methods, are well poised to answer this call and achieve even broader impact going forward.

Festinger, L. (1954). A theory of social comparison processes. Human Relations: Studies towards the Integration of the Social Sciences, 7.

CONCLUSION
Researchers are just beginning to leverage the potential of large, heterogeneous learner data from MOOCs. Recent studies have investigated demographic and in course completion of learners in the United States 2017). Notably, although learner demographics account provide limited improvements over behavioural log Thompson, & Teasley, 2015a; Brooks, Thompson, & from the literature that highlight a range of possible approaches for leveraging diversity in MOOCs. Bernstein, & Klemmer (2015), who set out to harness the diversity of MOOC learners to improve engagement of social interaction in MOOCs as an opportunity for innovation and research. To address this shortcoming, they engineered a peer discussion system that puts online learners in touch with others in the course via learners were assigned into discussion groups with high versus low geographical diversity in terms of how measures of performance were assessed in each to assess conceptual understanding of the session, high-diversity peer discussions yielded short-term contrast, showed no effect overall or differentially by diversity condition, counter to what might be predicted Malone, 2010 Open EdX), or through proprietary or custom developed platforms. Together, according to data collected by Class Central (Shah, 2015), 550 institutions have created 4,200 courses spanning virtually all disciplines that are reaching a remarkably heterogeneous population of over 35 million people worldwide. While the data collected within a MOOC is high in velocity and volume, two of the three characteristics of big data (Laney, 2001), it can be limited with respect to variety, unless active measures are taken to achieve variety. Traditional educational systems collect both detailed demographic information (e.g., 3 and Udacity, 4 two other major offer MOOCs, either through traditional learning management systems (e.g., the Canvas Network), via institutionally deployed open-source platforms (e.g.,

2 https://about.futurelearn.com/press-releases/futurelearn-has-3-million-learners/
4 https://www.udacity.com/success

ENRICHING THEORY AND PRACTICE WITH DIVERSE BIG DATA IN EDUCATION
and prior knowledge measures (e.g., prior college scores). However, these variables are not collected automatically in MOOCs in order to maintain a low barrier to entry. Many MOOC-providing institutions thus began to collect this data through optional surveys. A self-selected group of learners, who tend to be more committed to completing the course, are also those who tend to complete these surveys. Reich (2014) suggested that roughly one quarter of enrolled collected survey responses can be high (often tens of thousands), though it is important to remember that these data represent a skewed sample of generally more motivated learners. Improved mechanisms for data collection that overcome current limitations for unobtrusively obtaining comprehensive information on learner backgrounds are needed. Nevertheless, currently available survey data supports the assumption that MOOC learners constitute a relatively heterogeneous population from around the globe.