Mining educational data to analyze learning and teaching methods: the case of medicine

Educational data mining (EDM) is emerging as a research area with a suite of computational and human-science methods and research approaches for understanding how students learn. New computer-supported interactive learning methods and tools (intelligent tutoring systems, simulations, games) have opened up opportunities to collect and analyze student data, to discover patterns and trends in those data, and to test hypotheses about how students learn. Data collected from online learning systems can be aggregated over large numbers of students and can contain many variables that data mining algorithms can explore for model building.

EDM is concerned with developing methods for exploring the unique types of data that come from educational settings, and with using those methods to better understand students and the settings (learning situations) in which they learn.

EDM focuses on developing new tools and algorithms for discovering data patterns. EDM develops methods and applies techniques from statistics, machine learning, and data mining to analyze data collected during teaching and learning. EDM tests learning theories and informs educational practice.

The rationale of the project is to explore a general technical and pedagogical framework to support decisions in the context of technology enhanced learning (TEL), i.e., decisions to be taken by TEL stakeholders (teacher, tutor, student, institution, community of practice) based on learning and teaching data. A teacher is a TEL stakeholder when preparing teaching material (e.g., articulating resources for a teaching sequence or designing a learning scenario), when conducting teaching (e.g., adapting teaching material or scaffolding students while the learning setting is enacted) and when evaluating learning outcomes. A tutor is a TEL stakeholder when scaffolding and supervising the learning process. A student is a TEL stakeholder when attending a course (e.g., comparing with other students, finding the best resource, self-regulating learning). An institution is a TEL stakeholder when managing teaching and learning processes at the institutional level (e.g., proposing a study program, developing faculty, running administrative workflows). A community of practice (e.g., a community of mathematics teachers) is a stakeholder when creating and sharing content, pedagogical scenarios, tools to solve problems, or practices. We call a TEL stakeholder any role that involves taking decisions in TEL situations.

We consider that TEL decisions may be improved by providing the TEL stakeholders with pertinent analysis and reporting of collected educational data, such as data denoting students’ motivation, actions and knowledge, or data denoting other teachers’ actions and/or constructions from a community-of-practice perspective.

The medical context is an ideal context in which to explore this new research framework. Indeed, it offers several learning situations that combine an emerging pedagogical approach (flipped classroom) with classical TEL (MCQ answers), like PACES, or an innovative pedagogical approach with new TEL systems (serious games, simulators), like LOE (see partner description) or TELEOS. It is also a large-scale experimental field with thousands of students and dozens of teachers.

In both cases we will build analysis processes for several TEL stakeholders to answer several questions. The analysis processes will be collected and preserved in the Undertracks platform (explained in the section “partner description”). These analyses will be designed to be sharable and reusable.

To study the reusability of the analysis processes, some of them will be applied in other similar contexts. In particular, the analysis process for understanding student evolution and the co-design process with tutors could be reused in the context of the “C2i niveau 1”, because both use the same learning platform (learneos) and the tutors’ objectives are similar.

Research Objectives

The general goals of the long-term collaboration are:

1. Predicting students’ future learning behaviour by creating student models that incorporate such detailed information as students’ knowledge, motivation, metacognition, and attitudes;

2. Discovering or improving domain models that characterize the content to be learned and optimal instructional sequences;

3. Studying the effects of different kinds of pedagogical support that can be provided by learning software;

4. Advancing scientific knowledge about learning and learners through building computational models that incorporate models of the student, the domain, and the pedagogy.

The short-term goals are:

  1. Explore and design an analysis process to understand student evolution from PACES data. Propose an indicator to measure it.
  2. Reuse this analysis process in other similar contexts.
  3. Explore and co-design, with PACES tutors, analysis processes that help them build efficient pedagogical scaffolding and useful MCQs, i.e., taking into account student and institutional constraints.
  4. Propose a set of indicators to inform a computer student model. Each indicator will be an analysis process. The indicators analyse student behaviour, student knowledge and student cognitive strategies based on the LOE learning platform. Test the indicators in the case of the TELEOS project.
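As an illustration of what an indicator of student evolution (goal 1) might look like, the following minimal sketch computes, for each student, the least-squares trend of MCQ scores across sessions. Everything here is an assumption made for illustration: the record layout `(student_id, session, score)`, the function name `evolution_indicator`, and the choice of a linear trend itself. The project does not prescribe this indicator.

```python
from collections import defaultdict


def evolution_indicator(records):
    """Return {student_id: slope}, the least-squares trend of score over session.

    `records` is an iterable of (student_id, session, score) tuples.
    This layout is hypothetical, not the actual PACES data format.
    """
    by_student = defaultdict(list)
    for student_id, session, score in records:
        by_student[student_id].append((session, score))

    slopes = {}
    for student_id, points in by_student.items():
        n = len(points)
        if n < 2:
            continue  # a trend needs at least two sessions
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in points)
        if var_x == 0:
            continue  # all scores come from the same session
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        slopes[student_id] = cov_xy / var_x
    return slopes
```

A positive slope would suggest improving scores over the sessions and a negative slope declining ones. A real indicator would likely need to account for MCQ difficulty and missing sessions.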


Why collaborate?

The Undertracks platform, proposed by the MeTAH team, aims to share experimental data, operators (algorithms to analyse and visualise data) and processes to analyse educational data [1]. The team designed and implemented a web-based platform to store structured educational data and operators (written in Java or C++ in the current version). In particular, they designed a GUI and an engine workbench that allow data and operators to be associated. However, this platform is designed for sharing the analysis of experimental data among researchers, and it does not provide functionality for end users such as TEL stakeholders. This project allows us to go further and study the properties the platform needs for end users. In addition, several innovative operators and processes need to be designed to meet the research objectives above.
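The idea of associating data with operators and chaining them into an analysis process can be sketched as follows. This is not the Undertracks API: the function names, the event fields (`student`, `type`, `correct`) and the use of Python are all assumptions made for illustration only.

```python
from functools import reduce


def make_process(*operators):
    """Chain operators into one analysis process (hypothetical helper).

    Each operator is a function that takes data and returns transformed data.
    """
    def process(data):
        return reduce(lambda current, op: op(current), operators, data)
    return process


def keep_answers(events):
    """Operator: keep only the events that record an answer."""
    return [e for e in events if e["type"] == "answer"]


def success_rate_per_student(events):
    """Operator: compute the share of correct answers for each student."""
    totals = {}
    for e in events:
        correct, count = totals.get(e["student"], (0, 0))
        totals[e["student"]] = (correct + int(e["correct"]), count + 1)
    return {s: c / n for s, (c, n) in totals.items()}


# A process is a reusable object: it can be applied to any compatible data set.
success_process = make_process(keep_answers, success_rate_per_student)
```

Keeping operators as independent functions is one way to make a process sharable and reusable across similar data sets, which is the property the project sets out to study.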

The MeTAH team also manages the TELEOS platform [3], which was designed in partnership with TIMC, LIP and the orthopaedic service of Grenoble CHU. TELEOS is a simulation-based Intelligent Tutoring System (ITS) that has been implemented to offer a complementary dimension of apprenticeship in the learning process of orthopedic surgery. In fact, along with the declarative knowledge to be mastered by intern surgeons, half of their training requires repeated practice. The knowledge involved in this part is perceptual-gestural, that is, often tacit and empirical. Moreover, tasks related to this type of knowledge are also ill-defined: a given operation can be executed with different strategy patterns, and no precise way to reach the validation criteria of the related tasks can be defined in advance. This induces a gap in the learning process that can hardly be bridged by traditional teaching methods. The TELEOS learning environment aims at providing the missing intermediate phase of apprenticeship [4].

The Themas team is strongly involved in the new PACES pedagogy and co-designed the LOE platform, a serious game for learning concepts in epidemiology.

The Laboratorium of Epidemiology© (LOE) immerses learners into a full-scale persistent and distributed simulation combined with a game scenario [2]. It was collaboratively designed and used by researchers, teaching staff (hereafter called tutors), and students as both an educational project and a research project. The educational project forms part of a medical school course in biostatistics. The research project, to which we gave the name “Laboratorium”, allows repeated data collection campaigns that are not unique events in students’ or tutors’ lives and is an attempt to reduce data collection biases and produce well-documented databases.

The game is based on a computerized simulation of various institutions (including hospitals) and role-play. Students play the role of public health physicians and are placed in an otherwise inaccessible professional situation involving the occurrence of a disease in several hospitals. Students have to design and carry out an epidemiological study, and write a scientific article to be presented at a congress. The main learning goal concerns the statistical analysis of a medical database. However, students are given a mission that contextualizes the problems of doing statistics. The mission is to design a diagnosis tool for VTED (Venous Thrombo-Embolic Disease) for hospital use. While working on this problem, students will learn statistics, understand the role they play, and more generally the function of statistics in public health. The game is fully integrated into the standard medical school curriculum. It lasts four months including eight four-hour sessions in class.

Both frameworks (PACES and LOE) produce educational data. Automating processes and speeding up the collection and mining of these educational data will be crucial to understanding complex phenomena and generating new knowledge about students’ behaviors and pedagogical phenomena.


Contacts :

MeTAH team (LIG laboratory) : Vanda.luengo [at]

Themas team (TIMC laboratory) : pierre.gillois [at]


Publications supported by the project

Toussaint, B.-M., Luengo, V., Jambon, F., Tonetti, J.: From Heterogeneous Multisource Traces to Perceptual-Gestural Sequences: the PeTra Treatment Approach. In: Conati, C., Heffernan, N., Mitrovic, A., Verdejo, M. F. (eds.) Proceedings of the 17th International Conference on Artificial Intelligence in Education (AIED 2015), Madrid, Spain. LNCS, vol. 9112, pp. 480–491. Springer, Heidelberg (2015)

Toussaint, B.-M., Luengo, V.: Mining surgery phase-related sequential rules from vertebroplasty simulations traces. In: Holmes, J.H., Bellazzi, R., Sacchi, L., Peek, N. (eds.) Proceedings of the 15th International Conference on Artificial Intelligence in Medicine (AIME 2015), Pavia, Italy. LNCS, vol. 9105, pp. 32–41. Springer, Heidelberg (2015)

Toussaint, B.-M., Luengo, V., Jambon, F.: Proposition d’un framework de traitement de traces pour l’analyse de connaissances perceptivo-gestuelles - Le cas de la chirurgie orthopédique percutanée. Actes de la 7e édition de la Conférence sur les Environnements Informatiques pour l'Apprentissage Humain (EIAH 2015), Agadir, Morocco, June 2015