from 30 January 2023 to 1 February 2023
Complesso S. Marcellino e Festo
Europe/Rome timezone
The Political Sciences and Physics "E. Pancini" Departments of Federico II University organize the 1st International Conference on Measurement in STEM Education (MESE1).

Clustering educational data: a high school students' performance analysis

31 Jan 2023, 11:55
G4 (Complesso S. Marcellino e Festo)


Complesso S. Marcellino e Festo

Largo S. Marcellino, 80138 Napoli NA
Invited talk
Assessing student competence: current methodological issues in case studies (Invited Symposium)


Matteo Farnè (University of Bologna)


In this talk, we first briefly discuss the definitions of Educational Data Mining and Learning Analytics, providing an overview of the statistical methods most used in both fields and a synopsis of their differences and similarities. We stress the possible aims of the different methods, which seek to uncover hidden patterns in educational data, and point to the services they can provide to the whole school community. We then focus on clustering methods for educational data: we start from traditional methods (hierarchical, partitional, and density-based), proceed with dimension-reduction methods such as factorial k-means and reduced k-means, and present some methods that incorporate the longitudinal dimension. Next, we present a pilot analysis of a dataset reporting the performance of a class of high school students in three periods (treated as three separate datasets), using hierarchical clustering, partitional clustering (k-means), factorial k-means, and reduced k-means. The goal of the analysis is to show how the composition and the number of groups vary across periods and which factors influence the formation of groups. The partitions obtained with these algorithms were compared in terms of reliability using the average silhouette width index. Reduced k-means and k-means produced similar partitions, which were the most acceptable according to the average silhouette width. Hierarchical clustering matched the results of these two algorithms only in the first two periods. The partitions produced by factorial k-means differ from those of the other methods, and its average silhouette width values suggest that it is not the best-suited algorithm for the dataset at hand.
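The two dimension-reduction methods named above alternate between a cluster step and a subspace step. In reduced k-means, the q-dimensional loadings A are the leading eigenvectors of X'P_U X, where P_U = U(U'U)^{-1}U' replaces each student's row by its cluster mean; factorial k-means instead minimizes the within-cluster scatter inside the subspace, so A takes the eigenvectors of X'(I - P_U)X with the smallest eigenvalues. In both, the cluster step is k-means on the projected data XA. What follows is a minimal NumPy sketch, not the author's implementation: it assumes standardized variables (no constant columns), random initial memberships, and a plain Lloyd inner step, and the function names (`reduced_kmeans`, `_lloyd_labels`, `_cluster_projector`) and the data are hypothetical.

```python
import numpy as np


def _lloyd_labels(Y, k, rng, n_iter=100):
    """Plain Lloyd k-means on the rows of Y (random initial centers); returns cluster labels."""
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    labels = np.full(len(Y), -1)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = Y[labels == j].mean(axis=0)
    return labels


def _cluster_projector(labels, k):
    """P_U = U (U'U)^{-1} U', the projector that replaces each row by its cluster mean."""
    U = np.eye(k)[labels]          # n x k binary membership matrix
    sizes = U.sum(axis=0)
    sizes[sizes == 0] = 1.0        # an empty cluster only contributes a zero column
    return U @ np.diag(1.0 / sizes) @ U.T


def reduced_kmeans(X, k, q, method="rkm", n_iter=50, seed=0):
    """Alternating sketch of reduced k-means ("rkm") or factorial k-means ("fkm").

    Returns cluster labels and a p x q column-orthonormal loading matrix A.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # assumption: standardized variables
    n = len(X)
    labels = rng.integers(0, k, size=n)       # assumption: random initial memberships
    A = None
    for _ in range(n_iter):
        P = _cluster_projector(labels, k)
        if method == "rkm":
            M = X.T @ P @ X                   # between-cluster scatter: keep largest eigenvalues
        elif method == "fkm":
            M = X.T @ (P - np.eye(n)) @ X     # largest here = smallest within-cluster scatter
        else:
            raise ValueError("method must be 'rkm' or 'fkm'")
        _, eigvecs = np.linalg.eigh(M)        # eigenvalues in ascending order
        A = eigvecs[:, -q:]                   # q leading eigenvectors, orthonormal columns
        new_labels = _lloyd_labels(X @ A, k, rng)  # cluster step: k-means in the reduced space
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, A


# Hypothetical data: 40 students x 5 variables, two groups differing only on the first two
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(20, 5)), rng.normal(size=(20, 5)) + [6, 6, 0, 0, 0]])
labels_rkm, A_rkm = reduced_kmeans(X, k=2, q=1, method="rkm")
labels_fkm, A_fkm = reduced_kmeans(X, k=2, q=1, method="fkm")
```

Both variants share the same cluster step and differ only in the matrix whose eigenvectors define the subspace.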
The underlying meaning of the clusters over time and the reasons behind the statistical results are discussed and analyzed in detail, with the aim of highlighting possible group structures among the students of a high school class.
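The comparison of partitions by average silhouette width can be sketched as follows. For each observation i, a(i) is its mean distance to the other members of its own cluster and b(i) its mean distance to the nearest other cluster; s(i) = (b(i) - a(i)) / max(a(i), b(i)) lies in [-1, 1], and the average over all observations summarizes how well separated a partition is. This is a minimal sketch under stated assumptions: Euclidean distance, k-means with k-means++ seeding as the clustering step, and synthetic marks rather than the study's dataset. The function names and the data are hypothetical.

```python
import numpy as np


def average_silhouette_width(X, labels):
    """Mean silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)); needs at least two clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))  # pairwise Euclidean distances
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # convention: a singleton gets s(i) = 0
        a = D[i, own].sum() / (n_own - 1)  # mean distance to the other members (D[i, i] = 0)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s.mean()


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding; keeps the run with the lowest within-cluster SS."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best_labels, best_wss = None, np.inf
    for _start in range(n_init):
        # k-means++ seeding: each new center is drawn with probability proportional to D(x)^2
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _it in range(n_iter):
            labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                    for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        wss = ((X - centers[labels]) ** 2).sum()
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels


def best_k_by_silhouette(X, k_values=(2, 3, 4, 5)):
    """Fit k-means for each candidate k and return the k with the largest average silhouette width."""
    scores = {k: average_silhouette_width(X, kmeans(X, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Hypothetical stand-in for one period: 30 students x 4 subject marks around three profiles
rng = np.random.default_rng(1)
profiles = np.array([[4.0] * 4, [6.5] * 4, [9.0] * 4])
X = np.vstack([p + rng.normal(scale=0.3, size=(10, 4)) for p in profiles])
k_best, scores = best_k_by_silhouette(X)
print(k_best, {k: round(float(v), 2) for k, v in scores.items()})
```

In the setting of the abstract, the same scoring would be repeated for each of the three period datasets and for each algorithm. For hierarchical clustering, for example, partitions cut with SciPy's `scipy.cluster.hierarchy.linkage` and `fcluster(..., criterion="maxclust")` could be scored with `average_silhouette_width` in the same way.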

Research Strand: Data Science for Learning Processes and Education

Primary author

Matteo Farnè (University of Bologna)


