From dirty data to multiple versions of truth
Learning analytics is the analysis of student data with the purpose of improving learning. However, the process of data cleaning remains underexposed within learning analytics literature. In this paper, we elaborate on choices made in the cleaning process of student data and their consequences. We illustrate this with a case where data was gathered during six courses taught via Moodle. In this data set, only 21% of the logged activities were linked to a specific course. We illustrate possible choices in dealing with missing data by applying the cleaning process twelve times with different choices on copies of the raw data. Consequently, the analysis of the data shows varying outcomes. As the purpose of learning analytics is to intervene based on analysis and visualizations, it is of utmost importance to be aware of choices made during data cleaning. This paper's main goal is to make stakeholders of (learning) analytics activities aware of the fact that choices are made during data cleaning have consequences on the outcomes. We believe that there should be transparency to the users of these outcomes and give them a detailed report of the decisions made.
|Published in||BLED 2019 Proceedings 57|
|Key words||learning analytics , data cleaning|