An example: handling missing values


Post by asimj1 »

It should also be noted that data pre-processing is usually a complex set of interrelated steps, rather than simply a matter of completing individual tasks in different pieces of software. I tend to use Python to do it, but you can use whichever language you prefer.

During the UK Data Service Data Pre-processing webinar series (for recordings, see links below), one of the most frequently asked questions is ‘what should I do with the missing values in my data sets?’ Missing values are arguably one of the most annoying issues for data scientists and researchers. On the one hand, it is necessary to check for missing values, as some modelling algorithms require a complete data set to run properly; on the other hand, nearly all data sets have some missing values, and handling them can be a very tricky task!

There are various strategies for dealing with missing values, including, for example, deleting the row or the column, or imputing a specific statistic or a random number. The best solution will be the one that can approximate the original, unknown value, and thus optimise the eventual performance of the data model. However, there is no easy fix and it is usually hard to provide a direct solution without knowing the exact context of the research project in question.
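
To make the strategies above a bit more concrete, here is a minimal sketch in Python using pandas and NumPy. The toy data, the column names (age, income, notes) and the "more than half missing" threshold are all made up for illustration; which strategy (if any) is appropriate still depends on the research context.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing value.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "income": [30000.0, 42000.0, np.nan, 58000.0],
    "notes": [np.nan, np.nan, np.nan, "ok"],
})

# Count missing values per column before deciding what to do.
missing_counts = df.isna().sum()

# Strategy 1: delete every row that contains at least one missing value.
rows_dropped = df.dropna(axis=0)

# Strategy 2: delete columns that are mostly missing
# (assumed threshold: more than half of the values missing).
cols_dropped = df.loc[:, df.isna().mean() <= 0.5]

# Strategy 3: impute a statistic, here the column median of the numeric columns.
numeric_cols = ["age", "income"]
median_imputed = df.copy()
median_imputed[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Strategy 4: impute a random draw from the observed values of the same column.
rng = np.random.default_rng(seed=0)  # fixed seed so the result is reproducible
random_imputed = df.copy()
for col in numeric_cols:
    observed = df[col].dropna().to_numpy()
    mask = df[col].isna()
    random_imputed.loc[mask, col] = rng.choice(observed, size=mask.sum())

print(missing_counts)
print(median_imputed)
```

Note how much the row deletion costs on this tiny example: only one complete row survives, which is one reason imputation is often considered instead.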