Data fuels every stage of analysis, from spotting trends to training predictive models, and this unit shows you how to collect, structure, clean, and prepare it with confidence. You’ll use Python to source information from flat files, web APIs, and SQL databases, then turn messy, real-world inputs into reliable datasets.

Throughout, you’ll work with core libraries such as NumPy and Pandas while practising data quality techniques: handling missing values, fixing types, standardising formats, reshaping tables, merging multiple sources, and engineering useful features. Through hands-on exercises, you’ll design clean, repeatable data pipelines, troubleshoot common issues, and export polished datasets ready for visualisation, modelling, or downstream teams.
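
To give a flavour of the techniques listed above, here is a minimal sketch of a small cleaning step in Pandas. It is illustrative only and not taken from the unit's exercises: the sample records, the column names, the `clean` helper, the `regions` lookup table, the median fill, and the spend threshold of 10.0 are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical raw records with typical problems: a duplicate row,
# numbers stored as text, a missing date, and inconsistent city formatting.
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "signup_date": ["2024-01-05", "2024-02-05", "2024-02-05", None],
    "spend": ["10.5", "n/a", "n/a", "7"],
    "city": [" london", "Paris ", "Paris ", "BERLIN"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df (hypothetical helper for illustration)."""
    out = df.drop_duplicates().copy()
    # Fix types: unparseable values such as "n/a" become NaN.
    out["spend"] = pd.to_numeric(out["spend"], errors="coerce")
    # Handle missing values: median fill is one option among several.
    out["spend"] = out["spend"].fillna(out["spend"].median())
    # Parse dates; a missing date becomes NaT.
    out["signup_date"] = pd.to_datetime(out["signup_date"], errors="coerce")
    # Standardise formats: trim whitespace and normalise capitalisation.
    out["city"] = out["city"].str.strip().str.title()
    return out.reset_index(drop=True)


# Hypothetical second source to merge in.
regions = pd.DataFrame({"city": ["London", "Paris"], "region": ["UK", "FR"]})

cleaned = clean(raw)
# A left join keeps every cleaned row, even when no region matches.
merged = cleaned.merge(regions, on="city", how="left")
# Engineer a simple feature using an arbitrary, assumed threshold.
merged["high_spender"] = merged["spend"] > 10.0

print(merged)
```

Whether to fill, drop, or flag missing values depends on the dataset and on what the data will be used for, so treat the median fill here as a placeholder choice.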

Level: Intermediate
Category: LABS