DataPy CADi Lectures

 

A 3-day data-wrangling bootcamp that I designed for professors at my alma mater, Tecnológico de Monterrey.



Introduction

After the initial DataViz CADi, I was hired to teach a data-wrangling course in Python as a 3-day bootcamp. This time, instead of using Mathematica and R, we focused on Python for common data-processing operations.

Objective

This course was designed as an introduction to developing data-wrangling routines in Python. The bootcamp covered best-practice tools such as virtual environments and GitHub, and included exercises with a wide array of libraries, including NumPy, pandas, scikit-learn, Beautiful Soup, and OSMnx, among many others.

Select Exercises

Flights

This exercise used a public flights dataset to showcase common pandas DataFrame operations such as loading, filtering, aggregating, and performing operations on subsets.
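To give a flavor of those operations, here is a minimal sketch on a tiny made-up table. The column names (`carrier`, `origin`, `dep_delay`) and the values are assumptions for illustration only; they are not the schema or contents of the dataset used in the bootcamp.

```python
import io

import pandas as pd

# Hypothetical miniature flights table (assumed columns, invented values).
csv_text = """carrier,origin,dep_delay
AA,JFK,10
AA,LGA,-5
DL,JFK,30
DL,JFK,0
UA,EWR,45
"""

# Loading: read_csv accepts a path or any file-like object.
flights = pd.read_csv(io.StringIO(csv_text))

# Filtering: keep only the flights that depart from JFK.
jfk = flights[flights["origin"] == "JFK"]

# Aggregating: mean departure delay per carrier.
mean_delay = flights.groupby("carrier")["dep_delay"].mean()

# Operating on subsets: each flight's delay relative to its carrier's mean.
carrier_mean = flights.groupby("carrier")["dep_delay"].transform("mean")
flights["delay_vs_carrier"] = flights["dep_delay"] - carrier_mean

print(jfk)
print(mean_delay)
print(flights)
```

With a real dataset, the only change to the loading step would be passing a file path or URL to `read_csv` instead of the in-memory string.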

Housing

A classic exercise on the California Housing dataset to teach the use of scikit-learn’s pipelines for data cleaning and wrangling.
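As a rough sketch of what such a pipeline can look like, the example below imputes and scales numeric columns and one-hot encodes a categorical one. The small table is invented and its column names are assumptions loosely modeled on housing data; the actual bootcamp script may have used different steps.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical housing-like table with missing values (invented data).
housing = pd.DataFrame(
    {
        "median_income": [8.3, 7.2, np.nan, 5.6],
        "total_rooms": [880.0, 7099.0, 1467.0, np.nan],
        "ocean_proximity": ["NEAR BAY", "INLAND", "INLAND", "NEAR BAY"],
    }
)

numeric_cols = ["median_income", "total_rooms"]
categorical_cols = ["ocean_proximity"]

# Numeric columns: fill gaps with the column median, then standardize.
numeric_pipeline = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]
)

# Route each group of columns to its own transformer.
# sparse_threshold=0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("num", numeric_pipeline, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,
)

prepared = preprocess.fit_transform(housing)
print(prepared.shape)
```

Keeping the cleaning steps inside a pipeline means the same fitted transformations can later be applied to new data with `preprocess.transform(...)`.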

Twitter Topics Parser

One of the data-scraping exercises. We parsed Twitter for hashtags to use in further analysis (such as sentiment analysis).
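The parsing step alone can be sketched with the standard library. This example assumes the tweet texts have already been collected as plain strings; how the bootcamp script retrieved them is not shown here, and the helper names `extract_hashtags` and `count_hashtags` and the sample tweets are hypothetical.

```python
import re
from collections import Counter

# A hashtag is taken to be "#" followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercased hashtags found in one tweet, without the '#'."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(text)]


def count_hashtags(tweets):
    """Count hashtag occurrences across an iterable of tweet texts."""
    counts = Counter()
    for tweet in tweets:
        counts.update(extract_hashtags(tweet))
    return counts


# Invented sample tweets, for illustration only.
sample_tweets = [
    "Learning #Python and #pandas at the bootcamp",
    "Cleaning data with #pandas is fun #DataWrangling",
    "No tags here",
]

hashtag_counts = count_hashtags(sample_tweets)
top = hashtag_counts.most_common(1)
print(top)
```

The resulting counts, or the tweets grouped by hashtag, could then be handed to a later step such as sentiment analysis.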

Topics and Code

And here’s a full list of the topics and exercises covered in the bootcamp (with links to their scripts):

The whole sitemap is available at the following link.

Code repo