A 3-day data-wrangling bootcamp that I designed for professors at my alma mater, Tecnológico de Monterrey.

Introduction

After the initial DataViz CADi, I was hired to teach a data-wrangling course as a 3-day bootcamp. This time, instead of using Mathematica and R, we focused on using Python for common data-processing operations.

Objective

This course was designed as an introduction to developing data-wrangling routines in Python. The bootcamp covered best-practice tools such as virtual environments and GitHub, and included exercises with a wide array of libraries, among them numpy, pandas, scikit-learn, beautifulsoup, and OSMnx.

Select Exercises

Flights

This exercise used a public flights dataset to showcase common pandas data-frame operations, such as loading, filtering, aggregating, and performing operations on subsets.
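As a rough illustration of those operations, here is a minimal sketch on a tiny inline table. The column names (`carrier`, `origin`, `dep_delay`, `distance`) and the values are made up for this example; they are not the schema or data of the dataset used in the bootcamp.

```python
import io

import pandas as pd

# Hypothetical stand-in for the flights CSV (invented columns and values).
CSV = """carrier,origin,dep_delay,distance
AA,JFK,10,1000
AA,LGA,-5,500
UA,JFK,30,1000
UA,EWR,,750
DL,JFK,0,2000
"""

# Loading: read_csv accepts a path or any file-like object.
flights = pd.read_csv(io.StringIO(CSV))

# Filtering: flights leaving JFK that have a recorded departure delay.
jfk = flights[(flights["origin"] == "JFK") & flights["dep_delay"].notna()]

# Aggregating: mean delay and number of recorded delays per carrier.
delay_by_carrier = (
    flights.groupby("carrier")["dep_delay"]
    .agg(mean_delay="mean", n_recorded="count")
    .reset_index()
)

# Operating on subsets: each flight's delay relative to its carrier's mean.
carrier_mean = flights.groupby("carrier")["dep_delay"].transform("mean")
flights["delay_vs_carrier"] = flights["dep_delay"] - carrier_mean

print(delay_by_carrier)
```

`transform` returns a result aligned with the original rows, which is why it can be subtracted directly from the `dep_delay` column.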

Housing

A classic exercise on the California Housing dataset to teach the use of scikit-learn’s pipelines for data cleaning and wrangling.
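A cleaning pipeline of this kind might look roughly like the sketch below. The toy rows are invented, the column names are assumed to follow the commonly used version of the California Housing data, and the choice of steps (median imputation, scaling, one-hot encoding) is an assumption for illustration rather than the exact pipeline built in the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows with assumed column names; values are invented for the example.
housing = pd.DataFrame(
    {
        "median_income": [8.3, 7.2, np.nan, 5.6],
        "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
        "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY", "INLAND"],
    }
)

numeric_cols = ["median_income", "total_rooms"]
categorical_cols = ["ocean_proximity"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_pipeline = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]
)

# Route each group of columns to its own transformer.
preprocess = ColumnTransformer(
    [
        ("num", numeric_pipeline, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ]
)

prepared = preprocess.fit_transform(housing)
print(prepared.shape)
```

Bundling the steps into a single object means the same transformations, fitted on the training data, can later be applied unchanged to new data via `preprocess.transform`.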

Twitter Topics Parser

One of the data-scraping exercises. We parsed tweets for hashtags to use in further analysis (such as sentiment analysis).
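The parsing step alone can be sketched with the standard library, as below. This does not show how the tweets were collected; the sample tweets and the `extract_hashtags` helper are hypothetical, and the regular expression is a simplification of real hashtag rules.

```python
import re
from collections import Counter

# Simplified pattern: a '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the hashtags found in a tweet, lowercased, without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


# Invented sample tweets standing in for scraped data.
tweets = [
    "Learning #pandas at the #Python bootcamp",
    "Scraping with #python and BeautifulSoup",
    "No tags here",
]

# Count how often each hashtag appears across all tweets.
counts = Counter(tag for tweet in tweets for tag in extract_hashtags(tweet))
print(counts.most_common())
```

Lowercasing before counting treats `#Python` and `#python` as the same topic, which is usually what a later analysis step wants.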

Topics and Code

And here’s a full list of the topics and exercises covered in the bootcamp (with links to their scripts):

Intro and Basics
- Python Basics: python 101.a, python 101.b, python 101.c, intermediate python and file formats, advanced python
- Libraries: pypi, pkg
- Environments: anaconda, virtualenv
- GitHub: creating a repo, forking a repo
- Python IDEs: IDLE, spyder, jupyter, nteract, atom
Data Wrangling
- pandas: NFL, zoo, articles read, titanic, flights, baseball, baseball efficient
- scipy: word ladder
- numpy: Comoros, pgSIT
- scikit-learn: housing cleaning, housing exploring, housing transforming
Data Sources:
Data Thoughts: dataViz, storytelling

The whole sitemap is available at the following link.

Code repo

Repository: GitHub repo with all the materials and exercises
