IPython Notebooks on Data Science for Civic Hackers

Instead of an app, my plan is to create a set of IPython notebooks on how civic hackers can do data science effectively. We are currently experiencing a surge of new data and of tools that can help us draw conclusions from it. Software packages containing methods from statistics, machine learning, and artificial intelligence have been open-sourced and are available for all to use. Like all tools, however, these methods only help if you know how to use them effectively.

There are other great IPython notebooks out there on statistics and machine learning, but most of them require some advanced mathematical training. Civic hackers who want to use data to solve social problems may be overwhelmed by the prerequisite knowledge needed to make sense of the references that experts and academics usually cite. There is also the danger of consulting references that are too simplistic and don’t emphasize good statistical practice.

This project aims to fill that gap by providing a set of IPython notebooks that are rigorous but not mathematically overwhelming. The notebooks attempt to go beyond the recent "Big Data" hype and focus on the social problems we face today. The goal is a resource from which all civic hackers can learn the computational and statistical skills needed to tackle social issues when adequate data are available.

If you want to do data science, machine learning, statistics or data visualization for good, this project is for you!

Helpful Skills: Python data tools (such as NumPy, SciPy, scikit-learn, pandas, IPython, StatsModels, Matplotlib, and ggplot), R, d3.js, and other statistics/visualization techniques that can be implemented in an IPython notebook.

Helpful Knowledge: Statistics, machine learning, data visualization, and the ability to communicate statistical/scientific findings to a general audience.

Also, all civic hackers who want to use data visualization and statistics can look at the notebooks and provide comments and suggestions. These notebooks are meant for YOU, hence you are not only welcome, but encouraged, to provide feedback if you feel they can be improved.

Current Outline:

  1. Complete Prologue.

  2. Decide the Table of Contents.

  3. Gather helpful references and relevant datasets.

  4. Complete each chapter in a linear fashion. The chapters will start with motivations and basic statistics and will advance in difficulty from there.

Disclaimer: Although my interests are in machine learning and statistics, I do not consider myself an expert in either. This project will be a learning experience for me as much as I hope it will be for you.


Short-term goals:

Prologue: Complete it.

Chapter 1: Complete Simpson’s Paradox example.

Then post the prologue and Chapter 1 on the IPython Notebook Viewer (nbviewer).
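To give a flavor of what the Simpson's Paradox example in Chapter 1 might look like, here is a minimal sketch in plain Python. It is only an illustration, not the actual notebook content: the two programs, the two groups, and all of the counts are hypothetical numbers invented to make the effect visible, and the helper names are my own.

```python
# Hypothetical counts, invented for illustration only.
# Each entry is (successes, participants) for a made-up program and subgroup.
counts = {
    "Program A": {"group 1": (81, 87), "group 2": (192, 263)},
    "Program B": {"group 1": (234, 270), "group 2": (55, 80)},
}


def subgroup_rates(data):
    """Success rate of each program within each subgroup."""
    return {
        program: {group: s / n for group, (s, n) in groups.items()}
        for program, groups in data.items()
    }


def overall_rates(data):
    """Success rate of each program after pooling its subgroups."""
    rates = {}
    for program, groups in data.items():
        successes = sum(s for s, _ in groups.values())
        participants = sum(n for _, n in groups.values())
        rates[program] = successes / participants
    return rates


by_group = subgroup_rates(counts)
overall = overall_rates(counts)

for program in counts:
    for group, r in by_group[program].items():
        print(f"{program}, {group}: {r:.1%}")
    print(f"{program}, overall: {overall[program]:.1%}")
```

With these made-up numbers, Program A has the higher success rate inside each group, yet Program B has the higher rate once the groups are pooled. The reversal comes from the uneven group sizes: most of Program A's participants are in the group where success is harder for everyone. That is the kind of trap the notebooks are meant to help civic hackers spot.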
