Interactive Data Science

The goal of this course is to provide you with the tools to build data-driven interactive systems and explore the new opportunities enabled by this data through a combination of guest lectures, discussion of current literature, and practical skills development. Over the course of the semester, you will learn about data science and the entire data pipeline from collecting and analyzing to interacting with data.

This course requires comfort with programming, as required projects make use of (at a minimum) Python, SQL, CSS, and JavaScript (including D3). A series of "project bytes" helps to lay the groundwork for a larger final group project.

The learning goals of the course are as follows:

  • To introduce basic concepts in data collection including data formats, parsing and sources of data
  • To introduce common problems with data such as structural problems, outliers, incomplete data, and dirty data
  • To introduce basic concepts in data interpretation including feature generation, statistical analysis and classification
  • To introduce concepts in data visualization including what makes a good visualization and the use of interaction in visualization
  • To provide practical applied examples of the data pipeline through an examination of current literature
  • To provide hands-on experience with creating data-driven applications and to produce a portfolio of such applications

Prerequisites:

The class will involve programming and debugging. If required by your background, it is possible to minimize the programming you do for projects (in which case you will be expected to spend more time on other factors such as beautiful visual designs). However, you should not take the course if you find programming or debugging extremely difficult, because you will have to master several very different programming languages and concepts in very short order (projects make use of web programming frameworks and tools including Flask, Bootstrap, Ajax, jQuery, D3, and Google Appspot, and multiple languages including Python, JavaScript, and SQL). That being said, the assignments that require these will include useful resources for brushing up on the topics.

Projects:

The course is project oriented. It includes a large final self-defined project along with 5 smaller "project bytes" designed to provide the stepping stones needed to complete the final project. Tentative due dates for these projects can be found at the bottom of this syllabus under the 'Course Summary' heading. Your work will be evaluated relative to your background and level of effort. This is a graduate class, and the assumption is that you are a mature and motivated student, and that you will define your work so that you learn and grow, given your background. Students who are taking this course as a part of a technical requirement (such as the computer science course requirement in the HCI PhD) will need to do more advanced or ambitious projects, and should consult with the instructor to make sure they are meeting this bar.

All bytes are to be done as individual work. Students may assist each other with conceptual issues, but may not provide code. If you use example code, you must explicitly acknowledge this in your assignment submission. If you are unsure about these boundaries, ask. The larger project is to be done in groups of two or more.

Some of the specific skills that will be covered in projects include:

  • Display data from an API (such as the Twitter API) on a website you create
  • Create a mashup of data from multiple web APIs
  • Create an interactive visualization of a data set
  • Answer a series of intriguing questions from both the data and corresponding visualizations

Late projects will be penalized 20% per day.

Quizzes:

There will be a few in-class quizzes covering the lecture materials and readings. There is no midterm or final.

Course Materials:

Readings will be made available on this CMU Canvas site. The following books are recommended:

Interactive Data Visualization for the Web (free online version)

Doing Data Science (Schutt & O'Neil) 

These books may also be useful:

Visualize This (Nathan Yau) (uses R and Python)

Programming Google App Engine, Charles Severance (uses Python, plus add-ons like JavaScript)

Python for Data Analysis, Wes McKinney (Python) 

List of Topics Covered:

Week  Date        Category                   Topic
1     8/28/2018   Course Introduction        Data Pipeline
1     8/30/2018   Exploratory Data Analysis  Asking a Question
2     9/4/2018    Exploratory Data Analysis  Data Quality
2     9/6/2018    Exploratory Data Analysis  Data Sampling
3     9/11/2018   Exploratory Data Analysis  Database Storage
3     9/13/2018   Exploratory Data Analysis  Database Retrieval
4     9/18/2018   Exploratory Data Analysis  Ethics of Exploring Data
4     9/20/2018   Human Centered Methods     Ubiquitous Self
5     9/25/2018   Human Centered Methods     Ubiquitous Self
5     9/27/2018   Intro to Visualization     Overview of field
6     10/2/2018   Intro to Visualization     Intro to Tableau
6     10/4/2018   Human Centered Methods     Ubiquitous Self
7     10/9/2018   Exploratory Data Analysis  Statistical Methods
7     10/11/2018  Machine Learning           Overview
8     10/16/2018  Machine Learning           Ethics
8     10/18/2018  Human Centered Methods     Crowd Sourcing
9     10/23/2018  Human Centered Methods     Ethics of HC Data
9     10/25/2018  Human Centered Methods     Crowd Sourcing
10    10/30/2018  Research Methods           Controlled Experiments
10    11/1/2018   Research Methods           Data from Experiments
11    11/6/2018   Advanced Visualization     Perception, Cognition, and Color
11    11/8/2018   Advanced Visualization     Multi-dimensional Data Visualization
12    11/13/2018  Advanced Visualization     Design Guidelines
12    11/15/2018  Advanced Visualization     Interaction
13    11/20/2018  Advanced Visualization     Ethics of Visualization
13    11/22/2018  No Class - Holiday
14    11/27/2018  Advanced Visualization     Storytelling with Data
14    11/29/2018  Advanced Visualization     Advanced Vis Topic TBD
15    12/4/2018   Final Presentation
15    12/6/2018   Final Presentation

Concepts

  • Structured vs unstructured data
  • Dealing with heterogeneous data
  • Sampling and Bias in Data Collection
  • Sensed Data
  • Mobile Data
  • Data transformation and analysis
  • Information Visualization
  • Current research in information-driven interfaces

Skills

  • Getting Web data
  • Dealing with APIs and OAuth
  • Getting access to mobile data
  • Common data formats
  • Data parsing
  • Common problems with data
  • Tools for analyzing data
  • Tools for visualizing data

Readings and Discussion:

You will be expected to complete assigned readings before the lecture they pertain to. These may include chapters drawn from textbooks about data, or readings from the research literature. To incentivize this, each student will be required to make at least two relevant postings to the discussion group before the class for which each reading is due.

Grades:

The tentative breakdown for grading is below. As a reminder, here is the university policy on academic integrity.

60% Project Bytes 
30% Final Project
10% Quizzes

Course Summary:

Date Details Due
This course content is offered under a CC Attribution Non-Commercial Share Alike license. Content in this course can be considered under this license unless otherwise noted.