Interactive Data Science
The goal of this course is to provide you with the tools to build data-driven interactive systems and to explore the new opportunities that data enables, through a combination of guest lectures, discussion of current literature, and practical skills development. Over the semester, you will learn about data science and the entire data pipeline, from collecting and analyzing data to interacting with it.
This course requires comfort with programming, as the required projects use (at a minimum) Python, SQL, CSS, and JavaScript (including D3). A series of "project bytes" lays the groundwork for a larger final group project.
The learning goals of the course are as follows:
- To introduce basic concepts in data collection including data formats, parsing and sources of data
- To introduce common problems with data such as structural problems, outliers, incomplete data, and dirty data
- To introduce basic concepts in data interpretation including feature generation, statistical analysis and classification
- To introduce concepts in data visualization including what makes a good visualization and the use of interaction in visualization
- To provide practical applied examples of the data pipeline through an examination of current literature
- To provide hands-on experience with creating data-driven applications and to produce a portfolio of such applications
Prerequisites:
The class will involve programming and debugging. If your background calls for it, you can minimize the programming you do for projects; in that case, you will be expected to spend more time on other aspects, such as beautiful visual design. However, you should not take the course if you find programming or debugging extremely difficult, because you will have to master several very different programming languages and concepts in short order. Projects use web programming frameworks and tools including Flask, Bootstrap, Ajax, jQuery, D3, and Google Appspot, and multiple languages including Python, JavaScript, and SQL. That said, assignments that require these tools will include useful resources for brushing up on them.
Projects:
The course is project-oriented. It includes a large, self-defined final project along with five smaller "project bytes" that serve as stepping stones toward completing it. Tentative due dates for these projects can be found at the bottom of this syllabus under the 'Course Summary' heading. Your work will be evaluated relative to your background and level of effort. This is a graduate class: the assumption is that you are a mature and motivated student and that you will define your work so that you learn and grow, given your background. Students taking this course to satisfy a technical requirement (such as the computer science course requirement in the HCI PhD) will need to do more advanced or ambitious projects and should consult with the instructor to make sure they are meeting this bar.
All bytes are to be done as individual work. Students may help each other with conceptual issues but may not share code. If you use example code, you must explicitly acknowledge this in your assignment submission. If you are unsure about these boundaries, ask. The larger project is to be done in groups of two or more.
Some of the specific skills that will be covered in projects include:
- Display data from an API (such as the Twitter API) on a website you create
- Create a mashup of data from multiple web APIs
- Create an interactive visualization of a data set
- Answer a series of intriguing questions using both the data and the corresponding visualizations
Late projects will be penalized 20% per day.
Quizzes:
There will be a few in-class quizzes covering the lecture materials and readings. There is no midterm or final.
Course Materials:
Readings will be made available on this CMU Canvas site. The following books are recommended:
Interactive Data Visualization for the Web, Scott Murray (free online version)
Doing Data Science, Rachel Schutt and Cathy O'Neil
These books may also be useful:
Visualize This, Nathan Yau (uses R and Python)
Using Google App Engine, Charles Severance (uses Python, plus add-ons like JavaScript)
Python for Data Analysis, Wes McKinney (uses Python)
List of Topics Covered:
Week | Date | Category | Topic |
---|---|---|---|
1 | 8/28/2018 | Course Introduction | Data Pipeline |
1 | 8/30/2018 | Exploratory Data Analysis | Asking a Question |
2 | 9/4/2018 | Exploratory Data Analysis | Data Quality |
2 | 9/6/2018 | Exploratory Data Analysis | Data Sampling |
3 | 9/11/2018 | Exploratory Data Analysis | Database Storage |
3 | 9/13/2018 | Exploratory Data Analysis | Database Retrieval |
4 | 9/18/2018 | Exploratory Data Analysis | Ethics of Exploring Data |
4 | 9/20/2018 | Human Centered Methods | Ubiquitous Self |
5 | 9/25/2018 | Human Centered Methods | Ubiquitous Self |
5 | 9/27/2018 | Intro to Visualization | Overview of Field |
6 | 10/2/2018 | Intro to Visualization | Intro to Tableau |
6 | 10/4/2018 | Human Centered Methods | Ubiquitous Self |
7 | 10/9/2018 | Exploratory Data Analysis | Statistical Methods |
7 | 10/11/2018 | Machine Learning | Overview |
8 | 10/16/2018 | Machine Learning | Ethics |
8 | 10/18/2018 | Human Centered Methods | Crowd Sourcing |
9 | 10/23/2018 | Human Centered Methods | Ethics of HC Data |
9 | 10/25/2018 | Human Centered Methods | Crowd Sourcing |
10 | 10/30/2018 | Research Methods | Controlled Experiments |
10 | 11/1/2018 | Research Methods | Data from Experiments |
11 | 11/6/2018 | Advanced Visualization | Perception, Cognition, and Color |
11 | 11/8/2018 | Advanced Visualization | Multi-dimensional Data Visualization |
12 | 11/13/2018 | Advanced Visualization | Design Guidelines |
12 | 11/15/2018 | Advanced Visualization | Interaction |
13 | 11/20/2018 | Advanced Visualization | Ethics of Visualization |
13 | 11/22/2018 | No Class - Holiday | |
14 | 11/27/2018 | Advanced Visualization | Storytelling with Data |
14 | 11/29/2018 | Advanced Visualization | Advanced Vis Topic TBD |
15 | 12/4/2018 | Final Presentation | |
15 | 12/6/2018 | Final Presentation | |
Concepts:
- Structured vs. unstructured data
- Dealing with heterogeneous data
- Sampling and bias in data collection
- Sensed data
- Mobile data
- Data transformation and analysis
- Information visualization
- Current research in information-driven interfaces
Skills:
- Getting web data
- Dealing with APIs and OAuth
- Getting access to mobile data
- Common data formats
- Data parsing
- Common problems with data
- Tools for analyzing data
- Tools for visualizing data
Readings and Discussion:
You will be expected to complete the assigned readings before the lecture they pertain to. These may include textbook chapters about data or readings from the research literature. To encourage this, each student will be required to make at least two relevant posts to the discussion group before the class for which each reading is assigned.
Grades:
The tentative grading breakdown is below. As a reminder, all work is subject to the university policy on academic integrity.
60% | Project Bytes |
30% | Final Project |
10% | Quizzes |
Course Summary:
Date | Details | Due |
---|---|---|