BST 260

Introduction to Data Science

This class focuses on methods for learning from data, in order to gain useful predictions and insights. Separating signal from noise presents many computational and inferential challenges, which we approach from a perspective at the interface of computer science and statistics. Through real-world examples of wide interest, we introduce methods for five key facets of an investigation:

1) data munging/scraping/sampling/cleaning in order to construct an informative, manageable data set;

2) software engineering skills for accessing data as well as organizing data analyses and making these analyses sharable and reproducible;

3) exploratory data analysis to generate hypotheses and intuition about the data;

4) inference and prediction based on statistical tools such as modeling, regression, and classification; and

5) communication of results through visualization, stories, and interpretable summaries.