STATISTICAL FOUNDATIONS OF DATA SCIENCE

Mariaelena Bottazzi Schenone, Marta Catalano

Instructional goals

The course provides an overview of some advanced statistical methods for data science. The focus is on understanding the advantages and limitations of each approach, on the interpretation of results, and on the main applications in various disciplines, particularly economics, business, and management. Students will learn how to solve several supervised and unsupervised learning tasks, including regression, classification, clustering, and Bayesian inference.

Prerequisites

Solid knowledge of basic probability, descriptive statistics, and statistical inference, including hypothesis testing and confidence intervals; see, for example, Chapters 1–8 of Ross (2017). Working knowledge of R is welcome but not mandatory. Luiss Preliminary Courses for Master's degree: Statistics and Probability are recommended; R and Mathematics are suggested.

Course Contents

• Introduction
• Probability and Statistics recap
• Principles of regression
• Simple and multivariate linear regression
• Cross-validation
• Principles of classification
• Logistic regression
• Clustering
• Principles of Bayesian inference
• Bayesian linear regression

Reference Books

• James G., Witten D., Hastie T., Tibshirani R. (2021). An Introduction to Statistical Learning: With Applications in R. 2nd Ed. Springer. [main]
• Bishop C. (2006). Pattern Recognition and Machine Learning. Springer.
• Gelman A., Carlin J., Stern H., Dunson D., Vehtari A., Rubin D. (2013). Bayesian Data Analysis. 3rd Ed. Chapman & Hall.
• Hastie T., Tibshirani R., Friedman J. (2009). The Elements of Statistical Learning. 2nd Ed. Springer.
• Hoff P. (2009). A First Course in Bayesian Statistical Methods. Springer.
• Ross S. M. (2017). Introductory Statistics. 4th Ed. Elsevier.

Teaching Methods

Lectures and lab sessions.

Assessment Method

Assignment (1/3 of the final grade) and written final exam (2/3 of the final grade).

Thesis assignment criteria

An interview to verify the student's understanding and motivation.

Week 1

Introduction.

Week 2

Probability and Statistics recap.

Week 3

Regression.

Week 4

Linear regression.

Week 5

Multivariate regression.

Week 6

Cross-validation.

Week 7

Classification.

Week 8

Logistic regression.

Week 9

Clustering.

Week 10

Bayesian inference.

Week 11

Bayesian inference.

Week 12

Bayesian linear regression.