MACHINE LEARNING

Megha Patnaik, Marta Catalano

Course code:

EFI07

General discipline (SSD):

SECS-P/05

Course year:

Semester:

First Semester

Partition of students:

Unique

Credits:

Teaching language:

English

Total lesson hours:

Academic year:

2026/2027

Description

Extended program and reference teaching materials

Instructional goals

This course is an introduction to machine learning. The course is divided into two parts and will be jointly taught by Professors Megha Patnaik (Part 1) and Marta Catalano (Part 2). The programming language for the course will be R.

Prerequisites

The coding for the course will be in R. Prior knowledge of programming is useful but not essential. A solid background in statistics and probability is highly recommended; see, e.g., Chapters 1-5 of Ross (2017).

Intended learning outcomes

At the end of the course, students are expected to be able to apply key methods from supervised and probabilistic machine learning; compare frequentist and Bayesian approaches to statistical inference; use regression, classification, resampling, regularization, and tree-based methods for prediction and model selection; incorporate prior information into statistical analyses and quantify uncertainty in estimation and forecasting; formulate and analyze dynamic linear models for time series data; perform inference and forecasting through the Kalman filter; and implement these methods in R for real-data applications.

Course Contents

In the first part, we will cover linear and polynomial regression, logistic regression, and linear discriminant analysis; cross-validation and the bootstrap; subset selection and model regularization methods (ridge and lasso); and tree-based methods, random forests, and boosting. The focus is on the important elements of modern data analysis and its applications. All computing is done in R.

The second part of the course focuses on probabilistic machine learning. Students will learn how to incorporate prior information into their analysis and how to quantify the uncertainty in estimates, both for static quantities and for quantities that evolve over time. Real-world applications include forecasting the returns of a set of assets in a portfolio and predicting the growth of a country's Gross Domestic Product. More specifically, this part covers Bayesian methods for unsupervised learning and time series analysis, with a particular focus on parametric density estimation, conjugate priors, and dynamic linear models, a wide class of models that includes, e.g., polynomial and cyclical trends as well as ARMA models. It also addresses forecasting and inference for these models through the Kalman filter and discusses their implementation in R.

Reference Books

- An Introduction to Statistical Learning, G. James, D. Witten, T. Hastie, and R. Tibshirani, 2nd Ed. (pdf available at https://hastie.su.domains/ISLR2/ISLRv2_website.pdf)
- A First Course in Bayesian Statistical Methods, P. D. Hoff. (pdf available at https://sites.math.rutgers.edu/~zeilberg/EM20/Hoff.pdf)
- Dynamic Linear Models with R, P. Campagnoli, S. Petrone, and G. Petris, Springer New York, NY. (pdf available at https://www.researchgate.net/publication/226410454_Dynamic_Linear_Models_with_R)
- Bayesian Data Analysis, A. Gelman, J. Carlin, H. Stern, D. Dunson, A. Vehtari, and D. Rubin, 3rd Ed., Chapman & Hall. (pdf available at http://www.stat.columbia.edu/~gelman/book/)
- Bayesian Forecasting and Dynamic Models, M. West and J. Harrison, 2nd Ed.
- Introductory Statistics, S. M. Ross (2017), 4th Ed., Elsevier.

Teaching Methods

The course consists of lectures complemented by exercise sessions.

Assessment Method

The final grade will be determined by two assignments, which together count for one third of the grade, and a written final examination, which counts for the remaining two thirds.

Thesis assignment criteria

An interview to verify understanding and motivation.

Week 1

Introduction & Review/Setup for the R programming language

Week 2

Statistical Learning - Statistical Learning and Regression, Parametric vs. Non-Parametric Models, Model Accuracy, K-Nearest Neighbors

Week 3

Linear Regression - Simple Linear Regression, Hypothesis Testing, Multiple Linear Regression, Model Selection, Interactions and Non-Linear Models

Week 4

Classification - Logistic Regression, Multivariate Logistic Regression, Multiclass Logistic Regression, Linear Discriminant Analysis, Univariate Linear Discriminant Analysis, Multivariate Linear Discriminant Analysis, Quadratic Discriminant Analysis

Week 5

Cross-Validation - K-Fold Cross-Validation, Bootstrap
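
To give a flavour of the two resampling ideas listed for this week, here is a minimal sketch. It is written in Python purely for illustration (the course itself uses R). The data are simulated, and the function names (`k_fold_indices`, `fit_line`, `cv_mse`, `bootstrap_se`) are hypothetical helpers, not part of any course material.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def cv_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    fold_mses = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errs = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        fold_mses.append(statistics.fmean(errs))
    return statistics.fmean(fold_mses)

def bootstrap_se(values, stat=statistics.fmean, n_boot=1000, seed=0):
    """Bootstrap standard error: resample with replacement, recompute stat."""
    rng = random.Random(seed)
    n = len(values)
    reps = [stat(rng.choices(values, k=n)) for _ in range(n_boot)]
    return statistics.stdev(reps)

# Simulated data (an assumption for illustration): y = 1 + 2x + N(0, 1) noise
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]
print("5-fold CV estimate of test MSE:", cv_mse(xs, ys))
print("Bootstrap SE of mean(y):", bootstrap_se(ys))
```

Because the simulated noise has variance 1, the cross-validated MSE should land near 1. The bootstrap standard error should be close to the usual formula, the sample standard deviation of y divided by the square root of n.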

Week 6

Foundations of Bayesian Statistics - Introduction to the Bayesian framework. Review of basic probability concepts, including conditional probability, Bayes’ theorem, random variables, and common probability distributions. Comparison between the Bayesian and frequentist approaches to statistical inference. Choice of likelihood and prior distributions.

Week 7

Bayesian Inference and Prediction - Posterior inference and predictive distributions. Conjugate priors and their role in analytical Bayesian updating. Numerical and graphical summaries of posterior distributions. Interpretation of posterior quantities, credible intervals, and Bayesian prediction.
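
As a small illustration of conjugate updating, the sketch below uses the Beta-Binomial pair: a Beta(a, b) prior on a success probability, combined with y successes in n trials, gives a Beta(a + y, b + n - y) posterior. It is written in Python for illustration only (the course uses R). The function names and the example counts are assumptions, and the credible interval is approximated by Monte Carlo draws rather than exact quantiles.

```python
import random

def beta_binomial_update(a, b, successes, trials):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return a + successes, b + trials - successes

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def credible_interval(a, b, level=0.95, draws=20000, seed=0):
    """Equal-tailed credible interval, approximated from posterior draws."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo = samples[round((1 - level) / 2 * draws)]
    hi = samples[round((1 + level) / 2 * draws) - 1]
    return lo, hi

# Hypothetical example: uniform Beta(1, 1) prior, 7 successes in 10 trials
post_a, post_b = beta_binomial_update(1, 1, successes=7, trials=10)
print("Posterior: Beta(%d, %d)" % (post_a, post_b))
print("Posterior mean:", beta_mean(post_a, post_b))
print("Approximate 95% credible interval:", credible_interval(post_a, post_b))
```

In this model the posterior mean is also the posterior predictive probability that the next trial is a success.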

Week 8

Posterior Approximation Methods - When closed-form posterior inference is not available: approximation techniques for posterior computation. Monte Carlo methods, importance sampling, and Gibbs sampling. Basic principles, implementation, and comparison of these methods.

Week 9

Bayesian Linear Regression - Linear regression in the Bayesian framework. Prior specification for regression coefficients and variance parameters. Posterior distribution, posterior prediction, and comparison with classical linear regression.

Week 10

Dynamic Linear Models and State-Space Models - Introduction to dynamic linear models as Bayesian state-space models. Components of the observation and state equations. Filtering, smoothing, and prediction in time series analysis.

Week 11

Kalman Filter - The Kalman Filter for Gaussian dynamic linear models. Recursive estimation, forecasting, and state updating. Practical interpretation and implementation using the dlm package in R, including filtering, smoothing, forecasting, and model checking.
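
The recursive structure of the Kalman filter can be sketched for the simplest Gaussian dynamic linear model, the local level model: y_t = theta_t + v_t with v_t ~ N(0, V), and theta_t = theta_{t-1} + w_t with w_t ~ N(0, W). The sketch below is in Python for illustration only and does not use the dlm package. The function name, the diffuse initial variance, and the simulated series are assumptions.

```python
import random

def kalman_local_level(ys, V, W, m0=0.0, C0=1e7):
    """Kalman filter for the local level model.

    Returns the filtered means, filtered variances, and one-step-ahead
    forecasts of y_t (each forecast is made before y_t is observed).
    """
    m, C = m0, C0
    means, variances, forecasts = [], [], []
    for y in ys:
        a, R = m, C + W          # prior for the state at time t
        f, Q = a, R + V          # one-step-ahead forecast of y_t and its variance
        K = R / Q                # Kalman gain
        m = a + K * (y - f)      # updated mean: correct by the forecast error
        C = (1 - K) * R          # updated variance
        forecasts.append(f)
        means.append(m)
        variances.append(C)
    return means, variances, forecasts

# Simulated series (an assumption for illustration): a random-walk level plus noise
rng = random.Random(1)
level, ys = 0.0, []
for _ in range(100):
    level += rng.gauss(0, 1)          # state evolution, W = 1
    ys.append(level + rng.gauss(0, 1))  # noisy observation, V = 1

means, variances, forecasts = kalman_local_level(ys, V=1.0, W=1.0)
print("Last filtered mean:", means[-1], "true level:", level)
print("Filtered variance after 100 steps:", variances[-1])
```

With V = W = 1 the filtered variance settles quickly at a steady-state value of about 0.618, regardless of the data, which is why the gain becomes constant after a few observations.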

Week 12

Parameter Estimation and Advanced Dynamic Linear Models - Maximum likelihood estimation of unknown parameters in dynamic linear models. Use of forecast functions. Polynomial trend models, free-form dynamic linear models, harmonic dynamic linear models, and superposition of dynamic linear models.