MACHINE LEARNING

Learning objectives

This course is an introduction to machine learning with a specialization in methods for financial time series. The course is divided into two parts and will be taught jointly by Professors Megha Patnaik (Part 1) and Marta Catalano (Part 2). The programming language for the course will be R.

Expected learning outcomes

In the first part, we will cover linear and polynomial regression, logistic regression, and linear discriminant analysis; cross-validation and the bootstrap; subset selection and model regularization methods (ridge and lasso); and tree-based methods, random forests, and boosting. The focus is on the key elements of modern data analysis and their applications. All computing is done in R. In the second part of the course, you will learn how to incorporate prior information into your analysis and how to quantify the uncertainty in your estimates, both for static quantities and for quantities that evolve over time. Real-world applications include forecasting the returns of a set of assets in a portfolio and predicting the growth of a country's Gross Domestic Product. We will cover Bayesian methods for unsupervised learning and time series analysis, with a particular focus on parametric density estimation, conjugate priors, and dynamic linear models, a wide class of models that includes, e.g., polynomial and cyclical trends, ARMA, and VAR models. We will describe how to perform forecasting and inference on these time series with the Kalman filter and discuss their implementation in R.

Course contents

In the first part, we will cover linear and polynomial regression, logistic regression, and linear discriminant analysis; cross-validation and the bootstrap; subset selection and model regularization methods (ridge and lasso); and tree-based methods, random forests, and boosting. The focus is on the key elements of modern data analysis and their applications. All computing is done in R. In the second part of the course, you will learn how to incorporate prior information into your analysis and how to quantify the uncertainty in your estimates, both for static quantities and for quantities that evolve over time. Real-world applications include forecasting the returns of a set of assets in a portfolio and predicting the growth of a country's Gross Domestic Product. We will cover Bayesian methods for unsupervised learning and time series analysis, with a particular focus on parametric density estimation, conjugate priors, and dynamic linear models, a wide class of models that includes, e.g., polynomial and cyclical trends, ARMA, and VAR models. We will describe how to perform forecasting and inference on these time series with the Kalman filter and discuss their implementation in R.

Reference texts

- Introduction to Statistical Learning, Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, 2nd Ed. (PDF available at https://hastie.su.domains/ISLR2/ISLRv2_website.pdf)
- Dynamic Linear Models with R, P. Campagnoli, S. Petrone, and G. Petris, Springer New York, NY. (PDF available at https://www.researchgate.net/publication/226410454_Dynamic_Linear_Models_with_R)
- Bayesian Data Analysis, A. Gelman, J. Carlin, H. Stern, D. Dunson, A. Vehtari, and D. Rubin, 3rd Ed., Chapman & Hall. (PDF available at http://www.stat.columbia.edu/~gelman/book/)
- Bayesian Forecasting and Dynamic Models, M. West and J. Harrison, 2nd Ed.

Teaching methods

The course consists of lectures complemented by exercise sessions.

Assessment methods

Evaluation will be based on a combination of homework assignments and the final exam.

Criteria for assigning the final thesis

An interview to verify understanding and motivation.

Week 1

Introduction & Review/Setup for the R programming language

Week 2

Statistical Learning - Statistical Learning and Regression, Parametric vs. Non-Parametric Models, Model Accuracy, K-Nearest Neighbors

Week 3

Linear Regression - Simple Linear Regression, Hypothesis Testing, Multiple Linear Regression, Model Selection, Interactions and Non-Linear Models

Week 4

Classification - Logistic Regression, Multivariate Logistic Regression, Multiclass Logistic Regression, Linear Discriminant Analysis, Univariate Linear Discriminant Analysis, Multivariate Linear Discriminant Analysis, Quadratic Discriminant Analysis

Week 5

Cross-Validation - K-Fold Cross-Validation, Bootstrap
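As a rough illustration of the K-fold idea listed for this week, the sketch below splits the data into k folds, fits a simple least-squares line on the remaining folds, and averages the held-out mean squared error. It is written in plain Python for illustration only (the course itself uses R); the function names and the toy data are assumptions made for this example and are not course material.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for the model y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def cv_mse(x, y, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    fold_mse = []
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        a, b = fit_simple_ols([x[i] for i in train], [y[i] for i in train])
        errs = [(y[i] - (a + b * x[i])) ** 2 for i in held_out]
        fold_mse.append(sum(errs) / len(errs))
    return sum(fold_mse) / k

# Toy data (made up for illustration): a noisy line y = 2 + 3x.
rng = random.Random(1)
x = [i / 10 for i in range(50)]
y = [2 + 3 * xi + rng.gauss(0, 0.5) for xi in x]
print(cv_mse(x, y, k=5))
```

Each observation is held out exactly once, so the averaged error estimates how the fitted line would perform on data it has not seen.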

Week 6

Variable Selection - Linear Model Subset Selection, Forward Stepwise Selection, Backward Stepwise Selection, Estimating Test Error (AIC, BIC), Estimating Test Error (Cross-Validation), Ridge Regression, Lasso, Tuning Parameters, Dimension Reduction, Principal Components and Partial Least Squares

Week 7

Tree/Forest methods - Decision Trees, Pruning Trees, Classification Trees, Bagging, Random Forests, Boosting

Week 8

Introduction, probability recap, Bayesian versus frequentist statistics, choice of likelihood and prior.

Week 9

Bayesian inference and prediction - conjugate priors, summaries of posterior distributions, interpretation.
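To make the notion of a conjugate prior concrete, here is a minimal sketch of the Beta-Binomial case: a Beta(a, b) prior on a success probability, updated with binomial data, yields a Beta(a + successes, b + failures) posterior. The example is in plain Python for illustration only (the course uses R); the function names, the prior, and the data are hypothetical and chosen for this example.

```python
def beta_binomial_update(a, b, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

# Hypothetical prior: Beta(2, 2), mildly centered on 0.5.
prior = (2.0, 2.0)

# Hypothetical data: an asset closed up on 7 of 10 trading days.
posterior = beta_binomial_update(*prior, successes=7, failures=3)

print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
print("posterior variance:", beta_variance(*posterior))
```

The posterior mean is also the predictive probability that the next observation is a success, and the posterior variance is smaller than the prior variance, reflecting the information gained from the data.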

Week 10

Dynamic linear models - state-space models, filtering, smoothing, prediction, Kalman Filter.
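As a hedged sketch of the filtering step, consider the simplest dynamic linear model, the local level model: y_t = theta_t + v_t with v_t ~ N(0, V), and theta_t = theta_(t-1) + w_t with w_t ~ N(0, W). The Kalman filter then alternates a prediction step and an update step. The code below is plain Python for illustration only (the course uses R); the function name, argument names, and toy observations are assumptions for this example.

```python
def kalman_local_level(ys, m0, C0, V, W):
    """Kalman filter for the scalar local level model.

    Returns one tuple (m, C, f, Q) per observation:
    filtered state mean and variance, then one-step forecast mean and variance.
    """
    m, C = m0, C0
    results = []
    for y in ys:
        # Prediction step: the state is a random walk, so the mean carries over
        # and the variance grows by the evolution variance W.
        a = m
        R = C + W
        # One-step-ahead forecast of the observation.
        f = a
        Q = R + V
        # Update step: move the state mean toward y by the Kalman gain.
        K = R / Q
        m = a + K * (y - f)
        C = R - K * R
        results.append((m, C, f, Q))
    return results

# Toy observations (made up for illustration).
ys = [1.2, 0.8, 1.1, 1.5, 1.3]
for m, C, f, Q in kalman_local_level(ys, m0=0.0, C0=1.0, V=1.0, W=1.0):
    print(round(m, 4), round(C, 4), round(f, 4), round(Q, 4))
```

The filtered variance C does not depend on the observed values, only on V, W, and C0, so it settles to a fixed value after a few steps.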

Week 11

Kalman filter, implementation with the dlm R package, model checking.

Week 12

Maximum likelihood estimation of unknown parameters, forecast function, polynomial trend models, free-form DLM, harmonic DLM, superposition of DLMs.