MACHINE LEARNING
Learning Objectives
This course is an introduction to machine learning, with a focus on methods for financial time series. It is divided into two parts, taught by Professor Megha Patnaik (Part 1) and Professor Marta Catalano (Part 2). The programming language for the course is R.
Expected Learning Outcomes
In the first part, we will cover linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap; subset selection and model regularization methods (ridge and lasso); and tree-based methods, random forests and boosting. The focus is on the core elements of modern data analysis and their applications, implemented in R.
In the second part of the course, you will learn how to incorporate prior information into your analysis and how to quantify the uncertainty in your estimates, both for static quantities and for quantities that evolve over time. Real-world applications include forecasting the returns of a set of assets in a portfolio and predicting the growth of a country's Gross Domestic Product. We will cover Bayesian methods for unsupervised learning and time series analysis, with a particular focus on parametric density estimation, conjugate priors, and dynamic linear models, a wide class of models that includes, e.g., polynomial and cyclical trends, ARMA, and VAR models. We will describe how to perform forecasting and inference on these time series through the Kalman filter, and discuss how to implement these methods in R.
Reference Texts
- An Introduction to Statistical Learning, G. James, D. Witten, T. Hastie, R. Tibshirani, 2nd Ed., Springer.
(pdf available at https://hastie.su.domains/ISLR2/ISLRv2_website.pdf)
- Dynamic Linear Models with R, P. Campagnoli, S. Petrone, G. Petris, Springer, New York, NY.
(pdf available at https://www.researchgate.net/publication/226410454_Dynamic_Linear_Models_with_R)
- Bayesian Data Analysis, A. Gelman, J. Carlin, H. Stern, D. Dunson, A. Vehtari, D. Rubin, 3rd Ed., Chapman & Hall.
(pdf available at http://www.stat.columbia.edu/~gelman/book/)
- Bayesian Forecasting and Dynamic Models, M. West, J. Harrison, 2nd Ed., Springer.
Teaching Methods
The course consists of lectures complemented by exercise sessions.
Assessment Methods
Evaluation is based on a combination of homework assignments and a final exam.
Criteria for Assigning the Final Thesis
An interview to assess the student's understanding and motivation.
Week 1
Introduction; review and setup of the R programming language
Week 2
Statistical Learning - Statistical Learning and Regression, Parametric vs. Non-Parametric Models, Model Accuracy, K-Nearest Neighbors
Week 3
Linear Regression - Simple Linear Regression, Hypothesis Testing, Multiple Linear Regression, Model Selection, Interactions and Non-Linear Models
Week 4
Classification - Logistic Regression, Multivariate Logistic Regression, Multiclass Logistic Regression, Linear Discriminant Analysis, Univariate Linear Discriminant Analysis, Multivariate Linear Discriminant Analysis, Quadratic Discriminant Analysis
Week 5
Cross-Validation - K-Fold Cross-Validation, the Bootstrap
Week 6
Variable Selection - Linear Model Subset Selection, Forward Stepwise Selection, Backward Stepwise Selection, Estimating Test Error via AIC and BIC, Estimating Test Error via Cross-Validation, Ridge Regression, Lasso, Tuning Parameters, Dimension Reduction, Principal Components and Partial Least Squares
Week 7
Tree-Based Methods - Decision Trees, Tree Pruning, Classification Trees, Bagging, Random Forests, Boosting
Week 8
Introduction, probability recap, Bayesian versus frequentist statistics, choice of likelihood and prior.
Week 9
Bayesian inference and prediction - conjugate priors, summaries of posterior distributions, interpretation.
Week 10
Dynamic linear models - state-space models, filtering, smoothing, prediction, Kalman Filter.
Week 11
Kalman filter implementation with the dlm R package, model checking.
Week 12
Maximum likelihood estimation (MLE) of unknown parameters, forecast function, polynomial trend models, free-form DLMs, harmonic DLMs, superposition of DLMs.