MACHINE LEARNING
Learning Objectives
This course is an introduction to machine learning, with a specialization in methods for financial time series. The course is divided into two parts and will be taught jointly by Professors Megha Patnaik (Part 1) and Marta Catalano (Part 2). The programming language for the course is R.
Course Contents
In the first part, we will cover linear and polynomial regression; logistic regression and linear discriminant analysis; cross-validation and the bootstrap; subset selection and model regularization methods (ridge and lasso); and tree-based methods, random forests, and boosting. The focus is on the important elements of modern data analysis and their applications. All computing is done in R.
In the second part of the course, you will learn how to incorporate prior information into your analysis and how to quantify the uncertainty in your estimates, both for static quantities and for quantities that evolve in time. Real-world applications include, e.g., forecasting the returns of a set of assets in a portfolio or predicting the growth of a country's Gross Domestic Product. We will cover Bayesian methods for unsupervised learning and time series analysis, with a particular focus on parametric density estimation, conjugate priors, and dynamic linear models, a wide class of models that includes, e.g., polynomial and cyclical trends, ARMA, and VAR models. We will describe how to perform forecasting and inference for these time series with the Kalman filter and discuss the implementation in R.
Reference Texts
- An Introduction to Statistical Learning, G. James, D. Witten, T. Hastie, and R. Tibshirani, 2nd Ed.
(pdf available at https://hastie.su.domains/ISLR2/ISLRv2_website.pdf)
- Dynamic Linear Models with R, G. Petris, S. Petrone, and P. Campagnoli, Springer New York, NY.
(pdf available at https://www.researchgate.net/publication/226410454_Dynamic_Linear_Models_with_R)
- Bayesian Data Analysis, A. Gelman, J. Carlin, H. Stern, D. Dunson, A. Vehtari, and D. Rubin, 3rd Ed. Chapman & Hall.
(pdf available at http://www.stat.columbia.edu/~gelman/book/)
- Bayesian Forecasting and Dynamic Models, M. West, J. Harrison, 2nd Ed. Springer New York, NY.
Assessment Methods
Evaluation will be based on a combination of homework assignments and a final exam.
Week 1
Introduction & Review/Setup for the R programming language
Week 2
Statistical Learning - Statistical Learning and Regression, Parametric vs. Non-Parametric Models, Model Accuracy, K-Nearest Neighbors
Week 3
Linear Regression - Simple Linear Regression, Hypothesis Testing, Multiple Linear Regression, Model Selection, Interactions and Non-Linear Models
Week 4
Classification - Logistic Regression, Multivariate Logistic Regression, Multiclass Logistic Regression, Linear Discriminant Analysis, Univariate Linear Discriminant Analysis, Multivariate Linear Discriminant Analysis, Quadratic Discriminant Analysis
Week 5
Cross-Validation - K-Fold Cross-Validation, Bootstrap
Week 6
Variable Selection - Linear Model Subset Selection, Forward Stepwise Selection, Backward Stepwise Selection, Estimating Test Error (AIC, BIC, Cross-Validation), Ridge Regression, Lasso, Tuning Parameters, Dimension Reduction, Principal Components and Partial Least Squares
Week 7
Tree/Forest Methods - Decision Trees, Pruning Trees, Classification Trees, Bagging, Random Forests, Boosting
Week 8
Parametric density estimation - introduction & recap, maximum likelihood estimation, Bayes' theorem
Week 9
Bayesian inference and prediction - conjugate priors, summaries of posterior distributions, interpretation
Week 10
Dynamic linear models - state-space models, filtering, smoothing, prediction, Kalman filter
Week 11
Implementation - basic building blocks, dlm R package, model checking
Week 12
.