MACHINE LEARNING

Instructional goals

This course is an introduction to machine learning with specialization in methods for financial time series. The course is divided into two parts and will be jointly taught by Professors Megha Patnaik (Part 1) and Marta Catalano (Part 2). The programming language for the course will be R.

Intended learning outcomes

.

Course Contents

In the first part, we will cover linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap; subset selection and model regularization methods (ridge and lasso); and tree-based methods, random forests and boosting. The focus is on the important elements of modern data analysis and its applications.

In the second part of the course, you will learn how to incorporate prior information into your analysis and how to quantify the uncertainty in your estimates, both for static quantities and for quantities that evolve in time. Real-world applications include, e.g., forecasting the returns of a set of assets in a portfolio or predicting the growth of a country's Gross Domestic Product. We will cover Bayesian methods for unsupervised learning and time series analysis, with a particular focus on parametric density estimation, conjugate priors, and dynamic linear models, a wide class of models that includes, e.g., polynomial and cyclical trends, ARMA, and VAR models. We will describe how to carry out forecasting and inference for these time series with the Kalman filter and discuss the implementation in R.

Reference Books

- An Introduction to Statistical Learning, G. James, D. Witten, T. Hastie, and R. Tibshirani, 2nd Ed. (pdf available at https://hastie.su.domains/ISLR2/ISLRv2_website.pdf)
- Dynamic Linear Models with R, G. Petris, S. Petrone, and P. Campagnoli, Springer New York, NY. (pdf available at https://www.researchgate.net/publication/226410454_Dynamic_Linear_Models_with_R)
- Bayesian Data Analysis, A. Gelman, J. Carlin, H. Stern, D. Dunson, A. Vehtari, and D. Rubin, 3rd Ed. Chapman & Hall. (pdf available at http://www.stat.columbia.edu/~gelman/book/)
- Bayesian Forecasting and Dynamic Models, M. West and J. Harrison, 2nd Ed. Springer New York, NY.

Teaching Methods

.

Assessment Method

Evaluation will be based on a combination of homework assignments and a final exam.

Thesis assignment criteria

.

Week 1: Content of online and on-campus sessions

Introduction & Review/Setup for the R programming language

Week 2: Content of online and on-campus sessions

Statistical Learning - Statistical Learning and Regression, Parametric vs. Non-Parametric Models, Model Accuracy, K-Nearest Neighbors

Week 3: Content of online and on-campus sessions

Linear Regression - Simple Linear Regression, Hypothesis Testing, Multiple Linear Regression, Model Selection, Interactions and Non-Linear Models

Week 4: Content of online and on-campus sessions

Classification - Logistic Regression, Multivariate Logistic Regression, Multiclass Logistic Regression, Linear Discriminant Analysis, Univariate Linear Discriminant Analysis, Multivariate Linear Discriminant Analysis, Quadratic Discriminant Analysis

Week 5: Content of online and on-campus sessions

Cross-Validation - K-Fold Cross-Validation, Bootstrap

Week 6: Content of online and on-campus sessions

Variable Selection - Linear Model Subset Selection, Forward Stepwise Selection, Backward Stepwise Selection, Estimating Test Error (AIC, BIC, Cross-Validation), Ridge Regression, Lasso, Tuning Parameters, Dimension Reduction, Principal Components and Partial Least Squares

Week 7: Content of online and on-campus sessions

Tree/Forest methods - Decision Trees, Pruning Trees, Classification Trees, Bagging, Random Forests, Boosting

Week 8: Content of online and on-campus sessions

Parametric density estimation - introduction & recap, maximum likelihood estimation, Bayes' theorem.

Week 9: Content of online and on-campus sessions

Bayesian inference and prediction - conjugate priors, summaries of posterior distributions, interpretation.

Week 10: Content of online and on-campus sessions

Dynamic linear models - state-space models, filtering, smoothing, prediction, Kalman Filter.

Week 11: Content of online and on-campus sessions

Implementation - basic building blocks, the dlm R package, model checking.
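
As a rough illustration of what this session's workflow might look like (not the course's official material), the sketch below fits a local level model with the dlm package and runs the Kalman filter, smoother, and a short forecast. The choice of the built-in Nile series and the two variance values are assumptions made purely for illustration; in practice the variances would be estimated, for example by maximum likelihood.

```r
# Minimal sketch; requires the dlm package: install.packages("dlm")
library(dlm)

# Built-in annual Nile flow series (an illustrative choice of data)
y <- Nile

# Basic building block: a first-order polynomial (local level) model.
# dV = observation variance, dW = state variance.
# Both values are illustrative assumptions, not estimates from this course.
mod <- dlmModPoly(order = 1, dV = 15100, dW = 1470)

# Kalman filter: filtered state means (filt$m) and one-step forecasts (filt$f)
filt <- dlmFilter(y, mod)

# Kalman smoother: state estimates that use the whole sample
smth <- dlmSmooth(filt)

# Forecast the next 5 observations from the end of the sample
fore <- dlmForecast(filt, nAhead = 5)

# Model checking: standardized one-step-ahead forecast errors,
# which should look roughly like white noise if the model is adequate
res <- residuals(filt, sd = FALSE)

# filt$m includes the prior mean m0 as its first element, so drop it
level <- dropFirst(filt$m)
```

A common next step would be to plot `y` together with `level` and to inspect `res` with `qqnorm(res)` and `acf(res)`.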

Week 12: Content of online and on-campus sessions

.