ADVANCED STATISTICS
Instructional goals
The course provides an overview of advanced statistical methods for data science. The focus is on understanding the advantages and limitations of each approach, its interpretation, and its main applications in various disciplines, particularly economics, business, and management. Students will learn how to solve several supervised and unsupervised tasks, including regression, classification, clustering, and dimensionality reduction.
Prerequisites
Basic knowledge of descriptive statistics and statistical inference, including hypothesis testing and confidence interval estimation. Basic knowledge of the main probability distributions. Working knowledge of R is welcome but not required.
Course Contents
The course will cover the following topics:
- Linear regression
- Logistic regression
- Discriminant analysis
- Resampling methods
- Principal component analysis
- Clustering
Reference Books
James G., Witten D., Hastie T., Tibshirani R. (2014). An Introduction to Statistical Learning with Applications in R. Springer.
Hastie T., Tibshirani R., Friedman J. (2009). The Elements of Statistical Learning, 2nd Ed. Springer.
Wasserman L. (2003). All of Statistics. Springer.
Bishop C. (2006). Pattern Recognition and Machine Learning. Springer.
Murphy K.P. (2012). Machine Learning: a Probabilistic Perspective. MIT Press.
Teaching Methods
The course consists of lectures complemented by practical lab sessions and group project work.
Assessment Method
There will be a written midterm exam (30%), a written final exam (30%), and a project (40%).
In the midterm and final exams, students must demonstrate a deep understanding of the course topics, one that enables them to reproduce the concepts seen in class and to generalize them to similar but different frameworks.
The midterm and final exams each count for 30% of the grade. Students who do not take the midterm and final exams during the course must take an oral exam after the course, in which they must demonstrate the same skills described above.
In the project, students are required to demonstrate that:
- they are able to design innovative solutions to concrete data-driven problems using statistical models
- they are able to critically analyze and assess the strengths and weaknesses of different statistical models
- they can effectively communicate their ideas, findings, proposals, analysis, and critical reasoning.
The project will count for 40% of the grade.
Thesis assignment criteria
An interview to verify understanding and motivation.
Week 1: Content of online and on-campus sessions
Introduction (on campus & online)
Week 2: Content of online and on-campus sessions
Probability and Statistics recap (on campus & online)
Week 3: Content of online and on-campus sessions
Linear regression (on campus)
Lab session (online)
Week 4: Content of online and on-campus sessions
Linear regression (on campus)
Lab session (online)
Week 5: Content of online and on-campus sessions
Logistic regression (on campus)
Lab session (online)
Week 6: Content of online and on-campus sessions
Test 1 (on campus)
Discriminant analysis (online)
Week 7: Content of online and on-campus sessions
Resampling methods (on campus)
Lab session (online)
Week 8: Content of online and on-campus sessions
Principal component analysis (on campus)
Lab session (online)
Week 9: Content of online and on-campus sessions
Clustering (on campus)
Lab session (online)
Week 10: Content of online and on-campus sessions
Clustering (on campus)
Lab session (online)
Week 11: Content of online and on-campus sessions
Bayesian statistics (on campus)
Lab session (online)
Week 12: Content of online and on-campus sessions
Test 2 (on campus)
Lab session (online)