ADVANCED STATISTICS

Marta Catalano

Instructional goals

The course provides an overview of some advanced statistical methods for data science. The focus is on understanding advantages and limitations of each approach, interpretation, and main applications in various disciplines, particularly in economics, business, and management. Students will learn how to solve several supervised and unsupervised tasks, including regression, classification, clustering, and Bayesian inference.

Intended learning outcomes

Knowledge and understanding: The course offers key statistical tools to investigate relationships between predictors and a continuous or categorical outcome. It also covers clustering and the principles of Bayesian inference. The strengths, weaknesses, use cases, and interpretation of the results of each method are discussed in depth.

Applying knowledge and understanding: On successful completion of this course, students will be able to:
- Appreciate the different statistical methods for predicting continuous and categorical outcomes.
- Select, implement, and interpret the most appropriate statistical predictive tools in a range of real-world applications.
- Cluster observations according to similar patterns and summarize multivariate data sets for information retrieval.

Making judgments: Students are expected to choose the appropriate statistical method for the aims of their data analysis, taking into consideration data limitations and comparative performance. Students will demonstrate fluency with the software and with the interpretation of the results. Throughout the course, students will be encouraged to consider the strengths and weaknesses of the methods discussed in class.

Communication skills: The course gives students the opportunity to acquire the vocabulary of statistical models and multivariate analysis. They will learn how to communicate the results of their data analyses effectively. Special emphasis is placed on writing concise and clear reports through the project work.

Learning skills: The course will enable students to analyze data for real-world problems in an independent and critical way.

Course Contents

The course will cover the following topics:
- Basics of probability
- Linear regression
- Logistic regression
- Discriminant analysis
- Crossvalidation
- Clustering
- Bayesian inference

Reference Books

James G., Witten D., Hastie T., Tibshirani R. (2014). An Introduction to Statistical Learning with Applications in R. Springer. (main reference)
Hastie T., Tibshirani R., Friedman J. (2009). The Elements of Statistical Learning. 2nd Ed. Springer.
Bishop C. (2006). Pattern Recognition and Machine Learning. Springer.
Gelman A., Carlin J., Stern H., Dunson D., Vehtari A., Rubin D. (2013). Bayesian Data Analysis. 3rd Ed. Chapman & Hall.
Ross S. M. (2017). Introductory Statistics. 4th Ed. Elsevier.

Teaching Methods

The course consists of lectures complemented by practical lab sessions and group project work.

Assessment Method

The assessment consists of a written midterm test I (30%), a written test II (30%), and a group project (40%).

In test I and test II, students are required to demonstrate a deep understanding of the course topics, one that enables them to reproduce the concepts seen in class and to generalize them to similar but different frameworks. Students who do not take test I and test II during the course are required to take an oral exam after the course, in which they must demonstrate the same skills.

In the project, students are required to demonstrate that:
- they are able to design innovative solutions to concrete data-driven problems using statistical models;
- they are able to critically analyze and assess the strengths and weaknesses of different statistical models;
- they can effectively communicate their ideas, findings, proposals, analysis, and critical reasoning.

Thesis assignment criteria

An interview to verify understanding and motivation.

Week 1

Introduction.

Week 2

Probability and Statistics recap.

Week 3

Linear regression. RLab.

Week 4

Linear regression. RLab.

Week 5

Linear regression. RLab.

Week 6

Test I. RLab.

Week 7

Logistic regression. RLab.

Week 8

Linear discriminant analysis. RLab.

Week 9

Crossvalidation. RLab.

Week 10

Clustering. RLab.

Week 11

Test II. RLab.

Week 12

Bayesian inference. RLab.