ADVANCED STATISTICS

Marta Catalano

Learning objectives

The course provides an overview of some advanced statistical methods for data science. The focus is on understanding advantages and limitations of each approach, interpretation, and main applications in various disciplines, particularly in economics, business, and management. Students will learn how to solve several supervised and unsupervised tasks, including regression, classification, clustering, and Bayesian inference.

Expected learning outcomes

Knowledge and understanding: The course offers key statistical tools for investigating the relationships between predictors and a continuous or categorical outcome. It also covers clustering and the principles of Bayesian inference. The strengths, weaknesses, use cases, and interpretation of the results of each method will be discussed in depth.

Applying knowledge and understanding: On successful completion of this course, students will be able to:
- Appreciate the different statistical methods for predicting continuous and categorical outcomes.
- Select, implement, and interpret the most appropriate statistical predictive tools in a range of real-world applications.
- Cluster observations according to similar patterns and summarize multivariate data sets for information retrieval.

Making judgments: Students are expected to choose the appropriate statistical method for the aims of their data analysis, taking into consideration data limitations and comparative performance. Students will demonstrate fluency with the software and with the interpretation of the results. Throughout the course, students will be encouraged to consider the strengths and weaknesses of the different methods discussed in class.

Communication skills: This course will enable students to acquire the vocabulary of statistical models and multivariate analysis. They will learn how to communicate the results of their data analyses effectively. Special emphasis will be given to writing concise and clear reports through the project work.

Learning skills: This course will equip students to analyze data for real-world problems in an independent and critical way.

Course contents

The course will cover the following topics:
- Basics of probability
- Linear regression
- Logistic regression
- Cross-validation
- Clustering
- Bayesian inference

Reference texts

James G., Witten D., Hastie T., Tibshirani R. (2014). An Introduction to Statistical Learning with Applications in R. Springer. (main reference)
Hastie T., Tibshirani R., Friedman J. (2009). The Elements of Statistical Learning. 2nd Ed. Springer.
Bishop C. (2006). Pattern Recognition and Machine Learning. Springer.
Gelman A., Carlin J., Stern H., Dunson D., Vehtari A., Rubin D. (2013). Bayesian Data Analysis. 3rd Ed. Chapman & Hall.
Ross S. M. (2017). Introductory Statistics. 4th Ed. Elsevier.

Teaching methods

The course consists of lectures complemented by practical lab sessions and group project work.

Assessment methods

Attending students will be evaluated on both a written exam (70%) and a group project (30%). Details will be provided before the beginning of the course.

Criteria for assigning the final thesis

An interview to verify understanding and motivation.

Week 1

Introduction.

Week 2

Probability and Statistics recap.

Week 3

Regression. RLab.

Week 4

Linear regression.

Week 5

Linear regression. RLab.

Week 6

Multivariate regression.

Week 7

Classification. RLab.

Week 8

Logistic regression.

Week 9

Cross-validation. RLab.

Week 10

Clustering.

Week 11

Bayesian inference. RLab.

Week 12

Bayesian inference.