ADVANCED STATISTICS

Alessio Farcomeni

Instructional goals

The course provides an overview of selected advanced statistical methods for data science. It focuses on the advantages and limitations of each approach, on interpretation of results, and on the main applications in various disciplines, particularly economics, business, and management. The emphasis is on modeling continuous and categorical outcomes. Students will gain a deep understanding of asymmetric modeling and dimension reduction for data science, and will acquire the practical skills needed to apply these methods successfully to problems in science and industry.

Intended learning outcomes

Knowledge and understanding: The course offers key statistical tools to investigate the relationships between predictors and a continuous or categorical outcome. It also discusses dimension reduction techniques and their use for visualization, scoring, and information retrieval in multivariate data settings. The strengths, weaknesses, use cases, and interpretation of the results of each method are discussed in depth.

Applying knowledge and understanding: On successful completion of this course, students will be able to:
• Appreciate the different statistical methods for predicting continuous and categorical outcomes
• Select, implement, and interpret the most appropriate statistical predictive tools in a range of real-world applications
• Appreciate the perspectives offered by different conditional models
• Summarize multivariate data sets for information retrieval

Making judgments: Students are expected to be able to choose the appropriate statistical method for the aims of their data analysis, taking into consideration data limitations and comparative performance. Students will demonstrate fluency with the software and with the interpretation of results. Throughout the course, students will be encouraged to consider the strengths and weaknesses of the different methods discussed in class.

Communication skills: This course gives students the opportunity to acquire the vocabulary of statistical models and multivariate analysis. They will learn how to communicate the results of their data analyses effectively. Special emphasis is placed on writing concise and clear reports through the project work.

Learning skills: This course equips students to analyze data in asymmetric scenarios, and to do so for real-world problems in an independent and critical way. Strong emphasis is placed on applying the techniques and tools covered in the course to complex problems that are typical of today’s data-driven companies.

Course Contents

The course will cover the following topics:
• Introduction to the R statistical software
• Multivariate linear regression
• Analysis of categorical data: multi-way tables, the generalized linear model, logistic regression, Poisson regression
• Principal component analysis

Reference Books

James, G., Witten, D., Hastie, T. & Tibshirani, R. (2014). An Introduction to Statistical Learning with Applications in R. Springer.
Chatfield, C. & Collins, A. J. (1981). Introduction to Multivariate Analysis. Chapman & Hall/CRC Press.
Everitt, B. S. & Hothorn, T. (2006). A Handbook of Statistical Analyses Using R. CRC Press.

Teaching Methods

The course consists of lectures covering both theory and practice, complemented by take-home assignments. A Q&A session (flipped classroom) will be held at the beginning of each class.

Assessment Method

Assessment is continuous and includes a final project worth 30% of the grade. Continuous assessment is based on a mid-term and a final exam, both held during class time. The project is mandatory. Students who do not take the mid-term or the final exam during the course are required to take an oral exam after the course.

Thesis assignment criteria

Final grade above 25

Does the syllabus cover sustainability topics?

No

Week 1 Content of online and on-campus sessions

Introduction to the R software

Week 2 Content of online and on-campus sessions

Introduction to statistical learning, review of basics

Week 3 Content of online and on-campus sessions

Linear regression

Week 4 Content of online and on-campus sessions

Linear regression

Week 5 Content of online and on-campus sessions

Linear regression

Week 6 Content of online and on-campus sessions

Linear regression, mid-term

Week 7 Content of online and on-campus sessions

Categorical data analysis

Week 8 Content of online and on-campus sessions

The generalized linear model

Week 9 Content of online and on-campus sessions

Logistic regression

Week 10 Content of online and on-campus sessions

Poisson regression

Week 11 Content of online and on-campus sessions

Principal component analysis

Week 12 Content of online and on-campus sessions

Principal component analysis, final exam