ADVANCED STATISTICS
Instructional goals
The course provides an overview of selected advanced statistical methods for data science. It emphasizes the advantages and limitations of each approach, the interpretation of results, and the main applications in various disciplines, particularly economics, business, and management. The main focus is on modeling continuous and categorical outcomes. Students will gain a deep understanding of asymmetric modeling and dimension reduction for data science, and will acquire the practical skills needed to apply these methods successfully to problems in science and industry.
Intended learning outcomes
Knowledge and understanding:
The course will offer key statistical tools to investigate interrelationships between predictors and a continuous or categorical outcome. It will also discuss dimension reduction techniques, and their use for visualization, scoring, and information retrieval in multivariate data settings. Strengths, weaknesses, use cases, and interpretation of the results of each method will be discussed in depth.
Applying knowledge and understanding:
On successful completion of this course students will be able to:
• Appreciate the different statistical methods for prediction of continuous and categorical outcomes
• Select, implement, and interpret the most appropriate statistical predictive tools in a range of real-world applications
• Appreciate the perspectives offered by different conditional models
• Summarize multivariate data sets for information retrieval
Making judgments:
Students are expected to be able to choose the appropriate statistical method for their data analysis aims, taking into consideration data limitations and comparative performance. Students will demonstrate fluency with the software and with the interpretation of the results. Throughout the course, students will be encouraged to consider the strengths and weaknesses of the different methods discussed in class.
Communication skills:
This course gives students the opportunity to acquire the vocabulary of statistical models and multivariate analysis. They will learn how to communicate the results of their data analyses effectively. Special emphasis is placed on writing clear and concise reports through the project work.
Learning skills:
This course will enable students to analyze data in asymmetric scenarios, and to do so for real-world problems in an independent and critical way. Strong emphasis is placed on applying the techniques and tools covered in the course to complex problems that are typical of today’s data-driven companies.
Course Contents
The course will cover the following topics:
• Introduction to the R statistical software
• Multivariate linear regression
• Analysis of categorical data: multi-way tables, the generalized linear model. Logistic regression. Poisson regression.
• Principal component analysis
Reference Books
James, G., Witten, D., Hastie, T. & Tibshirani, R. (2014). An Introduction to Statistical Learning with Applications in R. Springer.
Chatfield, C. & Collins, A. J. (1981). Introduction to Multivariate Analysis. Chapman & Hall/CRC Press.
Everitt, B. S. & Hothorn, T. (2006). A Handbook of Statistical Analyses Using R. CRC Press.
Teaching Methods
The course consists of lectures, which cover both theory and practice, and take-home assignments. A Q&A session (flipped classroom) will be held at the beginning of each class.
Assessment Method
Assessment is continuous. It is based on a mid-term and a final exam, both held during class time, and on a final project worth 30% of the grade.
The project is mandatory. Students who do not take the mid-term or the final exam during the course must take an oral exam after the course.
Thesis assignment criteria
Final grade above 25
Does the syllabus cover sustainability topics?
No
Week 1: Content of online and on-campus sessions
Introduction to the R software
Week 2: Content of online and on-campus sessions
Introduction to statistical learning, review of basics
Week 3: Content of online and on-campus sessions
Linear regression
Week 4: Content of online and on-campus sessions
Linear regression
Week 5: Content of online and on-campus sessions
Linear regression
Week 6: Content of online and on-campus sessions
Linear regression, mid-term
Week 7: Content of online and on-campus sessions
Categorical data analysis
Week 8: Content of online and on-campus sessions
The generalized linear model
Week 9: Content of online and on-campus sessions
Logistic regression
Week 10: Content of online and on-campus sessions
Poisson regression
Week 11: Content of online and on-campus sessions
Principal component analysis
Week 12: Content of online and on-campus sessions
Principal component analysis, final exam