DATA ANALYSIS FOR SOCIAL SCIENCES

Marco Mingione

Instructional goals

Statistical analysis of univariate and multivariate data, with a particular focus on social and economic applications.

Prerequisites

Basic concepts of Mathematics.

Intended learning outcomes

The student will be able to analyze different types of data using appropriate statistical methodologies. Data analysis will be carried out in R, which is an essential part of this course.

Course Contents

1) Basic concepts of statistics and R programming
2) Different types of data
3) Exploratory data analysis
4) Linear regression model
5) Generalized Linear Models
6) Tentative: Network analysis
7) Tentative: Natural Language Processing

Reference Books

MAIN:
- Chapman, C., and E. McDonnell Feit (2015). R for Marketing Research and Analytics. Springer.
SUGGESTED for R:
- Venables, W. N., D. M. Smith, and the R Development Core Team (2009). An Introduction to R. https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
- Wickham, H., and G. Grolemund (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media, Inc. https://r4ds.had.co.nz/
SUGGESTED for the THEORY:
- James, G., et al. (2013). An Introduction to Statistical Learning. Vol. 112. New York: Springer. https://www.statlearning.com/

Teaching Methods

Book, slides, lecture notes, R scripts

Assessment Method

A written exam (at minimum) and a project on a real data set.

Thesis assignment criteria

TBD

Week 1 Content of online and on-campus sessions

1) Basic concepts of statistics and R programming:
- Introduction to R and the RStudio IDE
- Introduction to statistics

Week 2 Content of online and on-campus sessions

2) Different types of data:
- quantitative:
  - numeric, continuous, discrete
- qualitative (or categorical)
- textual data
- the different types of data in R
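As a rough illustration of how the data types listed above map onto R, here is a minimal sketch. All variable names and values are invented for this example and are not course material.

```r
# Toy values, invented purely for illustration.
income   <- c(1850.5, 2300.0, 1720.25)          # quantitative, continuous
children <- c(0L, 2L, 1L)                       # quantitative, discrete
degree   <- factor(c("BA", "MA", "BA"))         # qualitative (categorical)
feedback <- c("very useful", "ok", "too fast")  # textual

# class() reports how R stores each vector
types <- sapply(
  list(income = income, children = children,
       degree = degree, feedback = feedback),
  class
)
print(types)

# Factors keep the set of categories as levels
print(levels(degree))
print(table(degree))

# A data frame holds columns of different types side by side;
# stringsAsFactors = FALSE keeps text columns as character in every R version
survey <- data.frame(income, children, degree, feedback,
                     stringsAsFactors = FALSE)
str(survey)
```

Storing a categorical variable as a factor rather than as plain text matters later on, because modelling functions such as lm() and glm() treat factors as categorical predictors automatically.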

Week 3 Content of online and on-campus sessions

3) Exploratory data analysis:
- Basic summary statistics: min, mean, mode, quantiles, max, variance, standard deviation, coefficient of variation, correlation, covariance, etc.
- Summary statistics in R
- Data visualization: barplot, histograms, maps, pie chart, boxplot, etc.
- Data visualization in R with ggplot2
- Main probability distributions: Gaussian, Bernoulli, Binomial, Poisson
- Probability distributions in R

Week 4 Content of online and on-campus sessions

3) Exploratory data analysis (continued):
- Data visualization in R with ggplot2
- Main probability distributions: Gaussian, Bernoulli, Binomial, Poisson
- Probability distributions in R
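The summary statistics and probability distributions named for this topic can be sketched in base R roughly as follows. The built-in mtcars data set and the chosen variables are assumptions made for illustration only, and stat_mode is a hypothetical helper, since base R has no function for the statistical mode.

```r
# Illustrative only: mtcars ships with base R; the variables are arbitrary picks.
data(mtcars)
x <- mtcars$mpg

# Basic summary statistics
stats <- c(
  min  = min(x),
  mean = mean(x),
  max  = max(x),
  var  = var(x),
  sd   = sd(x),
  cv   = sd(x) / mean(x)  # coefficient of variation
)
print(round(stats, 3))
print(quantile(x, probs = c(0.25, 0.5, 0.75)))

# Hypothetical helper: most frequent value(s) of a numeric vector
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
print(stat_mode(mtcars$cyl))

# Correlation and covariance between two variables
print(cor(mtcars$mpg, mtcars$wt))
print(cov(mtcars$mpg, mtcars$wt))

# Probability distributions follow a common naming scheme:
# d* = density or mass, p* = cumulative probability, q* = quantile, r* = random draws
p_half    <- pnorm(0)                          # Gaussian CDF at the mean
bern      <- dbinom(1, size = 1, prob = 0.3)   # Bernoulli = Binomial with size 1
binom_cdf <- pbinom(3, size = 10, prob = 0.5)  # P(X <= 3) for Binomial(10, 0.5)
pois0     <- dpois(0, lambda = 2)              # P(X = 0) for Poisson(2)

set.seed(1)                                    # make the random draws reproducible
draws <- rnorm(5, mean = 0, sd = 1)
print(draws)
```

The same d/p/q/r prefixes apply to the other distributions in base R, so once the pattern is familiar it carries over directly.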

Week 5 Content of online and on-campus sessions

4) Linear regression model:
- Recap of statistical inference
- Linear regression model
- OLS method
- Parameter interpretation and model assessment
- Linear regression in R

Week 6 Content of online and on-campus sessions

4) Linear regression model:
- Recap of statistical inference
- Linear regression model
- OLS method
- Parameter interpretation and model assessment
- Linear regression in R

Week 7 Content of online and on-campus sessions

4) Linear regression model:
- Recap of statistical inference
- Linear regression model
- OLS method
- Parameter interpretation and model assessment
- Linear regression in R
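To give a feel for the OLS and "linear regression in R" items, the sketch below fits a simple regression with lm() and then reproduces the coefficients from the normal equations. The mtcars data set and the model mpg ~ wt are assumptions chosen for illustration, not the course's own example.

```r
# Illustrative only: regress fuel efficiency (mpg) on weight (wt) in mtcars.
data(mtcars)

# Fit by ordinary least squares with the built-in lm()
ols_fit <- lm(mpg ~ wt, data = mtcars)
print(summary(ols_fit))               # estimates, standard errors, t-tests, R-squared

# The same estimates from the normal equations: (X'X) beta = X'y
X <- cbind(intercept = 1, wt = mtcars$wt)   # design matrix with an intercept column
y <- mtcars$mpg
beta_hat <- solve(crossprod(X), crossprod(X, y))
print(beta_hat)

# Model assessment: share of variance explained and the residuals
r_squared <- summary(ols_fit)$r.squared
res <- residuals(ols_fit)
print(r_squared)
print(summary(res))

# Interpretation: the slope is the expected change in mpg
# for a one-unit increase in wt
slope <- unname(coef(ols_fit)["wt"])
print(slope)
```

In practice lm() is the tool to use; the manual computation is included only to show that it solves the same least-squares problem.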

Week 8 Content of online and on-campus sessions

5) Generalized Linear Models:
- Model formulation
- Estimation method
- Parameter interpretation and model assessment
- GLM in R

Week 9 Content of online and on-campus sessions

5) Generalized Linear Models:
- Model formulation
- Estimation method
- Parameter interpretation and model assessment
- GLM in R
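One common instance of a GLM is logistic regression for a binary response. The sketch below shows how it might be fitted with glm(); the data set (mtcars), the response (am) and the predictor (wt) are assumptions picked for illustration, and the syllabus does not state which GLM families the course covers.

```r
# Illustrative only: model the probability of a manual gearbox (am = 1)
# as a function of car weight (wt) in mtcars.
data(mtcars)

# Binomial family with logit link; glm() estimates by maximum likelihood
# using iteratively reweighted least squares
logit_fit <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)
print(summary(logit_fit))

# Coefficients are on the log-odds scale; exponentiate to get odds ratios
odds_ratios <- exp(coef(logit_fit))
print(odds_ratios)

# Predicted probabilities for two hypothetical cars (weights in 1000 lbs)
new_cars <- data.frame(wt = c(2.5, 3.5))
pred <- predict(logit_fit, newdata = new_cars, type = "response")
print(pred)

# Model assessment: residual deviance and AIC
print(deviance(logit_fit))
print(AIC(logit_fit))
```

Switching the family argument (for example to poisson for count data) changes the assumed response distribution while the rest of the workflow stays the same.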

Week 10 Content of online and on-campus sessions

6) Network analysis:
- Introduction to graph theory and summary statistics
- Networks in R

Week 11 Content of online and on-campus sessions

7) Natural Language Processing:
- Introduction to text mining
- Textual data in R

Week 12 Content of online and on-campus sessions

- Recap of the topics
- Final project presentations