DATA ANALYSIS FOR SOCIAL SCIENCES
Instructional goals
Statistical analysis of univariate as well as multivariate data, with a particular focus on social and economic applications.
Intended learning outcomes
Students will be able to analyze different types of data using appropriate statistical methods. Data analysis will be carried out in R, which is an essential part of this course.
Course Contents
1) Basic concepts of statistics and R programming
2) Different types of data
3) Exploratory data analysis
4) Big Data and dimensionality reduction
5) Linear regression model
6) Network analysis
7) Natural Language Processing
Reference Books
MAIN:
Chapman, Chris, and Elea McDonnell Feit. R for Marketing Research and Analytics. Springer, 2015.
SUGGESTED for R:
- Venables, William N., David M. Smith, and R Development Core Team. An Introduction to R. 2009. https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
- Wickham, Hadley, and Garrett Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media, Inc., 2016. https://r4ds.had.co.nz/
SUGGESTED for the THEORY:
- James, Gareth, et al. An Introduction to Statistical Learning. Vol. 112. New York: Springer, 2013. https://www.statlearning.com/
Teaching Methods
Book, slides, lecture notes, R scripts
Assessment Method
TBD
Thesis assignment criteria
TBD
Does the syllabus cover sustainability topics?
Yes. Sustainability will be addressed through real-world case studies in which statistical techniques are applied.
Week 1: Content of online and on-campus sessions
1) Basic concepts of statistics and R programming:
- Introduction to R and RStudio IDE
- Introduction to statistics
Week 2: Content of online and on-campus sessions
2) Different types of data:
- quantitative:
- numeric, continuous, discrete
- qualitative (or categorical)
- textual data
- the different types of data in R
Week 3: Content of online and on-campus sessions
3) Exploratory data analysis:
- Basic summary statistics: min, mean, mode, quantiles, max, variance, standard deviation, coefficient of variation, correlation, covariance, etc.
- Summary statistics in R
- Data visualization: bar plots, histograms, maps, pie charts, box plots, etc.
- Data visualization in R with ggplot2
- Main probability distributions: Gaussian, Bernoulli, Binomial, Poisson
- Probability distributions in R
Week 4: Content of online and on-campus sessions
- Data visualization in R with ggplot2
- Main probability distributions: Gaussian, Bernoulli, Binomial, Poisson
- Probability distributions in R
Week 5: Content of online and on-campus sessions
4) Big Data and dimensionality reduction:
- Big Data: volume, variety, velocity, veracity
- Principal Component Analysis (PCA)
- PCA in R
- Cluster analysis
- Cluster analysis in R
Week 6: Content of online and on-campus sessions
4) Big Data and dimensionality reduction:
- Big Data: volume, variety, velocity, veracity
- Principal Component Analysis (PCA)
- PCA in R
- Cluster analysis
- Cluster analysis in R
Week 7: Content of online and on-campus sessions
5) Linear regression model
- Recap of statistical inference
- Linear regression model
- OLS method
- Interpretation of parameters and model assessment
- Linear regression in R
Week 8: Content of online and on-campus sessions
5) Linear regression model
- Recap of statistical inference
- Linear regression model
- OLS method
- Interpretation of parameters and model assessment
- Linear regression in R
Week 9: Content of online and on-campus sessions
5) Linear regression model
- Recap of statistical inference
- Linear regression model
- OLS method
- Interpretation of parameters and model assessment
- Linear regression in R
Week 10: Content of online and on-campus sessions
6) Network analysis:
- Introduction to graph theory and summary statistics
- Networks in R
Week 11: Content of online and on-campus sessions
7) Natural Language Processing:
- Introduction to text mining
- Textual data in R
Week 12: Content of online and on-campus sessions
Review of course topics
Presentation of final projects