DATA ANALYSIS FOR SOCIAL SCIENCES

Luca Secondi

Instructional goals

Developing advanced skills to correctly understand and evaluate the results of quantitative statistical analysis has become essential for social scientists. This course aims to provide the foundations of the main methods of empirical analysis for study and research on issues in the field of international relations. In addition to the theoretical lectures, practical sessions dealing with real-world examples and the use of statistical software (R, RStudio) are provided to help students improve their ability to collect, analyse, interpret and present data and empirical findings. By the end of the course, students will be able to: i) perform data analysis using descriptive and inferential statistics; ii) compare and contrast different approaches to empirical analysis and select the most appropriate methodology in light of the available statistical information and the objective of the study; iii) make informed decisions when selecting the appropriate techniques for describing and presenting the data.

Intended learning outcomes

Knowledge and understanding: participants are expected to acquire a solid knowledge of the main statistical methods for data analysis and the ability to carry out empirical investigations on the main topics (economic, social, political, demographic) of interest to social scientists. With reference to statistical methodologies, students are expected to improve their understanding of the methodological issues and develop the capacity to apply the techniques for: a) the descriptive analysis of data; b) the study of relations between variables from both a descriptive and an inferential perspective; c) the multivariate analysis of data (cluster analysis). Students will also acquire skills in the use of several database structures and in their management and processing using statistical software.
Applying knowledge and understanding: by the end of the course, students are expected to have strengthened their methodological and analytical skills so that they can independently interpret analyses and empirical research in the areas most relevant to an international political scientist. Participants will be able to: i) evaluate the robustness of the main analytical findings and the reliability of the statistical methods used in an empirical investigation, detect possible inconsistencies in empirical applications and consider the use of alternative approaches; ii) design case studies relevant for public policy by outlining the topic of interest, selecting the databases, identifying the methodologies for the empirical analysis and communicating the main results in the form of presentations or reports.
Making judgements: the course aims to promote a critical approach to the use of different methods of data analysis for the study of international subjects of interest. Participants are expected to: i) develop critical skills in the use of the various methods depending on the objectives of the analysis; ii) be able to evaluate the specific contribution of each methodology of data analysis; iii) develop the ability to consistently integrate the contribution of empirical studies within a broader approach that includes the interdisciplinary background of the students. These objectives are also pursued through active learning activities carried out in small groups, which stimulate students' critical thinking, including through peer evaluation.
Communication skills: students will learn to communicate unambiguously and clearly the approach adopted for an empirical study, with particular reference to the structure of the databases, the statistical methods used and the results achieved. The ability to communicate empirical results effectively and the command of appropriate technical language will be developed through written tests and through the presentation and discussion of empirical research, scientific articles and reports issued by international institutions.
Learning skills: the instructional methods adopted in this course include case studies and seminars, as well as the verification of learning through peer evaluation. All of these activities will help students improve their capacity for independent judgement and develop self-learning skills. These abilities will be achieved through the analysis of statistical methods applied to the economic, political and social sciences. An important objective of this course is to ensure that students will use quantitative methods in subsequent professional or academic activities (laboratories, internships, traineeships).

Course Contents

Introduction to Statistical Methodology. International and national sources of data for the analysis of economic, social, political and demographic phenomena. Sampling and measurement. Descriptive statistics: describing real data with tables and graphs; measures of position, variability and shape. Analysis of concentration. Interpretation and comparison of data referring to socio-economic phenomena: simple and complex (synthetic) index numbers. Probability distributions. Statistical inference: point estimation, confidence intervals and hypothesis testing. Association between categorical variables. Linear regression and correlation. Multiple regression and correlation. Regression with categorical and quantitative predictors. Introduction to logistic regression. Elements of multivariate statistical analysis: hierarchical and non-hierarchical cluster analysis. Data management and processing using R and RStudio. Case studies and applied exercises based on real data, measures and indicators used for the analysis of topics related to the course (for example, data and analyses related to the Human Development Index, the Sustainable Development Goals, the World Bank Development Indicators and the European Regional Competitiveness Index, and based on Eurostat, OECD, IMF and UNSD datasets).

Reference Books

Agresti A. (2018) Statistical Methods for the Social Sciences (5th Edition), Pearson (the detailed course programme lists the book chapters and sections to study).
Teacher/class notes (in the detailed programme, an asterisk (*) marks the topics for which learning materials will be provided by the teacher).

Teaching Methods

Lectures, exercises, labs with R and RStudio, applied exercises, interactive learning through published data, measures and report visualizations, and case studies in social sciences based on real data, also using statistical and econometric packages and advanced spreadsheets. The course follows a fully enquiry-based model.

Assessment Method

The final examination is a written exam consisting of both theoretical and empirical questions, including questions on the analysis and the findings/outputs obtained using R and RStudio; during the final exam, students are not allowed to consult books or class notes.
Students attending the course are required to complete one problem set/project work on real data using R and RStudio and to hand in their solutions strictly by the established deadline. For attending students, the examination is completed with a written test that must be taken (once only) on one of the examination dates of the summer session ("sessione estiva"), at the student's choice. The written test for attending students consists of theory and applied questions covering the entire course programme. Both the written test and the problem set/project work are compulsory. For attending students, the final mark is the (weighted) sum of the marks obtained in the project work (max 16 points) and in the written test (max 16.5 points). Students who are not satisfied with the (partial) written examination can always repeat it within the summer session. Students who are not satisfied with the final grade may decline the final mark awarded and take the entire examination, which will include theoretical and empirical questions as well as questions aimed at assessing the use of R/RStudio for data analysis.
A student who sits the exam and does not withdraw within the first 20 minutes from the start of the test will not be allowed to take the exam on the following exam date within the same session ("salto d'appello"). During the final examination, each candidate will be asked to show a photo ID (e.g., the university record book). Phones, electronic organizers, etc. should be switched off. Using a calculator is advisable.
WRITTEN EXAMINATION: this type of examination ("scritto verbalizzante") consists of a written exam without a subsequent oral examination. The student must register for the written test. After the examination, the teacher marks the exam papers and publishes the results on the dedicated VOL web page (within one week of the exam date). Students enrolled in the final exam will receive a communication with the grade earned in the written examination (the grade will also be displayed on the web self-service). From the publication of the results, each student has 3 days to reject the assigned grade. Once the 3-day period has elapsed, the rule of "tacit consent" ("silenzio assenso") applies, and the assigned grade is officially recorded by the teacher. The teacher must close the official record using the digital signature. Once the record is closed, the grade earned is communicated to the student by e-mail. The text of the final written examination and the corresponding solutions are made available on the course website before the publication of the grades. Within three days of the publication of the results, and before the grade becomes official, the teacher must schedule a date and time to meet students who wish to view their written exam answers. Each candidate, regardless of the final outcome of the examination, can therefore access the solutions of the written examination and choose whether to accept the assigned grade.
In the exceptional case of non-attendance, the student is required to take a single written exam covering both theoretical questions and empirical problems and exercises, which will also require knowledge of the R software used during the course.

Thesis assignment criteria

The final essay is a piece of work in which statistical methods are applied to political, economic or social phenomena. The topic is agreed with the teacher.

Does the syllabus cover sustainability topics?

The course proposes methodologies and practical applications related to the Sustainable Development Goals (SDGs), with specific reference to SDGs 1, 2, 3, 4, 5 and 10 and with a priority focus on poverty reduction and territorial inequalities.

Week 1: Content of online and on-campus sessions

Introduction (1): 1.1 Introduction to Statistical Methodology; 1.2 Descriptive Statistics and Inferential Statistics; 1.3 The Role of Computers and Software in Statistics; (1.4 Chapter Summary). Statistical sources of data useful for understanding economic, social, political and demographic dynamics in Europe and worldwide. Official statistical institutes and bodies at national and international level. The quality dimension of statistical information (*). Sampling and Measurement (2): 2.1 Variables and Their Measurement; 2.2 Randomization; 2.3 Sampling Variability and Potential Bias; 2.4 Other Probability Sampling Methods; (2.5 Chapter Summary). Laboratory: introduction to the statistical software R and RStudio: basics, objects, database management. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software.

Week 2: Content of online and on-campus sessions

Descriptive Statistics (3): 3.1 Describing Data with Tables and Graphs; 3.2 Describing the Center of the Data; 3.3 Describing Variability of the Data; 3.4 Measures of Position; 3.5 Bivariate Descriptive Statistics; 3.6 Sample Statistics and Population Parameters; (3.7 Chapter Summary). Applied/computational data analysis and visualization. Introduction to the statistical software R and RStudio: basics, objects, database management. Lab with R and RStudio, exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies: http://hdr.undp.org/en/content/human-development-index-hdi http://www.systemicpeace.org/index.html https://www.istat.it/it/benessere-e-sostenibilit%C3%A0/obiettivi-di-sviluppo-sostenibile/gli-indicatori-istat https://demo.istat.it/

Week 3: Content of online and on-campus sessions

Income concentration and poverty measures. Variability and concentration: definitions, notions, Gini measures, applications with real socio-economic data (*). Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies: http://www.systemicpeace.org/index.html https://qog.pol.gu.se/data https://www.transparency.org/en/cpi/2019
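
As an illustration of the Gini measures listed for this week, the following is a minimal sketch of one common definition, the Gini concentration ratio based on the mean absolute difference between all pairs of values. The course labs use R and RStudio; Python, the function name `gini` and the income figures are assumptions made only for this example, and the teacher's notes may adopt a different (for instance, normalized) variant.

```python
def gini(values):
    """Gini concentration ratio via pairwise absolute differences.

    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean)
    Since mean = total / n, the denominator equals 2 * n * total.
    """
    n = len(values)
    if n == 0:
        raise ValueError("at least one value is required")
    total = sum(values)
    if total == 0:
        # All values are zero: no concentration to measure.
        return 0.0
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    return abs_diff_sum / (2 * n * total)


# Hypothetical incomes, invented for illustration only.
equal_incomes = [10, 10, 10, 10]
concentrated_incomes = [0, 0, 0, 100]
print(gini(equal_incomes))
print(gini(concentrated_incomes))
```

With this unnormalized definition, perfect equality gives 0 and the maximum, reached when one unit holds everything, is (n - 1)/n rather than 1.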

Week 4: Content of online and on-campus sessions

Interpretation and comparison of data referring to socio-economic phenomena. Statistical ratios. Simple and complex (synthetic) index numbers. Some indexes published at national and international level for measuring socio-economic phenomena. Introduction to composite indicators: definition, characteristics, approaches and peculiarities (*). Applied/computational data analysis and visualization. Exercises with R and RStudio, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies: https://www.oecd.org/sdd/oecdmaineconomicindicatorsmei.htm https://www.imf.org/en/Data https://databank.worldbank.org/databases https://unstats.un.org/home/ https://ec.europa.eu/eurostat/data/database
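
To make the notion of a simple index number concrete, the sketch below computes fixed-base index numbers, where each value of a series is expressed as a percentage of the value in a chosen base period. Python is used here only as an assumption for illustration (the course labs use R and RStudio), and the function name, parameter names and data are hypothetical.

```python
def simple_index_numbers(series, base_position=0):
    """Fixed-base simple index numbers: I_t = 100 * x_t / x_base."""
    base_value = series[base_position]
    if base_value == 0:
        raise ValueError("the base-period value must be non-zero")
    return [100 * value / base_value for value in series]


# Hypothetical yearly values, invented for illustration only.
yearly_values = [50, 55, 60]
print(simple_index_numbers(yearly_values))
```

Changing `base_position` re-expresses the same series relative to a different base period, which is how series with different units or scales can be compared.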

Week 5: Content of online and on-campus sessions

Analyzing Association between Categorical Variables (8): 8.1 Contingency Tables; 8.2 Chi-Squared Test of Independence; (8.6 Chapter Summary). Association for quantitative variables: correlation (Chapter 9). Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by data analysis, interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies: https://www.europeansocialsurvey.org/ https://zacat.gesis.org/webview/index.jsp https://sda.berkeley.edu
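
The chi-squared test of independence listed above compares the observed counts in a contingency table with the counts expected if the two variables were independent. A minimal sketch of the test statistic follows; Python, the function name and the counts are assumptions for illustration (in the course labs the analysis is carried out in R), and the sketch stops at the statistic and its degrees of freedom without computing a P-value.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom.

    `table` is a list of rows of observed counts. The expected count
    for a cell is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical 2x2 table of counts, invented for illustration only.
observed_counts = [[10, 20],
                   [20, 10]]
statistic, df = chi_squared_statistic(observed_counts)
print(statistic, df)
```

A statistic of 0 means the observed counts match the independence pattern exactly; larger values indicate stronger evidence against independence for the given degrees of freedom.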

Week 6: Content of online and on-campus sessions

Data distributions and random variables (Chapter 4). Statistical inference: significance tests (6); the five parts of a significance test (6.1). Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies: https://www.oecd.org/sdd/oecdmaineconomicindicatorsmei.htm https://www.imf.org/en/Data https://databank.worldbank.org/databases https://unstats.un.org/home/ https://ec.europa.eu/eurostat/data/database

Week 7: Content of online and on-campus sessions

Linear Regression and Correlation (9): 9.1 Linear Relationships; 9.2 Least Squares Prediction Equation; 9.3 The Linear Regression Model; 9.4 Measuring Linear Association: The Correlation; 9.5 Inferences for the Slope and Correlation; (9.7 Chapter Summary). Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies: https://www.europeansocialsurvey.org/ https://zacat.gesis.org/webview/index.jsp https://sda.berkeley.edu/GSS/
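
The least squares prediction equation and the correlation covered this week can be computed directly from sums of squares and cross-products. The sketch below is only an illustration under stated assumptions: Python is used instead of the course software (R and RStudio), and the function name and data points are hypothetical.

```python
def least_squares_line(x, y):
    """Intercept a, slope b and correlation r for the line y-hat = a + b*x.

    b = Sxy / Sxx, a = mean(y) - b * mean(x), r = Sxy / sqrt(Sxx * Syy).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / (sxx * syy) ** 0.5
    return intercept, slope, correlation


# Hypothetical data lying exactly on y = 1 + 2x, for illustration only.
a, b, r = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b, r)
```

Because the slope and the correlation share the numerator Sxy, they always have the same sign; the correlation simply rescales the slope by the variability of the two variables.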

Week 8: Content of online and on-campus sessions

Introduction to Multivariate Relationships (10): 10.1 Association and Causality. Multiple Regression and Correlation (11): 11.1 The Multiple Regression Model; 11.2 Multiple Correlation and R²; 11.3 Inferences for Multiple Regression Coefficients; (11.8 Chapter Summary). Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies: https://www.europeansocialsurvey.org/ https://zacat.gesis.org/webview/index.jsp https://sda.berkeley.edu/GSS/

Week 9: Content of online and on-campus sessions

Regression with Categorical Predictors: Analysis of Variance Methods (12): 12.1 Regression Modeling with Dummy Variables for Categories. Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies: https://www.oecd.org/sdd/oecdmaineconomicindicatorsmei.htm (macro) https://www.imf.org/en/Data https://databank.worldbank.org/databases https://unstats.un.org/home/ https://ec.europa.eu/eurostat/data/database

Week 10: Content of online and on-campus sessions

Multiple Regression with Quantitative and Categorical Predictors (13): 13.1 Models with Quantitative and Categorical Explanatory Variables; 13.2 Inference for Regression with Quantitative and Categorical Predictors; 13.3 Case Studies: Using Multiple Regression in Research. Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies: https://www.oecd.org/sdd/oecdmaineconomicindicatorsmei.htm https://www.imf.org/en/Data https://databank.worldbank.org/databases https://unstats.un.org/home/ https://ec.europa.eu/eurostat/data/database

Week 11: Content of online and on-campus sessions

Logistic Regression: Modeling Categorical Responses (15): 15.1 Logistic Regression; 15.2 Multiple Logistic Regression; 15.3 Inference for Logistic Regression Models. Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies: https://www.oecd.org/sdd/oecdmaineconomicindicatorsmei.htm https://www.imf.org/en/Data https://databank.worldbank.org/databases https://unstats.un.org/home/ https://ec.europa.eu/eurostat/data/database

Week 12: Content of online and on-campus sessions

Introduction to multivariate analysis. Cluster analysis: partitioning and hierarchical clustering. Optimal number of clusters. Agglomerative and divisive clustering and the dendrogram (*). Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies: https://www.europeansocialsurvey.org/ https://zacat.gesis.org/webview/index.jsp https://sda.berkeley.edu/GSS/