DATA ANALYSIS FOR SOCIAL SCIENCES

Luca Secondi

Instructional goals

Developing advanced skills to correctly understand and evaluate the results of quantitative statistical analysis has become essential for social scientists. This course aims to provide the foundations of the main methods of empirical analysis for study and research on issues in the field of international relations. In addition to the theoretical lectures, practical sessions based on real-world examples and on the use of statistical software (R, RStudio) allow students to improve their abilities in collecting, analysing, interpreting and presenting data and empirical findings. Through "Fully-Inquiry Based" learning, the course aims to provide students with the main tools to: i) perform data analysis using descriptive and inferential statistics; ii) compare and contrast different approaches to empirical analysis and select the most appropriate methodology in light of the available statistical information and the objective of the study; iii) make informed decisions when selecting the appropriate techniques for describing and presenting the data.

Intended learning outcomes

Knowledge and understanding: participants are expected to acquire a solid knowledge of the main statistical methods for data analysis and the ability to carry out empirical investigations on the main topics (economic, social, political, demographic) of interest to social scientists. With reference to statistical methodologies, students are expected to improve their understanding of the methodological issues and develop the capacity to apply the techniques for: a) the descriptive analysis of data; b) the study of relations between variables from both a descriptive and an inferential perspective; c) the multivariate analysis of data. Students will also acquire skills in the use of several database structures and in their management and processing using statistical software. The acquired knowledge will be evaluated continuously according to the University's “Fully Enquiry-Based” rules.

Applying knowledge and understanding: by the end of the course, students are expected to have strengthened their methodological and analytical skills so that they can independently interpret analyses and empirical research in the areas most relevant to an international political scientist (for example, demographic and political issues). Participants will be able to: i) evaluate the robustness of the main analytical findings and the reliability of the statistical methods used in an empirical investigation, detect possible inconsistencies in empirical applications and consider the use of alternative approaches; ii) design case studies relevant for public policy by outlining the topic of interest, selecting the databases, identifying the methodologies for the empirical analysis, and communicating the main results in the form of presentations or reports.

Making judgements: the course aims to promote a critical approach to the use of different methods of data analysis for the study of the international subjects of interest. Participants are expected to: i) develop critical skills in the use of the various methods depending on the objectives of the analysis; ii) be able to evaluate the specific contribution of each methodology of data analysis; iii) develop the ability to place the contribution of empirical studies consistently within a broader approach that includes the students' interdisciplinary background. These objectives are also pursued through active learning activities carried out in small groups, which help to stimulate students' critical thinking, including through peer evaluation.

Communication skills: students will learn to communicate clearly and unambiguously the approach adopted for an empirical study, with particular reference to the structure of the databases, the statistical methods used and the results achieved. The ability to communicate empirical results effectively and to use appropriate technical language will be developed through written tests and through the presentation and discussion of empirical research, scientific articles and reports issued by international institutions.

Learning skills: the instructional methods adopted in this course include case studies and seminars, as well as peer evaluation as a method of verifying learning. All of these activities will help students improve their capacity for independent judgment and develop self-learning skills. These abilities will be achieved through the analysis of statistical methods applied to the economic, political and social sciences. An important objective of this course is to ensure that students will use quantitative methods in subsequent professional or academic activities (laboratories, internships, traineeships).

Course Contents

Introduction to Statistical Methodology. International and national sources of data for the analysis of economic, social, political and demographic phenomena. Descriptive statistics: describing real data with tables and graphs; measures of position, variability and shape. Analysis of concentration. Interpretation and comparison of data referring to socio-economic phenomena: simple and complex (synthetic) index numbers. Probability distributions. Statistical inference: point estimation, confidence intervals and hypothesis testing. Association between categorical variables. Linear regression and correlation. Multiple regression and correlation. Regression with categorical and quantitative predictors. Introduction to logistic regression. Elements of multivariate statistical analysis: principal component analysis and hierarchical and non-hierarchical cluster analysis. Data management and processing using R and RStudio. Case studies and applied exercises based on real data, measures and indicators used for the analysis of topics related to the course (for example, data and analyses related to the Human Development Index, the Sustainable Development Goals, the World Bank Development Indicators and the European Regional Competitiveness Index, and based on Eurostat, OECD, IMF and UNSD datasets).

Reference Books

Agresti A. (2018). Statistical Methods for the Social Sciences (5th edition). Pearson. (The detailed program of the course lists the book chapters and sections to study.)
Teacher/class notes (in the detailed program, an asterisk (*) marks the topics for which the teacher will provide the learning materials).

Teaching Methods

Lectures, exercises, labs with R and RStudio, applied exercises, interactive learning through published data, measures and report visualizations, and case studies in social sciences based on real data, also using statistical and econometric packages and advanced spreadsheets.

Assessment Method

The overall assessment for attending students is based on the evaluation of three assignments:
1) First assignment (first assessment task): a set of multiple-choice and/or open questions to be solved individually and on campus. The weight of this test (score out of 30) is 25% of the overall mark. This assignment covers topics up to the association between qualitative variables (Week 4).
2) Second assignment (second assessment task): one problem set/project for data analysis and data processing. Students must complete the problem set/project work using R and RStudio. The work must be submitted by the date communicated well in advance by the course teacher. The weight of this assessment is 50%.
3) Final exam: a written test on the topics covered from Week 5 (correlation) to the end of the course (Week 12). The written test consists of both theoretical and empirical questions, including specific questions on the use, application and interpretation of the statistical output obtained using R. The weight of the final test is 25% of the overall evaluation. During the final exam, students are not allowed to consult books or class notes.
All tests are mandatory. For attending students, the final grade is the weighted arithmetic average of the grades achieved in the three assessments (two assignments and final exam). Non-attending students are evaluated through a single final exam, which accounts for 100% of the overall final grade for the course. This final exam may differ from the one set for attending students and/or be based on a larger ad hoc program. During the final examination, each candidate will be asked to show a photo ID (e.g., the university record book). Phones, electronic organizers etc. must be switched off. Students are advised to use a calculator.
WRITTEN EXAMINATION: this type of examination ("scritto verbalizzante") consists of a written exam without a subsequent oral examination. Students must book a place for the written test. After the final examination, the teacher marks the exam papers and publishes the results on the dedicated VOL web page (within one week of the exam date). Students enrolled in the final exam will receive a communication with the grade earned in the written examination (the grade will also be displayed on the Web Self Service). From the publication of the results, each student has 3 days to reject the assigned grade. Once the 3-day period has elapsed, the rule of "tacit consent" ("silenzio assenso") applies, and the assigned grade is officially recorded by the teacher. The teacher must finalize the official record using a digital signature. Once the record is finalized, the grade earned is released to the student by e-mail. The text of the final written examination and the corresponding solutions are made available on the course website before the publication of the grades. Within three days of the publication of the results, and before the grade becomes official, the teacher must schedule a date and time to meet students who wish to view their written exam answers. Each candidate, regardless of the final outcome of the examination, can therefore access the solutions of the written examination and choose whether or not to accept the assigned grade.
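
As an illustration of the weighted arithmetic average used for attending students, the short sketch below combines the three scores with the weights stated in this section (25%, 50%, 25%). It is written in Python purely for illustration (the course itself uses R); the function name, the dictionary keys and the sample scores are hypothetical, and it assumes that all three assessments are marked on the same 30-point scale. No rounding rule is applied, since none is stated here.

```python
# Illustrative sketch only: the weights are taken from the assessment rules;
# the names and the sample scores below are hypothetical.

WEIGHTS = {
    "first_assignment": 0.25,   # on-campus test (up to Week 4)
    "second_assignment": 0.50,  # problem set / project work
    "final_exam": 0.25,         # written test (Weeks 5 to 12)
}


def final_grade(scores):
    """Return the weighted arithmetic average of the three scores.

    `scores` maps each assessment name to a score, assumed to be out of 30.
    All three assessments are mandatory, so a missing score is an error.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing mandatory assessments: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one attending student.
example = {"first_assignment": 28, "second_assignment": 26, "final_exam": 30}
print(final_grade(example))  # 27.5
```

With these sample scores the second assignment contributes 13 points and the other two assessments 7 and 7.5 points, which shows how strongly the project work drives the final result.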

Thesis assignment criteria

The final essay is a work in which statistical methods are applied to political, economic or social phenomena. The topic is agreed with the teacher.

Week 1 Content of online and on-campus sessions

Introduction (1): 1.1 Introduction to Statistical Methodology; 1.2 Descriptive Statistics and Inferential Statistics; 1.3 The Role of Computers and Software in Statistics; (1.4 Chapter Summary). Statistical sources of data useful for understanding economic, social, political and demographic dynamics in Europe and worldwide. Official statistical institutes and bodies at national and international level. The quality dimension of statistical information (*). Laboratory: introduction to the statistical software R and RStudio: basics, objects, database management. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software.

Week 2 Content of online and on-campus sessions

Descriptive Statistics (3): 3.1 Describing Data with Tables and Graphs; 3.2 Describing the Center of the Data; 3.3 Describing Variability of the Data; 3.4 Measures of Position; 3.5 Bivariate Descriptive Statistics; 3.6 Sample Statistics and Population Parameters; (3.7 Chapter Summary). Income concentration and poverty measures. Variability and concentration: definition, notions, Gini measures, applications with real socio-economic data (*). Applied/computational data analysis and visualization. Lab with R and RStudio: basics, objects, database management. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies:
http://hdr.undp.org/en/content/human-development-index-hdi
http://www.systemicpeace.org/index.html
https://www.istat.it/it/benessere-e-sostenibilit%C3%A0/obiettivi-di-sviluppo-sostenibile/gli-indicatori-istat
https://demo.istat.it/

Week 3 Content of online and on-campus sessions

Data distribution and random variables (4). Statistical inference and significance tests (6): the five parts of a significance test (6.1). Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies:
https://www.oecd.org/sdd/oecdmaineconomicindicatorsmei.htm
https://www.imf.org/en/Data
https://databank.worldbank.org/databases
https://unstats.un.org/home/
https://ec.europa.eu/eurostat/data/database

Week 4 Content of online and on-campus sessions

Analyzing Association between Categorical Variables (8): 8.1 Contingency Tables; 8.2 Chi-Squared Test of Independence; (8.6 Chapter Summary). Association for quantitative variables: correlation (chapter 9). Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by data analysis, interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies:
https://www.europeansocialsurvey.org/
https://zacat.gesis.org/webview/index.jsp
https://sda.berkeley.edu

Week 5 Content of online and on-campus sessions

Linear Regression and Correlation (9): 9.1 Linear Relationships; 9.2 Least Squares Prediction Equation. Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies:
https://www.europeansocialsurvey.org/
https://zacat.gesis.org/webview/index.jsp
https://sda.berkeley.edu/GSS/
First (individual) assignment: multiple-choice questions to be solved on campus (25%).

Week 6 Content of online and on-campus sessions

Linear Regression Model (LRM): 9.3 The Linear Regression Model; 9.4 Measuring Linear Association: The Correlation; 9.5 Inferences for the Slope and Correlation; (9.7 Chapter Summary). Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies:
https://www.europeansocialsurvey.org/
https://zacat.gesis.org/webview/index.jsp
https://sda.berkeley.edu/GSS/

Week 7 Content of online and on-campus sessions

Introduction to Multivariate Relationships (10): 10.1 Association and Causality. Multiple Regression and Correlation (11): 11.1 The Multiple Regression Model; 11.2 Multiple Correlation and R². Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies:
https://www.europeansocialsurvey.org/
https://zacat.gesis.org/webview/index.jsp
https://sda.berkeley.edu/GSS/

Week 8 Content of online and on-campus sessions

Multiple Regression Model: 11.3 Inferences for Multiple Regression Coefficients; (11.8 Chapter Summary). Goodness of fit and nested models. Regression with Categorical Predictors: Analysis of Variance Methods (12): 12.1 Regression Modeling with Dummy Variables for Categories. Multiple Regression with Quantitative and Categorical Predictors (13): 13.1 Models with Quantitative and Categorical Explanatory Variables; 13.2 Inference for Regression with Quantitative and Categorical Predictors; 13.3 Case Studies: Using Multiple Regression in Research. Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies:
https://www.europeansocialsurvey.org/
https://zacat.gesis.org/webview/index.jsp
https://sda.berkeley.edu/GSS/
https://www.oecd.org/sdd/oecdmaineconomicindicatorsmei.htm
https://www.imf.org/en/Data
https://databank.worldbank.org/databases
https://unstats.un.org/home/
https://ec.europa.eu/eurostat/data/database

Week 9 Content of online and on-campus sessions

Logistic Regression: Modeling Categorical Responses (15): 15.1 Logistic Regression; 15.2 Multiple Logistic Regression; 15.3 Inference for Logistic Regression Models. Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies:
https://www.oecd.org/sdd/oecdmaineconomicindicatorsmei.htm
https://www.imf.org/en/Data
https://databank.worldbank.org/databases
https://unstats.un.org/home/
https://ec.europa.eu/eurostat/data/database

Week 10 Content of online and on-campus sessions

Interpretation and comparison of data referring to socio-economic phenomena. Statistical ratios. Simple and complex (synthetic) index numbers. Some indexes published at national and international level for measuring socio-economic phenomena. Introduction to composite indicators: definition, characteristics, approaches and peculiarities (*). Applied/computational data analysis and visualization. Exercises with R and RStudio, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies:
https://www.oecd.org/sdd/oecdmaineconomicindicatorsmei.htm
https://www.imf.org/en/Data
https://databank.worldbank.org/databases
https://unstats.un.org/home/
https://ec.europa.eu/eurostat/data/database

Week 11 Content of online and on-campus sessions

Bivariate and multivariate analysis: definition and differences. Introduction to multivariate analysis techniques. Fundamental theoretical notions of Principal Component Analysis (PCA). Introduction to cluster analysis (*). Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies:
https://www.europeansocialsurvey.org/
https://zacat.gesis.org/webview/index.jsp
https://sda.berkeley.edu/GSS/
Deadline for the second assignment (problem set/project work): a quantitative/qualitative paper (50%).

Week 12 Content of online and on-campus sessions

Introduction to multivariate analysis. Cluster analysis: partitioning and hierarchical clustering. Optimal number of clusters. Agglomerative and divisive clustering and the dendrogram (*). Applied/computational data analysis and visualization. Exercises, applied exercises, case studies concerning research questions in social sciences based on real data and reports. Learning by interactive visualization and outputs of statistical software. Some sources of data for exercises and case studies:
https://www.europeansocialsurvey.org/
https://zacat.gesis.org/webview/index.jsp
https://sda.berkeley.edu/GSS/