PYTHON AND R FOR DATA SCIENCE (LAB)
Instructional goals
The course aims to provide technical skills in the coding aspects of data analysis. The Python programming language and the R environment are presented with a specific focus on the libraries, modules, and functions that allow students to manage data effectively. The course provides an in-depth understanding of approaches to preprocessing, cleaning, visualizing, and analyzing data from various contexts. Students will mainly acquire the practical skills necessary to analyze real data.
Prerequisites
Basic computer programming skills are required.
Course Contents
The course will cover the following topics:
- Python and R Programming Languages
- Data Loading and Main File Formats
- Data Cleaning
- Data Manipulation and Transformation
- Data Visualization
Different frameworks, libraries, modules, and packages will be presented, including: numpy, pandas, matplotlib, seaborn, scikit-learn, ggplot2, and dplyr.
Reference Books
Lecture notes and course material will be available on the e-learning platform.
Recommended reading:
- “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”, 2nd Edition, by Wes McKinney. Publisher: O'Reilly Media, Inc. Release Date: October 2017. ISBN: 9781491957660.
- “R for Data Science: Import, Tidy, Transform, Visualize, and Model Data” by Garrett Grolemund and Hadley Wickham. Release Date: December 2016. ISBN: 9781491910399.
Teaching Methods
The course follows a mixed teaching approach, combining lectures and practical exercises. The methodologies adopted aim to promote a deep understanding of data science concepts and to develop practical skills in using Python and R programming tools.
Assessment Method
To pass the course, students must achieve a minimum of 80% of the total available points in both Python and R components during the following assessments:
1) Midterm Written Exam (Weeks 6–7)
2) Final Written Exam (at the end of the course)
Both exams will include written questions and programming exercises focused on programming concepts, abstractions, and the use of relevant libraries in Python and R.
Note: Students who do not take or do not pass the midterm exam will be required to complete additional questions in the final exam to cover the missed content.
Thesis assignment criteria
A thesis will be assigned (upon specific request to the instructor) to students who demonstrate a serious and motivated interest in the course topics.
Week 1
Python and R language: basics (part I)
Week 2
Python and R language: basics (part II)
Week 3
Python and R language: basics (part III)
Week 4
Python and R language: Data Loading and File Formats
Week 5
Python and R language: Data Cleaning, Preparation, and Manipulation. Python Package: Pandas.
Week 6
Python and R language: Data Visualization (part I). Python Package: Matplotlib.
Week 7
Python and R language: Data Visualization (part II). Python Package: Seaborn.
Week 8
Python and R language: Objects and classes
Week 9
Python and R language: exercises
Week 10
Python and R language: advanced features (part I)
Week 11
Python and R language: advanced features (part II)
Week 12
Python and R language: final recap