DATA SCIENCE IN ACTION

Alessio Martino

Instructional goals

The course is designed to be the missing link between model-based analysis (e.g., statistical modeling or simulation) and data-centric techniques (e.g., machine learning and data mining). In particular, it uses many examples of real-life event logs to illustrate, and make concrete, the concepts and algorithms presented in the other courses of the first year of the program. Students will work in teams to solve real-life problems, combining the models and methodologies learned in class with the tools and techniques used for maintaining, analyzing and processing data.

Intended learning outcomes

Knowledge and understanding: Through concrete data sets and algorithmic toolkits, the course will provide a good understanding of how to apply data science methodologies in order to solve concrete problems, so as to analyze and improve applications in a variety of domains.

Applying knowledge and understanding: On successful completion of this course students will be able to:
• Design effective solutions to a given data-driven problem using concrete data science methodologies.
• Go through the full data science process, from data cleaning through building and training models to execution and quality/performance refinement.
• Deal with real data science applications "in the wild".

Making judgements: Students are expected to be able to analyze different techniques, approaches and models for data science applications. Throughout the course, students will be invited to critically assess the strengths and weaknesses of different solutions to the same problem.

Communication skills: This course will enhance students’ ability to communicate their ideas, findings, proposals, analysis and critical reasoning effectively throughout their project work. Special emphasis will be given to oral presentations and pitches in group project work, and to writing technical reports and documentation.

Learning skills: This course will equip students to carry out concrete data science projects of industrial interest. Strong emphasis will be given to solving complex business problems that are typical of today’s data-driven companies.

Prerequisites

Basic knowledge of fundamental algorithms and computer programming skills. Working knowledge of Python is strongly recommended.

Course Contents

Through a hands-on approach, the course will go through the different steps involved in achieving the business goal of a data science project. In particular, it will cover the following topics:
• Data understanding (collect, describe, explore and verify data quality).
• Data preparation (select, clean and integrate data).
• Modeling (build, train and assess models).
• Evaluation (evaluate results and review process).
• Deployment (plan deployment, plan monitoring and maintenance).
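As an illustration only, the preparation, modeling and evaluation steps listed above can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not course material: the toy data set and the helper names (prepare, split, fit_line, mean_absolute_error) are hypothetical, and a least-squares line stands in for whatever model a real project would use. Deployment is not covered.

```python
import random
from statistics import mean


def prepare(rows):
    """Data preparation: drop records that have a missing value."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle reproducibly, then split into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Modeling: ordinary least squares fit of y = a * x + b."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    a = sxy / sxx
    return a, y_bar - a * x_bar


def mean_absolute_error(model, rows):
    """Evaluation: average absolute prediction error on held-out rows."""
    a, b = model
    return mean(abs(y - (a * x + b)) for x, y in rows)


# Hypothetical toy data: y = 2x + 1, plus two incomplete records.
raw = [(float(x), 2.0 * x + 1.0) for x in range(12)]
raw += [(None, 3.0), (4.0, None)]

clean = prepare(raw)                    # data preparation
train, test = split(clean)              # hold out data for evaluation
model = fit_line(train)                 # modeling
mae = mean_absolute_error(model, test)  # evaluation
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} MAE={mae:.4f}")
```

Holding out a test set before fitting keeps the evaluation step honest: the error is measured on records the model has not seen.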

Reference Books

Lecture notes, research papers and course material will be made available on the e-learning platform. The following recommended text is available for free online (hosted on GitHub): Python Data Science Handbook by Jake VanderPlas, https://jakevdp.github.io/PythonDataScienceHandbook/

Teaching Methods

The course consists of lectures, testimonials and seminars from industry, complemented by practical lab sessions, group project work, and small contests.

Assessment Method

There will be a project which counts for 100% of the grade, where students are required to demonstrate that:
• they are able to design innovative solutions for concrete business problems;
• they are able to critically analyze and assess the strengths and weaknesses of different data science techniques;
• they can apply data science techniques in an independent and critical way;
• they can communicate their ideas, findings, proposals, analysis and critical reasoning effectively.

The assessment will take into account the students’ capacity for thinking creatively, innovatively, analytically, logically and critically; their capacity to design and evaluate data science solutions, making reasoned judgements about these; and their capacity to present findings and conclusions effectively and to write detailed technical reports and documentation about their project work.

Thesis assignment criteria

The final work (thesis) will be assigned, upon specific request to the instructor, to students who demonstrate a serious and motivated interest in the course topics.

Week 1: Content of online and on-campus sessions

Course Introduction. Practical Lab Session with Python.

Week 2: Content of online and on-campus sessions

Data collection and preparation. Practical Lab Session with Python.

Week 3: Content of online and on-campus sessions

Practical Data Science programming. Practical Lab Session with Python.

Week 4: Content of online and on-campus sessions

Model training. Practical Lab Session with Python.

Week 5: Content of online and on-campus sessions

Model evaluation and deployment. Practical Lab Session with Python.

Week 6: Content of online and on-campus sessions

Industrial Guest Lecture 1 and project discussion.

Week 7: Content of online and on-campus sessions

Industrial Guest Lecture 2 and project discussion.

Week 8: Content of online and on-campus sessions

Industrial Guest Lecture 3 and project discussion.

Week 9: Content of online and on-campus sessions

Model monitoring and maintenance. Project Proof of Concept.

Week 10: Content of online and on-campus sessions

Project Mid Review.

Week 11: Content of online and on-campus sessions

Project First Release.

Week 12: Content of online and on-campus sessions

Project Final Review.