LABS: CODING IN ACTION (MODULE II)
Instructional goals
This course builds on the work done during the first module of the Coding in Action Lab.
In the first phase, the course introduces the concepts of computational cost and efficiency of a program, illustrates some best practices for writing readable, organized, and efficient code, and presents the main tools for data processing, analysis, and visualization available in Python, one of the reference programming languages in the "data science" world.
In the second phase, the course focuses on the main data processing and analysis techniques: data acquisition, cleaning and preprocessing; statistical analysis and data visualization; clustering and classification.
Finally, students will tackle two concrete problems using real data: customer segmentation and building a recommendation system.
Intended learning outcomes
Knowledge and understanding:
By the end of this course, students will understand that a good program solves the given task using as few resources as possible. They will learn the specific Python syntax and best practices needed to write code faster and improve its performance. Further, students will learn how to use, at an introductory level, the best-known Python packages for data analysis and visualization: NumPy, Pandas, Scikit-learn and Matplotlib.
By the end of the course, the students will be familiar with the main concepts of data analysis and they will understand the importance of using suitable algorithms to extract trends and patterns from data by combining techniques of data mining, predictive modeling, and machine learning.
The course will teach students to use a data-driven approach to problem-solving and decision-making, fostering their critical thinking and their ability to work alone or in a group.
Applying knowledge and understanding:
To test their understanding of the concepts seen in class, students will be assigned projects that require working with real data. They will be asked to:
• Rapidly manipulate and process large data sets through dataframes and arrays
• Plot data and functions
• Perform basic descriptive analysis and modeling (statistics, histograms, interpolation, clustering, fitting)
The course will prepare the students for more advanced data analysis courses and make them ready to complete projects in other courses that require a computational approach to data.
Making judgements:
Upon completing the study program, students will be able to:
• Compare simple algorithms that solve the same task in terms of their computational cost
• Tackle a simple data analysis project
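As an illustration of the first point, the sketch below compares two ways of finding a value in a sorted list by counting how many elements each one inspects. It is a minimal example, not course material: the function names `linear_search` and `binary_search` and the test data are assumptions chosen for the illustration.

```python
def linear_search(items, target):
    """Scan left to right. Return (index, steps); index is -1 if absent."""
    steps = 0
    for i, value in enumerate(items):
        steps += 1  # one element inspected
        if value == target:
            return i, steps
    return -1, steps


def binary_search(sorted_items, target):
    """Halve the search range each time. Requires a sorted list."""
    steps = 0
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1  # one element inspected
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps


# Hypothetical data: one million sorted integers, searching for the last one.
data = list(range(1_000_000))
target = 999_999

lin_index, lin_steps = linear_search(data, target)
bin_index, bin_steps = binary_search(data, target)

print(f"linear search: index={lin_index}, steps={lin_steps}")
print(f"binary search: index={bin_index}, steps={bin_steps}")
```

Both functions solve the same task, but the number of steps grows linearly with the list size in the first case and logarithmically in the second, which is the kind of comparison the bullet refers to.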
Communication skills:
Being introduced to the concepts of computational and memory efficiency and to their formalization, the students will understand that the “cost” of solving a problem can be precisely quantified and expressed.
Through examples, case studies and projects, the students will learn how to communicate the results of a data analysis task and how to justify the choice of specific algorithms, methods and techniques.
Learning skills:
The students will be introduced to a set of advanced/professional libraries for data analysis. At the end of the course, they will be able to autonomously browse the Python standard library as well as the web to find the right libraries and tools to perform a given task.
Course Contents
The course will cover the following aspects of computer programming:
• Principles of computational complexity, efficiency of an algorithm
• Libraries for scientific programming: NumPy, Pandas, SciPy, Scikit-learn, Matplotlib
• Data ingestion, cleaning and preprocessing
• Statistical distribution visualization
• Correlation analysis, regression models and clustering
• Customer segmentation and the RFM model
• Recommendation systems
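To make the RFM (Recency, Frequency, Monetary) item concrete, here is a minimal sketch of how the three values could be computed per customer. It uses only the Python standard library so that it stays self-contained; in the course itself this kind of aggregation would more likely be done with Pandas. The transaction data, the reference date and the function name `rfm_table` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount)
transactions = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 3, 20), 60.0),
    ("bob", date(2023, 11, 2), 15.0),
    ("carol", date(2024, 3, 28), 120.0),
    ("carol", date(2024, 2, 14), 80.0),
    ("carol", date(2024, 1, 30), 50.0),
]


def rfm_table(rows, reference_date):
    """Return {customer: {recency_days, frequency, monetary}}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in rows:
        # Recency is measured from the most recent purchase.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1     # number of purchases
        monetary[customer] += amount  # total amount spent
    return {
        customer: {
            "recency_days": (reference_date - last_purchase[customer]).days,
            "frequency": frequency[customer],
            "monetary": monetary[customer],
        }
        for customer in last_purchase
    }


rfm = rfm_table(transactions, reference_date=date(2024, 4, 1))
for customer, values in sorted(rfm.items()):
    print(customer, values)
```

A segmentation step would then group customers with similar values, for example by binning each of the three columns or by running a clustering algorithm on them.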
Reference Books
Allen B. Downey, “Think Python: How to Think Like a Computer Scientist (2nd Edition)”, O’Reilly, ISBN-13: 978-1491939369
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction (2nd ed.). New York: Springer.
VanderPlas, J. (2016). Python data science handbook: Essential tools for working with data. O'Reilly Media.
Shmueli, G., Bruce, P. C., Gedeck, P., & Patel, N. R. (2019). Data mining for business analytics: concepts, techniques and applications in Python. John Wiley & Sons.
Teaching Methods
During the course, lectures will be used to introduce students to new topics. For each topic, students will then work through examples and case studies on their own, in groups, in line with the "learning by doing" paradigm. Because the program is built around problem solving, and the solutions are discussed in the exam, students will have to justify their choices and explain why they solved a project in a specific way. These moments help consolidate personal learning and share knowledge with the community. Collective intelligence, which allows good practices to emerge and groups to advance, plays an important role in this teaching method.
Students are asked to think for themselves in front of a computer. In this circumstance, making mistakes will not be penalized, but considered part of a learning journey. Pedagogical staff will be on hand to help students find their own solutions.
Assessment Method
Assessment will be based on programming projects, solved in groups during the weeks of the course and discussed with the instructor in an oral exam.
Thesis assignment criteria
No thesis will be assigned
Week 1
Since this is a course organized in the GAP, all the course content, outlined above, will be concentrated in 2 weeks of classes.