BIG DATA ANALYSIS

Cecilia Flori

Instructional goals

Upon successful completion of the course, students will be able to:
1) Formulate quantitative models for business and economic problems
2) Represent datasets using vectors and matrices
3) Analyze multivariable functions describing relationships between variables
4) Apply linear algebra methods to high-dimensional data
5) Use derivatives and gradients to study sensitivity in mathematical models
6) Solve optimization problems arising in decision-making contexts
7) Interpret probabilistic models used in forecasting and risk analysis

Prerequisites

Calculus I and introductory statistics

Course Contents

This course introduces the mathematical foundations underlying modern data-driven decision making in business environments. The emphasis is on mathematical structures used to represent and analyze large datasets. The course focuses on mathematical methods from:
• Linear Algebra
• Multivariable Calculus
• Optimization
• Probability
Applications will be drawn from problems such as demand modeling, credit risk analysis, recommendation systems, marketing analytics, and forecasting. While motivated by business applications, the course maintains a strong emphasis on mathematical formulation and analysis.

Reference Books

• Strang, G. Introduction to Linear Algebra
• Deisenroth, Faisal, Ong. Mathematics for Machine Learning
• Boyd and Vandenberghe. Introduction to Applied Linear Algebra

Teaching Methods

Lectures and problem classes

Assessment Method

Midterm and final exam; the final exam might include a project and an oral exam if required by the instructor.

Week 1

Revision of eigenvalues, eigenvectors, spectral properties of matrices, and covariance matrices
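As a small illustration of the revision topics, the sketch below computes the covariance matrix of a toy dataset and its eigenvalues and eigenvectors. The data values are hypothetical and chosen only for illustration; NumPy is an assumed tool, not one prescribed by the course.

```python
import numpy as np

# Hypothetical toy data: 5 observations (rows) of 2 variables (columns).
X = np.array([[2.0, 1.0],
              [3.0, 2.0],
              [4.0, 3.5],
              [5.0, 4.0],
              [6.0, 6.0]])

# Sample covariance matrix; variables are in columns, hence rowvar=False.
C = np.cov(X, rowvar=False)

# eigh is intended for symmetric matrices such as C.
# It returns real eigenvalues in ascending order and orthonormal eigenvectors
# stored as the columns of the second return value.
eigenvalues, eigenvectors = np.linalg.eigh(C)

print("covariance matrix:\n", C)
print("eigenvalues:", eigenvalues)
print("eigenvectors (columns):\n", eigenvectors)
```

Two spectral properties can be checked directly on the output: the eigenvalues of a covariance matrix are non-negative, and their sum equals the trace of the matrix, that is, the total variance of the data.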

Week 2

Principal Component Analysis and High-Dimensional Data. This part of the course introduces methods from linear algebra for analyzing high-dimensional datasets commonly arising in finance, accounting, and marketing. The emphasis is on identifying structure in data and reducing complexity while preserving essential information. The material is developed through the following representative case studies:
∗ Financial Data Reduction and Performance Indicators: Financial datasets often contain many correlated variables (e.g. liquidity, leverage, profitability). These are represented in matrix form. Using covariance matrices and eigenvalue methods, principal directions in the data are identified. This approach, known as Principal Component Analysis (PCA), is used to construct a small number of variables that summarize overall financial performance and risk.
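The PCA procedure described in this case study can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `pca` is hypothetical, and the three "financial indicators" are simulated from a single latent factor rather than taken from real data.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal directions."""
    Xc = X - X.mean(axis=0)                  # centre each variable
    C = np.cov(Xc, rowvar=False)             # covariance matrix of the variables
    vals, vecs = np.linalg.eigh(C)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]           # reorder to descending variance
    vals, vecs = vals[order], vecs[:, order]
    scores = Xc @ vecs[:, :k]                # new summary variables
    explained = vals[:k] / vals.sum()        # share of total variance kept
    return scores, vecs[:, :k], explained

# Simulated data (an assumption): three correlated indicators that are
# all driven by one latent factor plus a small amount of noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=200)
X = np.column_stack([
    factor + 0.1 * rng.normal(size=200),
    2.0 * factor + 0.1 * rng.normal(size=200),
    -factor + 0.1 * rng.normal(size=200),
])

scores, components, explained = pca(X, k=1)
print("share of variance explained by the first component:", explained[0])
```

Because the simulated indicators share one underlying factor, a single principal component captures almost all of the variance, which is the sense in which PCA reduces many correlated variables to a few summary variables.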

Week 3

Principal Component Analysis and High-Dimensional Data (continued).
∗ Customer and Marketing Data Analysis: Large marketing datasets include multiple indicators of customer behavior. Eigenvectors of covariance matrices are used to identify dominant patterns and reduce the number of variables required for analysis. This allows the construction of simplified representations of customer segments and behavior.

Week 4

Multivariable functions: The course introduces functions of several variables as a fundamental tool for modeling relationships between multiple economic and financial factors. Emphasis is placed on the geometric and analytical properties of such functions, and their role in quantitative decision-making. The following concepts are developed and motivated through representative problems:
• Level Curves and Isoquant Analysis: Functions are analyzed through their level sets, which represent combinations of variables yielding the same output. Isoquants, the level curves of production functions, represent combinations of inputs that yield the same output and are used to analyze trade-offs between variables in economic and financial models. This framework is used to study problems such as identifying combinations of price and demand that yield constant revenue, or combinations of financial indicators corresponding to the same level of performance. Level curves provide a geometric interpretation of trade-offs between variables.
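The constant-revenue example can be made concrete with a short sketch. It assumes the simplest revenue function, R(p, q) = p * q, and a hypothetical target revenue of 1000; both are illustrative choices rather than course data.

```python
import numpy as np

def revenue(price, quantity):
    """Assumed revenue function R(p, q) = p * q."""
    return price * quantity

target = 1000.0                                # hypothetical revenue level
prices = np.array([5.0, 10.0, 20.0, 40.0])

# Points on the level curve R(p, q) = target: solve p * q = target for q.
quantities = target / prices

for p, q in zip(prices, quantities):
    print(f"price {p:5.1f}, quantity {q:6.1f}, revenue {revenue(p, q):7.1f}")
```

Every printed pair lies on the same level curve, so moving along the curve trades a higher price against a lower quantity while revenue stays fixed.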

Week 5

Partial Derivatives and Marginal Analysis: Partial derivatives are introduced to measure the sensitivity of an outcome with respect to individual variables. These quantities are interpreted as marginal effects in applications such as assessing the impact of changes in price, advertising expenditure, or financial ratios on revenue, profit, or risk. This provides a mathematical foundation for comparative statics in economic and financial models.
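A minimal numerical sketch of marginal effects follows. The revenue model, in which demand is 100 - 2p + 3*sqrt(a) for price p and advertising spend a, is a hypothetical assumption, and the helper `partial` is an illustrative central-difference approximation rather than a prescribed method.

```python
def revenue(p, a):
    """Hypothetical model: revenue = price * demand, demand = 100 - 2p + 3*sqrt(a)."""
    return p * (100.0 - 2.0 * p + 3.0 * a ** 0.5)

def partial(f, point, i, h=1e-5):
    """Central-difference estimate of the partial derivative of f in variable i."""
    up = list(point)
    down = list(point)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2.0 * h)

point = (10.0, 16.0)                      # price 10, advertising spend 16
d_price = partial(revenue, point, 0)      # marginal revenue w.r.t. price
d_advert = partial(revenue, point, 1)     # marginal revenue w.r.t. advertising
print(d_price, d_advert)
```

For this model the exact partial derivatives are 100 - 4p + 3*sqrt(a) and 3p / (2*sqrt(a)), which equal 72 and 3.75 at the chosen point, so the numerical estimates can be compared against them.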

Week 6

Gradient and Directional Optimization: The gradient vector is studied as the direction of steepest increase of a function. This concept is applied to problems of optimizing business and financial objectives, such as maximizing revenue or minimizing cost, and to understanding how simultaneous changes in multiple variables affect outcomes. The gradient provides a link between local sensitivity analysis and global optimization methods.
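The idea of following the direction of steepest increase can be sketched with plain gradient ascent. The concave objective below is a hypothetical two-variable example, and the step size and iteration count are illustrative assumptions that happen to work for this function.

```python
import numpy as np

def objective(v):
    """Hypothetical concave objective in two decision variables."""
    x, y = v
    return 40 * x + 30 * y - 2 * x**2 - y**2 - x * y

def gradient(v):
    """Gradient of the objective, computed by hand."""
    x, y = v
    return np.array([40 - 4 * x - y, 30 - x - 2 * y])

def gradient_ascent(grad, start, step=0.1, iterations=500):
    """Repeatedly move a small step in the direction of the gradient."""
    v = np.array(start, dtype=float)
    for _ in range(iterations):
        v = v + step * grad(v)
    return v

optimum = gradient_ascent(gradient, start=[0.0, 0.0])
print("approximate maximiser:", optimum)
print("objective value:", objective(optimum))
```

Setting the gradient to zero by hand gives x = 50/7 and y = 80/7, so the iterates can be compared with the exact maximiser. For non-concave objectives the same procedure may stop at a local maximum only.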

Week 7

Constrained Optimization and Resource Allocation: Many decision problems involve optimizing an objective function subject to constraints. Problems of this form are studied using the method of Lagrange multipliers. This framework is applied to problems such as optimal allocation of a fixed budget across competing activities, optimal portfolio allocation under risk constraints, and efficient distribution of financial resources.
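As an illustration of the budget allocation problem, the sketch below assumes a hypothetical return function a*sqrt(x) + b*sqrt(y) for spending x and y on two activities, subject to x + y = budget. Solving the Lagrange conditions a / (2*sqrt(x)) = b / (2*sqrt(y)) = lambda by hand gives the closed form used in the code; the function names are illustrative.

```python
import math

def allocate_budget(a, b, budget):
    """Maximise a*sqrt(x) + b*sqrt(y) subject to x + y = budget.

    Uses the closed-form solution of the Lagrange conditions.
    """
    total = a**2 + b**2
    x = budget * a**2 / total
    y = budget * b**2 / total
    multiplier = math.sqrt(total) / (2.0 * math.sqrt(budget))
    return x, y, multiplier

def total_return(a, b, x, y):
    """Assumed return function of the two activities."""
    return a * math.sqrt(x) + b * math.sqrt(y)

x, y, lam = allocate_budget(3.0, 4.0, 100.0)
print("allocation:", x, y)
print("multiplier:", lam)
print("return:", total_return(3.0, 4.0, x, y))
```

With a = 3, b = 4 and a budget of 100, the allocation is 36 and 64. The multiplier, 0.25, is the approximate gain in return from one additional unit of budget, which is the usual interpretation of a Lagrange multiplier as a shadow price.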

Week 8

Probability Models and Risk Analysis: Introduction. The course introduces probability theory as a mathematical framework for modeling uncertainty in economic and financial systems. Emphasis is placed on the formulation of probabilistic models and their interpretation in the context of risk and decision-making.

Week 9

Random Variables and Modeling Uncertainty: Uncertain quantities are represented as random variables. This framework is used to model outcomes such as asset returns, customer defaults, or demand fluctuations. By formalizing uncertainty in terms of random variables, complex financial and business phenomena can be analyzed within a precise mathematical structure.

Week 10

Expected Value and Decision Criteria: The expected value is introduced as a measure of the average outcome of a random variable. It is used to evaluate decisions under uncertainty, such as assessing the expected profitability of investments, pricing strategies, or financial contracts. This provides a mathematical basis for comparing alternative courses of action.
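A small sketch of an expected-value comparison between two pricing strategies follows. The demand scenarios, probabilities, and unit cost are all hypothetical numbers chosen for illustration.

```python
import numpy as np

# Hypothetical scenario probabilities (low, medium, high demand).
probabilities = np.array([0.3, 0.5, 0.2])
unit_cost = 6.0

# Hypothetical demand in each scenario for two candidate prices.
strategies = {
    10.0: np.array([100.0, 150.0, 200.0]),
    12.0: np.array([60.0, 100.0, 140.0]),
}

def expected_profit(price, demand):
    """Expected profit: probability-weighted average of the scenario profits."""
    profit = (price - unit_cost) * demand
    return float(np.dot(probabilities, profit))

expected = {price: expected_profit(price, demand)
            for price, demand in strategies.items()}
best_price = max(expected, key=expected.get)
print(expected, "best price by expected profit:", best_price)
```

Here the expected profits are 580 at a price of 10 and 576 at a price of 12, so the expected-value criterion favors the lower price. The criterion ignores how spread out the outcomes are, which is a separate question of risk.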

Week 11

Variance and Risk Measurement: The variance is studied as a measure of dispersion around the expected value. In financial applications, variance is interpreted as a measure of risk, capturing the uncertainty associated with outcomes such as returns or losses. This allows for the quantitative comparison of alternatives with different risk profiles.
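To illustrate comparing alternatives with different risk profiles, the sketch below computes the mean and variance of two hypothetical investments with discrete payoff distributions. The payoffs and probabilities are made-up values.

```python
import numpy as np

def mean_and_variance(outcomes, probabilities):
    """Mean and variance of a discrete random variable."""
    outcomes = np.asarray(outcomes, dtype=float)
    probabilities = np.asarray(probabilities, dtype=float)
    mean = np.dot(probabilities, outcomes)
    # Variance as the expected squared deviation from the mean.
    variance = np.dot(probabilities, (outcomes - mean) ** 2)
    return float(mean), float(variance)

# Hypothetical payoff distributions of two investments.
mean_a, var_a = mean_and_variance([-10.0, 5.0, 20.0], [0.2, 0.5, 0.3])
mean_b, var_b = mean_and_variance([0.0, 4.0, 8.0], [0.1, 0.6, 0.3])

print("A: mean", mean_a, "variance", var_a, "std", var_a ** 0.5)
print("B: mean", mean_b, "variance", var_b, "std", var_b ** 0.5)
```

Investment A has the higher mean (6.5 against 4.8) but also a much larger variance (110.25 against 5.76), so the choice between them depends on how the decision-maker weighs expected payoff against risk.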

Week 12

Probabilistic Models in Risk Analysis: Probability models are used to describe and analyze risk in financial and managerial contexts. Examples include modeling the probability of default, variability in cash flows, or uncertainty in demand. These models provide a framework for quantifying risk and support decision-making processes in areas such as credit analysis, investment evaluation, and financial control.
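One simple way to model default risk is a binomial model, sketched below. It assumes a hypothetical portfolio of 20 loans that default independently, each with probability 0.05. The independence assumption is a strong simplification, since real defaults tend to be correlated, and the function names are illustrative.

```python
from math import comb

def default_count_pmf(n, p):
    """Binomial probabilities of 0, 1, ..., n defaults among n independent loans."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def prob_at_least(pmf, k):
    """Probability of k or more defaults."""
    return sum(pmf[k:])

# Hypothetical portfolio: 20 loans, 5% default probability each (assumed independent).
pmf = default_count_pmf(20, 0.05)
expected_defaults = sum(k * q for k, q in enumerate(pmf))
tail = prob_at_least(pmf, 3)

print("expected number of defaults:", expected_defaults)
print("probability of at least 3 defaults:", tail)
```

The expected number of defaults equals n * p = 1, while the tail probability quantifies the risk of an unusually bad outcome, which is the kind of quantity used in credit analysis.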