Lecture Winter Term 2023/24:
Mathematical Foundations of Machine Learning with Applications in Finance
(Advanced Topic)
Outline
Part I: Foundations
17.10.2023
- What is (machine) learning?
- "using experience to gain expertise"
- Generalization, inductive reasoning, inductive inference
- Prior knowledge, prior assumptions, inductive bias
- Classification of learning paradigms
- Supervised learning: Spam detection for e-mails
- Unsupervised learning: Anomaly detection, clustering of data sets
- Reinforcement Learning: training data contains more information than test data
- Active Learner: poses questions, performs experiments
- Passive Learner: only observes provided information
- Helpfulness of the teacher, adversarial "teacher"
- Online/Batch Learning Protocol
- Statistical Learning Framework
- Domain set, instance space
- Domain points, instances (usually a vector of features)
- Label set, set of all possible labels
- Training data: set of labeled domain points; input for the learner
- Learner's Output:
- prediction rule, predictor, hypothesis, classifier
- task: predict the label of new domain points
- learning algorithm returns a hypothesis upon receiving a certain training sequence
- Data-generation model
- Instances are generated by some probability distribution over the instance space (e.g. representing the environment)
- Assumption: There is some "correct" labeling function (target function) that is unknown to the learner
- Task for the learner is to figure out this labeling function
- The learner is also blind to the underlying probability distribution
- Measures of success
- Error of a classifier
- Generalization error, risk, true error of hypothesis
- The letter "L" in the risk notation stands for the "loss" of the learner
- Empirical Risk Minimization (ERM)
- Training Error: error the classifier incurs over the training sample (since true error not available to learner)
- Empirical error, empirical risk
- Overfitting
- ERM with Inductive Bias
- Inductive Bias: restricting the learner to a particular set of predictors, ideally based on prior knowledge
- Finite Hypothesis Classes
- Realizability Assumption
- True risk (of hypothesis) vs. empirical risk
- Confidence parameter (of prediction): bounds the probability of not getting a representative sample
- Accuracy parameter (quality of prediction): how far the output of the algorithm may be from the best predictor while still being considered approximately correct
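The ERM rule over a finite hypothesis class can be sketched in a few lines of Python (the threshold class and the labeled sample here are illustrative, not from the lecture):

```python
# Minimal ERM sketch: pick the hypothesis with the lowest training error
# from a finite class of threshold classifiers h_t(x) = 1 if x >= t else 0.

def erm(sample, thresholds):
    """Return the threshold whose classifier minimizes the empirical risk."""
    def empirical_risk(t):
        return sum(1 for x, y in sample if (x >= t) != (y == 1)) / len(sample)
    return min(thresholds, key=empirical_risk)

# Labeled sample (x, y); the unknown "correct" labeling function is 1_{x >= 5}.
sample = [(1, 0), (3, 0), (4, 0), (6, 1), (8, 1)]
best_t = erm(sample, thresholds=range(0, 11))  # a threshold consistent with the sample
```

Under the realizability assumption the returned hypothesis has zero training error, as here.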
18.10.2023
- Probably Approximately Correct (PAC) Learning
- PAC Learnability
- Sample Complexity
- Agnostic PAC Learning
- Uniform Convergence Is Sufficient for Learnability
- Finite Classes are Agnostic PAC Learnable
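For a finite hypothesis class in the realizable case, the standard PAC bound gives a sufficient sample size m >= log(|H|/delta) / epsilon; a small sketch (function name is mine):

```python
import math

def pac_sample_bound(hypothesis_class_size, epsilon, delta):
    """Sufficient sample size for PAC learning a finite class in the
    realizable case: m >= log(|H| / delta) / epsilon (natural logarithm)."""
    return math.ceil(math.log(hypothesis_class_size / delta) / epsilon)

# E.g. |H| = 100 hypotheses, accuracy 0.1, confidence 0.95:
m = pac_sample_bound(hypothesis_class_size=100, epsilon=0.1, delta=0.05)
```

Note how the dependence on the class size is only logarithmic, while the dependence on the accuracy parameter is linear in 1/epsilon.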
24.10.2023
- Bias-Complexity Tradeoff
- No-Free-Lunch Theorem
- No-Free-Lunch and Prior Knowledge
- Error Decomposition
- Infinite-Size Classes can be Learnable
- VC-Dimension
- Threshold Functions
- Intervals
- Axis Aligned Rectangles
- Finite Classes
- VC-Dimension and the Number of Parameters
- Fundamental Theorem of PAC Learning
- Sauer's Lemma and the Growth Function
- Uniform Convergence for Classes of Small Effective Size
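Shattering, the notion underlying the VC-dimension, can be checked by brute force on small examples; the sketch below (grid of candidate thresholds is an illustrative discretization) confirms that threshold functions shatter one point but no set of two, i.e. their VC-dimension is 1:

```python
def shatters(points, hypotheses):
    """Check whether the hypothesis set realizes every labeling of `points`."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)

# Threshold functions h_a(x) = 1 if x < a, over a grid of candidate thresholds.
thresholds = [t / 2 for t in range(-10, 11)]
H = [(lambda x, a=a: int(x < a)) for a in thresholds]

one_point = shatters([0.0], H)        # a single point is shattered
two_points = shatters([0.0, 1.0], H)  # the labeling (0, 1) is unrealizable
```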
25.10.2023
- Characterizing Nonuniform Learnability
- Structural Risk Minimization
- Minimum Description Length and Occam's Razor
- Other Notions of Learnability - Consistency
- Discussing the Different Notions of Learnability
- The No-Free-Lunch Theorem Revisited
- Computational Complexity of Learning
- Implementing the ERM Rule
- Finite Classes
- Axis Aligned Rectangles
- Boolean Conjunctions
- Learning 3-Term DNF
- Efficiently Learnable, but not by a proper ERM
- Hardness of Learning
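For axis-aligned rectangles, the ERM rule in the realizable case has a simple efficient implementation: return the tightest rectangle enclosing the positive examples. A sketch (data are illustrative):

```python
def erm_rectangle(sample):
    """ERM for axis-aligned rectangles in the realizable case: the tightest
    rectangle enclosing all positively labeled points.
    Returns ((x_min, x_max), (y_min, y_max)), or None if there are no positives."""
    pos = [p for p, y in sample if y == 1]
    if not pos:
        return None
    xs, ys = [p[0] for p in pos], [p[1] for p in pos]
    return (min(xs), max(xs)), (min(ys), max(ys))

def predict(rect, point):
    (x0, x1), (y0, y1) = rect
    return int(x0 <= point[0] <= x1 and y0 <= point[1] <= y1)

sample = [((1, 1), 1), ((2, 3), 1), ((5, 5), 0), ((0, 4), 0)]
rect = erm_rectangle(sample)
```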
31.10.2023
- Linear Predictors
- Halfspaces
- Linear Programming for the Class of Halfspaces
- Perceptron for Halfspaces
- VC Dimension of Halfspaces
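The Perceptron for halfspaces admits a very short implementation: while some example is misclassified, add y*x to the weight vector. A sketch for homogeneous halfspaces (the sample is an illustrative separable one):

```python
def perceptron(sample, max_iters=1000):
    """Batch Perceptron for homogeneous halfspaces. Labels are in {-1, +1}.
    Converges whenever the sample is separable with a positive margin."""
    dim = len(sample[0][0])
    w = [0.0] * dim
    for _ in range(max_iters):
        mistake = False
        for x, y in sample:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]  # update on a mistake
                mistake = True
        if not mistake:
            return w  # all examples correctly classified
    return w

sample = [((1.0, 2.0), 1), ((2.0, 1.0), 1), ((-1.0, -1.0), -1), ((-2.0, 0.5), -1)]
w = perceptron(sample)
```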
- Linear Regression
- Least Squares
- Linear Regression for Polynomial Regression Tasks
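Least squares in one dimension has the familiar closed form via the normal equations; polynomial regression reduces to the same computation after mapping x to its powers. A minimal sketch (data are illustrative):

```python
# Least squares for one-dimensional linear regression y ~ a*x + b.
def least_squares_1d(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var              # slope from the normal equations
    b = mean_y - a * mean_x    # intercept
    return a, b

# Points lying exactly on y = 2x + 1 are recovered exactly.
a, b = least_squares_1d([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```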
- Logistic Regression
- Boosting
- Weak Learnability
- Efficient Implementation of ERM for Decision Stumps
- AdaBoost
- Linear Combinations of Base Hypotheses
- AdaBoost for Face Recognition
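AdaBoost with decision stumps, combining the boosting items above, can be sketched as follows (the weak learner and the sample are illustrative; the pattern + + - - + is not realizable by a single stump but is fitted exactly by a weighted vote of a few):

```python
import math

def adaboost(sample, weak_learner, rounds):
    """AdaBoost sketch: maintain a distribution over the training examples,
    call a weak learner on the reweighted sample each round, and output a
    weighted majority vote of the base hypotheses. Labels are in {-1, +1}."""
    m = len(sample)
    dist = [1.0 / m] * m
    ensemble = []  # pairs (vote weight, base hypothesis)
    for _ in range(rounds):
        h = weak_learner(sample, dist)
        eps = sum(d for d, (x, y) in zip(dist, sample) if h(x) != y)
        eps = max(eps, 1e-12)  # guard against a perfect base hypothesis
        w = 0.5 * math.log((1.0 - eps) / eps)
        dist = [d * math.exp(-w * y * h(x)) for d, (x, y) in zip(dist, sample)]
        z = sum(dist)
        dist = [d / z for d in dist]  # renormalize
        ensemble.append((w, h))
    return lambda x: 1 if sum(w * h(x) for w, h in ensemble) >= 0 else -1

def stump_learner(sample, dist):
    """Weighted ERM over 1-D decision stumps x -> sign * (x >= threshold)."""
    xs = sorted({x for x, _ in sample})
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    best, best_err = None, float("inf")
    for t in thresholds:
        for sgn in (1, -1):
            h = lambda x, t=t, s=sgn: s if x >= t else -s
            err = sum(d for d, (x, y) in zip(dist, sample) if h(x) != y)
            if err < best_err:
                best, best_err = h, err
    return best

sample = [(0.0, 1), (1.0, 1), (2.0, -1), (3.0, -1), (4.0, 1)]
strong = adaboost(sample, stump_learner, rounds=5)
```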
01.11.2023 (Holiday)
07.11.2023
- Model Selection using Structural Risk Minimization (SRM)
- Validation for Model Selection
- Model-Selection Curve
- k-Fold Cross Validation
- Train-Validation-Test Split
- Understanding the Cause of Bad Performance
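The k-fold cross validation procedure used for model selection can be sketched as (function names are mine; the learner interface is an illustrative convention):

```python
def k_fold_splits(n, k):
    """Partition the indices 0..n-1 into k contiguous folds of near-equal size,
    returning (train_indices, validation_indices) pairs."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits

def cross_validation_error(sample, learner, k):
    """Average validation error (0-1 loss) of `learner` over k folds."""
    total = 0.0
    for train_idx, val_idx in k_fold_splits(len(sample), k):
        h = learner([sample[i] for i in train_idx])
        total += sum(1 for i in val_idx if h(sample[i][0]) != sample[i][1]) / len(val_idx)
    return total / k
```

Model selection then picks, among candidate learners (e.g. different regularization parameters), the one with the smallest cross-validation error, and retrains it on the full sample.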
- Convex Learning Problems
- Convexity, Lipschitzness, and Smoothness
- Convex Learning Problems
- Learnability of Convex Learning Problems
- Convex-Lipschitz/Smooth-Bounded Learning Problems
- Surrogate Loss Functions
08.11.2023
- Regularized Loss Minimization: Ridge Regression
- Stability: Stable Rules Do Not Overfit
- Tikhonov Regularization as a Stabilizer:
Lipschitz Loss / Smooth and Nonnegative Loss
- Control of the Fitting-Stability Tradeoff
- Analysis of Gradient Descent (GD) for Convex-Lipschitz Functions
- Calculating Subgradients of Lipschitz Functions
- Analysis of Stochastic Gradient Descent (SGD) for Convex-Lipschitz-Bounded Functions
- Adding a Projection Step / Variable Step Size
- SGD for Risk Minimization
- Analyzing SGD for Convex-Smooth Learning Problems
- SGD for Regularized Loss Minimization
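The projected SGD scheme with iterate averaging, as analyzed for convex-Lipschitz-bounded problems, can be sketched as follows (the noisy objective and all parameter values are illustrative):

```python
import random

def sgd_minimize(subgradient, dim, step, iters, radius=None, seed=0):
    """Projected SGD sketch for a convex function:
    w_{t+1} = project( w_t - eta * g_t ) onto a ball of the given radius,
    returning the average iterate as in the convex-Lipschitz-bounded analysis."""
    rng = random.Random(seed)
    w = [0.0] * dim
    avg = [0.0] * dim
    for _ in range(iters):
        g = subgradient(w, rng)                     # unbiased stochastic subgradient
        w = [wi - step * gi for wi, gi in zip(w, g)]
        if radius is not None:                      # projection step
            norm = sum(wi * wi for wi in w) ** 0.5
            if norm > radius:
                w = [wi * radius / norm for wi in w]
        avg = [a + wi / iters for a, wi in zip(avg, w)]
    return avg

# Example: minimize f(w) = E[(w - z)^2] with z uniform on {1, 3}; minimizer w* = 2.
def noisy_grad(w, rng):
    z = rng.choice([1.0, 3.0])
    return [2.0 * (w[0] - z)]

w_bar = sgd_minimize(noisy_grad, dim=1, step=0.05, iters=2000, radius=10.0)
```

The averaged iterate lands near the true minimizer even though each stochastic gradient is noisy.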
14.11.2023
- Margin and Hard-Support Vector Machines (SVM)
- The Sample Complexity of Hard-SVM
- Soft-SVM and Norm Regularization
- The Sample Complexity of Soft-SVM
- Margin and Norm-Based Bounds versus Dimension
- The Ramp Loss
- Optimality Conditions and 'Support Vectors', Duality
- Implementing Soft-SVM Using SGD
- Embeddings into Feature Spaces
- The Kernel Trick
- Kernels as a Way to Express Prior Knowledge
- Characterizing Kernel Functions
- Implementing Soft-SVM with Kernels
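Implementing Soft-SVM with SGD amounts to running SGD on the regularized hinge loss; a sketch with the 1/(lambda*t) step size from the regularized-loss analysis (the sample and parameter values are illustrative):

```python
import random

def soft_svm_sgd(sample, lam, epochs, seed=0):
    """SGD sketch for the Soft-SVM objective
        min_w  lam/2 * ||w||^2 + (1/m) * sum_i max(0, 1 - y_i * <w, x_i>),
    with step size 1/(lam * t). Labels are in {-1, +1}."""
    rng = random.Random(seed)
    dim = len(sample[0][0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for _ in range(len(sample)):
            t += 1
            eta = 1.0 / (lam * t)
            x, y = sample[rng.randrange(len(sample))]
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1:   # hinge loss active: subgradient is lam*w - y*x
                w = [(1 - eta * lam) * wi + eta * y * xi for wi, xi in zip(w, x)]
            else:            # hinge loss inactive: only the regularizer moves w
                w = [(1 - eta * lam) * wi for wi in w]
    return w

sample = [((1.0, 2.0), 1), ((2.0, 1.0), 1), ((-1.0, -1.5), -1), ((-2.0, -0.5), -1)]
w = soft_svm_sgd(sample, lam=0.1, epochs=300)
```

The kernelized variant keeps the same update but represents w as a combination of training points, evaluating inner products through the kernel.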
15.11.2023
- Multiclass, Ranking, and Complex Prediction Problems
- One-versus-All and All-Pairs
- Linear Multiclass Predictors
- Cost-Sensitive Classification
- Generalized Hinge Loss
- Multiclass SVM and SGD
- Structured Output Prediction, Ranking
- Linear Predictors for Ranking
- Bipartite Ranking and Multivariate Performance Measures
- Decision Tree Algorithms and their Complexity
- Implementations of the Gain Measure
- Pruning
- Threshold-Based Splitting Rules for Real-Valued Features
- Random Forests
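The One-versus-All reduction from the multiclass block above can be sketched as follows; the mean-distance scorer is a toy stand-in for any real binary learner (logistic regression, SVM, etc.), and all names and data are illustrative:

```python
def one_versus_all(sample, classes, binary_learner):
    """One-versus-All reduction: train one binary scorer per class and
    predict the class whose scorer assigns the highest score."""
    scorers = {}
    for c in classes:
        relabeled = [(x, 1 if y == c else -1) for x, y in sample]
        scorers[c] = binary_learner(relabeled)
    def predict(x):
        return max(classes, key=lambda c: scorers[c](x))
    return predict

# Toy binary learner: score a point by how much closer it is to the
# positive class mean than to the negative class mean.
def mean_scorer(sample):
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == -1]
    mp = [sum(col) / len(pos) for col in zip(*pos)]
    mn = [sum(col) / len(neg) for col in zip(*neg)]
    return lambda x: (sum((xi - ni) ** 2 for xi, ni in zip(x, mn)) ** 0.5
                      - sum((xi - pi) ** 2 for xi, pi in zip(x, mp)) ** 0.5)

sample = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"),
          ((5.0, 5.0), "b"), ((5.1, 4.9), "b"),
          ((0.0, 5.0), "c"), ((0.1, 5.2), "c")]
predict = one_versus_all(sample, ["a", "b", "c"], mean_scorer)
```

The All-Pairs variant instead trains one binary problem per pair of classes and predicts by a vote among the pairwise winners.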
21.11.2023
- k Nearest Neighbors
- A Generalization Bound for the 1-NN Rule
- 'Curse of Dimensionality'
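The 1-NN rule itself is a one-liner: label a query point by the label of its nearest training point. A sketch (data are illustrative):

```python
def nearest_neighbor(sample):
    """The 1-NN rule: predict the label of the closest training point
    (Euclidean distance; squared distance suffices for the argmin)."""
    def predict(x):
        _, label = min(sample, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
        return label
    return predict

sample = [((0.0, 0.0), "red"), ((1.0, 1.0), "red"), ((4.0, 4.0), "blue")]
predict = nearest_neighbor(sample)
```

The generalization bound for this rule, and its exponential sample-size dependence on the dimension, are exactly the 'curse of dimensionality' items above.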
- Feedforward Neural Networks
- Learning Neural Networks
- Expressive Power of Neural Networks
- Sample Complexity of Neural Networks
- Runtime of Learning Neural Networks
- SGD and Backpropagation
22.11.2023
- Online Classification in the Realizable Case
- Online Classification in the Unrealizable Case, Weighted-Majority
- Online Convex Optimization
- Online Perceptron Algorithm
- Clustering: Linkage-Based Clustering Algorithms
- k-Means and Other Cost Minimization Clusterings, k-Means Algorithm
- Spectral Clustering: Graph Cut, Graph Laplacian and Relaxed Graph Cuts
- Information Bottleneck
- A High Level View of Clustering
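The k-means algorithm from the clustering block above alternates two steps until the centers stop moving; a sketch of Lloyd's iterations (initial centers and data are illustrative):

```python
def k_means(points, centers, iters=100):
    """Lloyd's k-means iterations: assign each point to its closest center,
    then move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged to a fixed point
            break
        centers = new_centers
    return centers, clusters

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = k_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Each iteration can only decrease the k-means cost, so the procedure converges, though only to a local optimum that depends on the initial centers.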
28.11.2023
29.11.2023
05.12.2023
06.12.2023
12.12.2023
13.12.2023
19.12.2023
20.12.2023
09.01.2024
10.01.2024
16.01.2024
17.01.2024
23.01.2024
24.01.2024
30.01.2024
31.01.2024