Lecture Winter Term 2023/24:
Mathematical Foundations of Machine Learning with Applications in Finance
(Advanced Topic)
Outline
Part I: Foundations
17.10.2023
- What is (machine) learning?
- "using experience to gain expertise"
- Generalization, inductive reasoning, inductive inference
- Prior knowledge, prior assumptions, inductive bias
- Classification of learning paradigms
- Supervised learning: Spam detection for e-mails
- Unsupervised learning: Anomaly detection, clustering of data sets
- Reinforcement Learning: training data contains more information than test data
- Active Learner: poses questions, performs experiments
- Passive Learner: only observes provided information
- Helpfulness of the teacher, adversarial "teacher"
- Online/Batch Learning Protocol
- Statistical Learning Framework
- Domain set, instance space
- Domain points, instances (usually a vector of features)
- Label set, set of all possible labels
- Training data: set of labeled domain points; input for the learner
- Learner's Output:
- prediction rule, predictor, hypothesis, classifier
- task: predict the label of new domain points
- learning algorithm returns a hypothesis upon receiving a certain training sequence
- Data-generation model
- Instances are generated by some probability distribution over the instance space (e.g. representing the environment)
- Assumption: There is some "correct" labeling function (target function) that is unknown to the learner
- Task for the learner is to figure out this labeling function
- The learner is also blind to the underlying probability distribution
- Measures of success
- Error of a classifier
- Generalization error, risk, true error of hypothesis
- The letter "L" in the risk notation stands for the "loss" of the learner
- Empirical Risk Minimization (ERM)
- Training Error: error the classifier incurs over the training sample (since true error not available to learner)
- Empirical error, empirical risk
- Overfitting
- ERM with Inductive Bias
- Inductive Bias: restricting the learner to a particular set of predictors, ideally based on prior knowledge
- Finite Hypothesis Classes
- Realizability Assumption
- True risk (of hypothesis) vs. empirical risk
- Confidence parameter (of prediction): bounds the probability of not getting a representative sample
- Accuracy parameter (quality of prediction): how far the output of the algorithm may be from the best predictor while still being considered approximately correct
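The ERM rule over a finite hypothesis class can be sketched in a few lines of Python (the threshold class and the labeled sample here are illustrative, not from the lecture):

```python
# Minimal ERM sketch: pick the hypothesis with the lowest training error
# from a finite class of threshold classifiers h_t(x) = 1 if x >= t else 0.

def erm(sample, thresholds):
    """Return the threshold whose classifier minimizes the empirical risk."""
    def empirical_risk(t):
        return sum(1 for x, y in sample if (x >= t) != (y == 1)) / len(sample)
    return min(thresholds, key=empirical_risk)

# Labeled sample (x, y); the unknown "correct" labeling function is 1_{x >= 5}.
sample = [(1, 0), (3, 0), (4, 0), (6, 1), (8, 1)]
best_t = erm(sample, thresholds=range(0, 11))  # a threshold consistent with the sample
```

Under the realizability assumption the returned hypothesis has zero training error, as here.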
18.10.2023
- Probably Approximately Correct (PAC) Learning
- PAC Learnability
- Sample Complexity
- Agnostic PAC Learning
- Uniform Convergence Is Sufficient for Learnability
- Finite Classes are Agnostic PAC Learnable
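For a finite hypothesis class in the realizable case, the standard PAC bound gives a sufficient sample size m >= log(|H|/delta) / epsilon; a small sketch (function name is mine):

```python
import math

def pac_sample_bound(hypothesis_class_size, epsilon, delta):
    """Sufficient sample size for PAC learning a finite class in the
    realizable case: m >= log(|H| / delta) / epsilon (natural logarithm)."""
    return math.ceil(math.log(hypothesis_class_size / delta) / epsilon)

# E.g. |H| = 100 hypotheses, accuracy 0.1, confidence 0.95:
m = pac_sample_bound(hypothesis_class_size=100, epsilon=0.1, delta=0.05)
```

Note how the dependence on the class size is only logarithmic, while the dependence on the accuracy parameter is linear in 1/epsilon.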
24.10.2023
- Bias-Complexity Tradeoff
- No-Free-Lunch Theorem
- No-Free-Lunch and Prior Knowledge
- Error Decomposition
- Infinite-Size Classes can be Learnable
- VC-Dimension
- Threshold Functions
- Intervals
- Axis Aligned Rectangles
- Finite Classes
- VC-Dimension and the Number of Parameters
- Fundamental Theorem of PAC Learning
- Sauer's Lemma and the Growth Function
- Uniform Convergence for Classes of Small Effective Size
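Shattering, the notion underlying the VC-dimension, can be checked by brute force on small examples; the sketch below (grid of candidate thresholds is an illustrative discretization) confirms that threshold functions shatter one point but no set of two, i.e. their VC-dimension is 1:

```python
def shatters(points, hypotheses):
    """Check whether the hypothesis set realizes every labeling of `points`."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)

# Threshold functions h_a(x) = 1 if x < a, over a grid of candidate thresholds.
thresholds = [t / 2 for t in range(-10, 11)]
H = [(lambda x, a=a: int(x < a)) for a in thresholds]

one_point = shatters([0.0], H)        # a single point is shattered
two_points = shatters([0.0, 1.0], H)  # the labeling (0, 1) is unrealizable
```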
25.10.2023
- Characterizing Nonuniform Learnability
- Structural Risk Minimization
- Minimum Description Length and Occam's Razor
- Other Notions of Learnability - Consistency
- Discussing the Different Notions of Learnability
- The No-Free-Lunch Theorem Revisited
- Computational Complexity of Learning
- Implementing the ERM Rule
- Finite Classes
- Axis Aligned Rectangles
- Boolean Conjunctions
- Learning 3-Term DNF
- Efficiently Learnable, but not by a proper ERM
- Hardness of Learning
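For axis-aligned rectangles, the ERM rule in the realizable case has a simple efficient implementation: return the tightest rectangle enclosing the positive examples. A sketch (data are illustrative):

```python
def erm_rectangle(sample):
    """ERM for axis-aligned rectangles in the realizable case: the tightest
    rectangle enclosing all positively labeled points.
    Returns ((x_min, x_max), (y_min, y_max)), or None if there are no positives."""
    pos = [p for p, y in sample if y == 1]
    if not pos:
        return None
    xs, ys = [p[0] for p in pos], [p[1] for p in pos]
    return (min(xs), max(xs)), (min(ys), max(ys))

def predict(rect, point):
    (x0, x1), (y0, y1) = rect
    return int(x0 <= point[0] <= x1 and y0 <= point[1] <= y1)

sample = [((1, 1), 1), ((2, 3), 1), ((5, 5), 0), ((0, 4), 0)]
rect = erm_rectangle(sample)
```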
31.10.2023
- Linear Predictors
- Halfspaces
- Linear Programming for the Class of Halfspaces
- Perceptron for Halfspaces
- VC Dimension of Halfspaces
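The Perceptron for halfspaces admits a very short implementation: while some example is misclassified, add y*x to the weight vector. A sketch for homogeneous halfspaces (the sample is an illustrative separable one):

```python
def perceptron(sample, max_iters=1000):
    """Batch Perceptron for homogeneous halfspaces. Labels are in {-1, +1}.
    Converges whenever the sample is separable with a positive margin."""
    dim = len(sample[0][0])
    w = [0.0] * dim
    for _ in range(max_iters):
        mistake = False
        for x, y in sample:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]  # update on a mistake
                mistake = True
        if not mistake:
            return w  # all examples correctly classified
    return w

sample = [((1.0, 2.0), 1), ((2.0, 1.0), 1), ((-1.0, -1.0), -1), ((-2.0, 0.5), -1)]
w = perceptron(sample)
```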
- Linear Regression
- Least Squares
- Linear Regression for Polynomial Regression Tasks
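Least squares in one dimension has the familiar closed form via the normal equations; polynomial regression reduces to the same computation after mapping x to its powers. A minimal sketch (data are illustrative):

```python
# Least squares for one-dimensional linear regression y ~ a*x + b.
def least_squares_1d(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var              # slope from the normal equations
    b = mean_y - a * mean_x    # intercept
    return a, b

# Points lying exactly on y = 2x + 1 are recovered exactly.
a, b = least_squares_1d([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```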
- Logistic Regression
- Boosting
- Weak Learnability
- Efficient Implementation of ERM for Decision Stumps
- AdaBoost
- Linear Combinations of Base Hypotheses
- AdaBoost for Face Recognition
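AdaBoost with decision stumps, combining the boosting items above, can be sketched as follows (the weak learner and the sample are illustrative; the pattern + + - - + is not realizable by a single stump but is fitted exactly by a weighted vote of a few):

```python
import math

def adaboost(sample, weak_learner, rounds):
    """AdaBoost sketch: maintain a distribution over the training examples,
    call a weak learner on the reweighted sample each round, and output a
    weighted majority vote of the base hypotheses. Labels are in {-1, +1}."""
    m = len(sample)
    dist = [1.0 / m] * m
    ensemble = []  # pairs (vote weight, base hypothesis)
    for _ in range(rounds):
        h = weak_learner(sample, dist)
        eps = sum(d for d, (x, y) in zip(dist, sample) if h(x) != y)
        eps = max(eps, 1e-12)  # guard against a perfect base hypothesis
        w = 0.5 * math.log((1.0 - eps) / eps)
        dist = [d * math.exp(-w * y * h(x)) for d, (x, y) in zip(dist, sample)]
        z = sum(dist)
        dist = [d / z for d in dist]  # renormalize
        ensemble.append((w, h))
    return lambda x: 1 if sum(w * h(x) for w, h in ensemble) >= 0 else -1

def stump_learner(sample, dist):
    """Weighted ERM over 1-D decision stumps x -> sign * (x >= threshold)."""
    xs = sorted({x for x, _ in sample})
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    best, best_err = None, float("inf")
    for t in thresholds:
        for sgn in (1, -1):
            h = lambda x, t=t, s=sgn: s if x >= t else -s
            err = sum(d for d, (x, y) in zip(dist, sample) if h(x) != y)
            if err < best_err:
                best, best_err = h, err
    return best

sample = [(0.0, 1), (1.0, 1), (2.0, -1), (3.0, -1), (4.0, 1)]
strong = adaboost(sample, stump_learner, rounds=5)
```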
01.11.2023 (Holiday)
07.11.2023
- Model Selection using Structural Risk Minimization (SRM)
- Validation for Model Selection
- Model-Selection Curve
- k-Fold Cross Validation
- Train-Validation-Test Split
- Understanding the Cause of Bad Performance
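The k-fold cross validation procedure used for model selection can be sketched as (function names are mine; the learner interface is an illustrative convention):

```python
def k_fold_splits(n, k):
    """Partition the indices 0..n-1 into k contiguous folds of near-equal size,
    returning (train_indices, validation_indices) pairs."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits

def cross_validation_error(sample, learner, k):
    """Average validation error (0-1 loss) of `learner` over k folds."""
    total = 0.0
    for train_idx, val_idx in k_fold_splits(len(sample), k):
        h = learner([sample[i] for i in train_idx])
        total += sum(1 for i in val_idx if h(sample[i][0]) != sample[i][1]) / len(val_idx)
    return total / k
```

Model selection then picks, among candidate learners (e.g. different regularization parameters), the one with the smallest cross-validation error, and retrains it on the full sample.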
- Convex Learning Problems
- Convexity, Lipschitzness, and Smoothness
- Convex Learning Problems
- Learnability of Convex Learning Problems
- Convex-Lipschitz/Smooth-Bounded Learning Problems
- Surrogate Loss Functions
08.11.2023
- Regularized Loss Minimization: Ridge Regression
- Stability: Stable Rules Do Not Overfit
- Tikhonov Regularization as a Stabilizer:
Lipschitz Loss / Smooth and Nonnegative Loss
- Control of the Fitting-Stability Tradeoff
- Analysis of Gradient Descent (GD) for Convex-Lipschitz Functions
- Calculating Subgradients of Lipschitz Functions
- Analysis of Stochastic Gradient Descent (SGD) for Convex-Lipschitz-Bounded Functions
- Adding a Projection Step / Variable Step Size
- SGD for Risk Minimization
- Analyzing SGD for Convex-Smooth Learning Problems
- SGD for Regularized Loss Minimization
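The projected SGD scheme with iterate averaging, as analyzed for convex-Lipschitz-bounded problems, can be sketched as follows (the noisy objective and all parameter values are illustrative):

```python
import random

def sgd_minimize(subgradient, dim, step, iters, radius=None, seed=0):
    """Projected SGD sketch for a convex function:
    w_{t+1} = project( w_t - eta * g_t ) onto a ball of the given radius,
    returning the average iterate as in the convex-Lipschitz-bounded analysis."""
    rng = random.Random(seed)
    w = [0.0] * dim
    avg = [0.0] * dim
    for _ in range(iters):
        g = subgradient(w, rng)                     # unbiased stochastic subgradient
        w = [wi - step * gi for wi, gi in zip(w, g)]
        if radius is not None:                      # projection step
            norm = sum(wi * wi for wi in w) ** 0.5
            if norm > radius:
                w = [wi * radius / norm for wi in w]
        avg = [a + wi / iters for a, wi in zip(avg, w)]
    return avg

# Example: minimize f(w) = E[(w - z)^2] with z uniform on {1, 3}; minimizer w* = 2.
def noisy_grad(w, rng):
    z = rng.choice([1.0, 3.0])
    return [2.0 * (w[0] - z)]

w_bar = sgd_minimize(noisy_grad, dim=1, step=0.05, iters=2000, radius=10.0)
```

The averaged iterate lands near the true minimizer even though each stochastic gradient is noisy.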
14.11.2023
- Margin and Hard-Support Vector Machines (SVM)
- The Sample Complexity of Hard-SVM
- Soft-SVM and Norm Regularization
- The Sample Complexity of Soft-SVM
- Margin and Norm-Based Bounds versus Dimension
- The Ramp Loss
- Optimality Conditions and 'Support Vectors', Duality
- Implementing Soft-SVM Using SGD
- Embeddings into Feature Spaces
- The Kernel Trick
- Kernels as a Way to Express Prior Knowledge
- Characterizing Kernel Functions
- Implementing Soft-SVM with Kernels
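Implementing Soft-SVM with SGD amounts to running SGD on the regularized hinge loss; a sketch with the 1/(lambda*t) step size from the regularized-loss analysis (the sample and parameter values are illustrative):

```python
import random

def soft_svm_sgd(sample, lam, epochs, seed=0):
    """SGD sketch for the Soft-SVM objective
        min_w  lam/2 * ||w||^2 + (1/m) * sum_i max(0, 1 - y_i * <w, x_i>),
    with step size 1/(lam * t). Labels are in {-1, +1}."""
    rng = random.Random(seed)
    dim = len(sample[0][0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for _ in range(len(sample)):
            t += 1
            eta = 1.0 / (lam * t)
            x, y = sample[rng.randrange(len(sample))]
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1:   # hinge loss active: subgradient is lam*w - y*x
                w = [(1 - eta * lam) * wi + eta * y * xi for wi, xi in zip(w, x)]
            else:            # hinge loss inactive: only the regularizer moves w
                w = [(1 - eta * lam) * wi for wi in w]
    return w

sample = [((1.0, 2.0), 1), ((2.0, 1.0), 1), ((-1.0, -1.5), -1), ((-2.0, -0.5), -1)]
w = soft_svm_sgd(sample, lam=0.1, epochs=300)
```

The kernelized variant keeps the same update but represents w as a combination of training points, evaluating inner products through the kernel.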
15.11.2023
- Multiclass, Ranking, and Complex Prediction Problems
- One-versus-All and All-Pairs
- Linear Multiclass Predictors
- Cost-Sensitive Classification
- Generalized Hinge Loss
- Multiclass SVM and SGD
- Structured Output Prediction, Ranking
- Linear Predictors for Ranking
- Bipartite Ranking and Multivariate Performance Measures
- Decision Tree Algorithms and their Complexity
- Implementations of the Gain Measure
- Pruning
- Threshold-Based Splitting Rules for Real-Valued Features
- Random Forests
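The One-versus-All reduction from the multiclass block above can be sketched as follows; the mean-distance scorer is a toy stand-in for any real binary learner (logistic regression, SVM, etc.), and all names and data are illustrative:

```python
def one_versus_all(sample, classes, binary_learner):
    """One-versus-All reduction: train one binary scorer per class and
    predict the class whose scorer assigns the highest score."""
    scorers = {}
    for c in classes:
        relabeled = [(x, 1 if y == c else -1) for x, y in sample]
        scorers[c] = binary_learner(relabeled)
    def predict(x):
        return max(classes, key=lambda c: scorers[c](x))
    return predict

# Toy binary learner: score a point by how much closer it is to the
# positive class mean than to the negative class mean.
def mean_scorer(sample):
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == -1]
    mp = [sum(col) / len(pos) for col in zip(*pos)]
    mn = [sum(col) / len(neg) for col in zip(*neg)]
    return lambda x: (sum((xi - ni) ** 2 for xi, ni in zip(x, mn)) ** 0.5
                      - sum((xi - pi) ** 2 for xi, pi in zip(x, mp)) ** 0.5)

sample = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"),
          ((5.0, 5.0), "b"), ((5.1, 4.9), "b"),
          ((0.0, 5.0), "c"), ((0.1, 5.2), "c")]
predict = one_versus_all(sample, ["a", "b", "c"], mean_scorer)
```

The All-Pairs variant instead trains one binary problem per pair of classes and predicts by a vote among the pairwise winners.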
21.11.2023
- k Nearest Neighbors
- A Generalization Bound for the 1-NN Rule
- 'Curse of Dimensionality'
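The 1-NN rule itself is a one-liner: label a query point by the label of its nearest training point. A sketch (data are illustrative):

```python
def nearest_neighbor(sample):
    """The 1-NN rule: predict the label of the closest training point
    (Euclidean distance; squared distance suffices for the argmin)."""
    def predict(x):
        _, label = min(sample, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
        return label
    return predict

sample = [((0.0, 0.0), "red"), ((1.0, 1.0), "red"), ((4.0, 4.0), "blue")]
predict = nearest_neighbor(sample)
```

The generalization bound for this rule, and its exponential sample-size dependence on the dimension, are exactly the 'curse of dimensionality' items above.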
- Feedforward Neural Networks
- Learning Neural Networks
- Expressive Power of Neural Networks
- Sample Complexity of Neural Networks
- Runtime of Learning Neural Networks
- SGD and Backpropagation
22.11.2023
- Online Classification in the Realizable Case
- Online Classification in the Unrealizable Case, Weighted-Majority
- Online Convex Optimization
- Online Perceptron Algorithm
- Clustering: Linkage-Based Clustering Algorithms
- k-Means and Other Cost Minimization Clusterings, k-Means Algorithm
- Spectral Clustering: Graph Cut, Graph Laplacian and Relaxed Graph Cuts
- Information Bottleneck
- A High Level View of Clustering
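The k-means algorithm from the clustering block above alternates two steps until the centers stop moving; a sketch of Lloyd's iterations (initial centers and data are illustrative):

```python
def k_means(points, centers, iters=100):
    """Lloyd's k-means iterations: assign each point to its closest center,
    then move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged to a fixed point
            break
        centers = new_centers
    return centers, clusters

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = k_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Each iteration can only decrease the k-means cost, so the procedure converges, though only to a local optimum that depends on the initial centers.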
28.11.2023
29.11.2023
05.12.2023
06.12.2023
12.12.2023
13.12.2023
19.12.2023
20.12.2023
09.01.2024
10.01.2024
16.01.2024
17.01.2024
23.01.2024
24.01.2024
30.01.2024
31.01.2024