Serious science and mathematics readings discussion

This topic is about Understanding Deep Learning (Readings 2025).

February 2025 - Understanding Deep Learning

Chapter 1 notes
We want to infer an output y from an input x using a Model f (i.e. a family of relations), written as:
y = f(x)
Here x and y are (multi-dimensional) vectors that encode the input and output in a suitable manner, and f is a function.
A particular relation within the family is picked out by the choice of the Parameters p:
y = f(x, p)
The goal of Supervised Learning is then to learn the model's parameters p from a Training Dataset, which is a collection of input-output pairs {x_i, y_i}.
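To make the notation concrete, here is a minimal sketch (not from the book): it assumes a one-dimensional linear model f(x, p) = p[0] + p[1]*x and a small synthetic training set; the function name, the underlying relation y = 1 + 2x, and the noise level are all illustrative assumptions.

import numpy as np

# Hypothetical model: a 1D linear relation y = f(x, p) = p[0] + p[1] * x
def f(x, p):
    return p[0] + p[1] * x

# Toy training dataset {x_i, y_i}: noisy samples of the assumed relation y = 1 + 2x
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=20)
y_train = 1.0 + 2.0 * x_train + 0.1 * rng.standard_normal(20)

# One particular choice of the parameters p gives one particular input-output relation
p = np.array([0.0, 0.5])
y_pred = f(x_train, p)   # predictions of that relation on the training inputs
print(y_pred[:3])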
Define the Loss L as the degree of mismatch (which must be quantified precisely) between the observed outputs and the model's input-output predictions.
Our objective is then to find parameters p_m that minimize the loss function:
p_m = argmin_p L(p, {x_i, y_i})
The practical challenge, of course, is to find the optimal parameters p_m with as little resource consumption as possible.
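As an illustration of the argmin step, here is a hedged sketch that quantifies the mismatch with a mean-squared-error loss and searches for p_m by plain gradient descent on the same toy linear model; the loss choice, learning rate, and iteration count are assumptions for the example, not the book's prescription.

import numpy as np

def f(x, p):                      # same hypothetical 1D linear model as above
    return p[0] + p[1] * x

def loss(p, x, y):                # L(p, {x_i, y_i}): mean squared mismatch (one possible choice)
    return np.mean((f(x, p) - y) ** 2)

def grad_loss(p, x, y):           # analytic gradient of the MSE loss with respect to p
    r = f(x, p) - y
    return np.array([2 * np.mean(r), 2 * np.mean(r * x)])

rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=20)
y_train = 1.0 + 2.0 * x_train + 0.1 * rng.standard_normal(20)

p = np.zeros(2)                   # initial guess for the parameters
for _ in range(500):              # gradient descent: step against the gradient of the loss
    p -= 0.1 * grad_loss(p, x_train, y_train)

print("estimated p_m:", p, " training loss:", loss(p, x_train, y_train))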
Once the training phase is done, we run the model on held-out Test Data to evaluate its Generalization, i.e. its performance on examples it did not see during training.
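A minimal sketch of that evaluation step, assuming the same toy model and MSE loss as above: hold out some pairs as test data that the training loop never touches, then compare training and test loss as a rough measure of generalization.

import numpy as np

def f(x, p):
    return p[0] + p[1] * x

def loss(p, x, y):
    return np.mean((f(x, p) - y) ** 2)

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=30)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(30)

# Split into training data (used to fit p) and test data (held out)
x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

p = np.zeros(2)
for _ in range(500):              # fit on the training set only (gradient descent on MSE)
    r = f(x_train, p) - y_train
    p -= 0.1 * np.array([2 * np.mean(r), 2 * np.mean(r * x_train)])

# Generalization check: loss on examples the model never saw during training
print("train loss:", loss(p, x_train, y_train), " test loss:", loss(p, x_test, y_test))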
Happy reading, and let's hope we can have some illuminating discussions!