Kindle Notes & Highlights
Started reading
March 17, 2018
the computer program is said to learn from experience E with respect to task T and performance measure P if its performance on task T, as measured by P, improves with experience E.
Information retrieval (IR) is finding existing information as quickly as possible.
supervised learning
unsupervised learning
unsupervised
- We have no information about the desired output value.
- The software receives only a set of input parameters (x1, x2, ..., xn).
- The task of the program is to reveal hidden structures/dependencies among the data.
problem of overfitting
the model perfectly learns to recognize instances of the training set, but is not able to identify instances that are a little different...
training set is incomplete and does not include future data w...
problem of underfitting
model is not able to approximate the data from the training set,
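Both failure modes are easy to see numerically. The following is a minimal sketch in which everything (the data, the quadratic "true" relationship, the polynomial degrees) is assumed for illustration, not taken from the book: a degree-1 polynomial under-fits the training set, while a degree-15 polynomial fits it almost perfectly but generalizes worse to unseen points.

import numpy as np

rng = np.random.default_rng(0)
truth = lambda x: x**2                      # assumed "true" relationship
x_train = np.sort(rng.uniform(-1, 1, 20))
y_train = truth(x_train) + rng.normal(0, 0.05, 20)
x_test = np.sort(rng.uniform(-1, 1, 20))
y_test = truth(x_test) + rng.normal(0, 0.05, 20)

for degree in (1, 2, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # degree 1: both errors stay high (underfitting);
    # degree 15: training error collapses while test error grows (overfitting)
    print(f"degree {degree:2d}: train MSE {train_mse:.5f}  test MSE {test_mse:.5f}")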
Clustering is different from classification: there is no label (attribute) for the groups. The algorithm seeks to divide the entire data set into homogeneous subgroups, or clusters.
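As a concrete illustration, here is a minimal k-means sketch; k-means is one common clustering algorithm, chosen here by assumption since the highlight does not name one, and the two-blob data set is made up. It partitions unlabeled points into k homogeneous groups.

import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(n_iter):
        # assign every point to its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of the points assigned to it
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centers = kmeans(data, k=2)   # recovers the two blobs with no labels given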
Association reveals rules for quantifying the relationship between two or more attributes.
2 measures: the degree of fulfillment (commonly called support) and the reliability (commonly called confidence) of the rule.
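Under those standard names, a rule X -> Y can be scored as in this small sketch; the market-basket transactions are a made-up example.

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    # fraction of all transactions that contain every item of the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    # among the transactions containing the left-hand side,
    # the fraction that also contain the right-hand side
    return support(lhs | rhs, transactions) / support(lhs, transactions)

print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # 0.666...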
backpropagation.
comparing the output generated by the network with the output that it was supposed to generate.
Hebb’s learning
feed-forward networks,
recurrent networks
single-layer perceptron network,
inputs are forwarded directly to the output over a set of weights. The sum of the products (weight x input) is calculated in each unit, and if the value is above a threshold (typically 0), the neuron fires and takes the activation value (typically 1); otherwise it takes the deactivation value (typically -1),
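That rule is short enough to state directly in code; the inputs, weights, and threshold below are hypothetical.

import numpy as np

def perceptron_output(inputs, weights, threshold=0.0):
    total = np.dot(weights, inputs)        # sum of the products (weight x input)
    return 1 if total > threshold else -1  # activation vs. deactivation value

x = np.array([0.5, -1.0, 0.25])  # hypothetical inputs
w = np.array([0.4, 0.3, 0.9])    # hypothetical weights
print(perceptron_output(x, w))   # 0.2 - 0.3 + 0.225 = 0.125 > 0, so the unit fires: 1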
backpropagation
Output values are compared with the expected values in order to calculate the value of a predefined error function. As the error propagates back through the network, the weights are updated to reduce the error by a certain amount. After repeating this process sufficiently many times, the network usually converges to a state where the error is small.
gradient descent...
backpropagation is used only for networks whose activation functions are differentiable at every point.
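Here is a minimal backpropagation sketch under assumptions the highlights leave open: one hidden layer, sigmoid activations (differentiable at every point, as required above), squared error, and plain gradient descent, trained on the classic XOR toy problem.

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)    # expected outputs

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)     # input -> hidden weights
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)     # hidden -> output weights
lr = 0.5

for _ in range(10000):
    # forward pass
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # error at the output: generated output vs. expected output
    dY = (Y - T) * Y * (1 - Y)          # squared-error gradient times sigmoid'
    # propagate the error back through the hidden layer
    dH = (dY @ W2.T) * H * (1 - H)
    # update the weights down the gradient to reduce the error
    W2 -= lr * H.T @ dY;  b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH;  b1 -= lr * dH.sum(axis=0)

print(np.round(Y.ravel(), 2))   # should approach [0, 1, 1, 0] as the error shrinks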

