Kindle Notes & Highlights
Khan Academy offers top-notch free courses in algebra, calculus, and statistics.
Hyperparameters are essentially fine-tuning knobs that can be tweaked to help a network successfully train. They are determined before a network trains and can only be adjusted manually by the individual or team who created the network. The network itself does not adjust them.
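A minimal sketch of what this looks like in practice; the names and values below are illustrative, not the book's:

```python
# Hyperparameters are fixed by hand before training begins; nothing in
# the training loop modifies them. All values here are illustrative.
hyperparameters = {
    "learning_rate": 0.5,    # how far each weight update steps
    "hidden_layers": 2,      # network depth
    "nodes_per_layer": 4,    # network width
    "epochs": 10000,         # full passes over the training set
}
print(hyperparameters)
```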
Multiple hidden layers are what the term deep learning refers to.
Studies have demonstrated that layers containing the same number of nodes generally perform as well as or better than a decreasing or increasing pyramid-shaped network.
Weights are important because they are the primary tuning knobs used to train a network.
A bias node is an extra node added to each hidden and output layer, and it connects to every node within each respective layer. A bias is never connected to a previous layer, but is simply added to the input of a layer. In addition, it typically has a constant value of 1 or -1 and has a weight on its connecting edge.
On a practical level, this enables the activation function to be shifted to the left or right. Alongside the adjusting of weights, this "shifting" can be critical for successful learning.
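A small sketch of the shift, assuming a logistic activation; the weight and bias values are illustrative:

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-6, 6, 5)
w = 1.0

# With no bias the logistic curve is centered at x = 0; adding a bias
# term of +2 shifts the same curve 2 units to the left.
print(logistic(w * x))        # centered at x = 0
print(logistic(w * x + 2.0))  # shifted left by 2
```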
A learning rate can be static and remain constant throughout the learning process, or it can be programmed to scale down as the network's error rate falls.
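One way the scaling could be programmed; the proportional rule below is a hedged sketch, not necessarily the book's scheme:

```python
# Shrink the learning rate in proportion to how far the error has
# fallen from its starting value. Purely illustrative.
def scaled_learning_rate(initial_rate, current_error, initial_error):
    return initial_rate * (current_error / initial_error)

lr = scaled_learning_rate(0.5, current_error=0.02, initial_error=0.25)
print(lr)  # 0.04 -- smaller steps as the error approaches zero
```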
Within a neural network, an activation function receives the output of the summation operator and transforms it into the final output of a node.
In recent years (2015 onward) the ReLU function has gained popularity because of its phenomenal performance within deep neural networks, especially for image recognition with convolutional neural networks (CNNs).
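A minimal sketch tying the last two highlights together: a node sums its weighted inputs plus a bias, then an activation function (logistic or ReLU here) transforms that total input into the node's output. All numeric values are illustrative:

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

inputs = np.array([0.05, 0.10])
weights = np.array([0.15, 0.20])
bias = 0.35

total_input = np.dot(inputs, weights) + bias  # the summation operator
print(logistic(total_input))  # node output with a logistic activation
print(relu(total_input))      # node output with a ReLU activation
```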
Activation functions such as the tanh and logistic essentially "break" the linearity of a network and enable it to solve more complex problems. This breaking is an essential component that enables a network to map its inputs to its outputs and successfully train.
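Why the breaking matters: stacked purely linear layers collapse into a single linear map, so without a nonlinearity the extra layers add no expressive power. A small demonstration with arbitrary illustrative matrices:

```python
import numpy as np

W1 = np.array([[1.0, 2.0], [3.0, 4.0]])
W2 = np.array([[0.5, -1.0], [2.0, 0.0]])
x = np.array([1.0, -2.0])

# Two linear layers are exactly equivalent to one linear layer.
two_linear_layers = W2 @ (W1 @ x)
one_linear_layer = (W2 @ W1) @ x
print(np.allclose(two_linear_layers, one_linear_layer))  # True

# Inserting tanh between the layers breaks this equivalence.
print(np.allclose(W2 @ np.tanh(W1 @ x), one_linear_layer))  # False
```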
You can view this as fine tuning that makes the task of "learning" much smoother and easier.
Depending on where the network is in the training process, this stage could be the final stage. It will, however, always be the final stage for any network that successfully trains.
The larger the network, the greater the impact of a change.
If a network has multiple output nodes, this partial derivative (for w5 in this case) would be multiplied by each Δz.
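This highlight is fragmentary without the book's worked example. Reading Δz as the error term each output node z passes back, a hedged reconstruction of the idea (notation assumed, not taken from the book) is:

```latex
% With several output nodes z, the gradient for an upstream weight such
% as w_5 accumulates the error term \Delta_z contributed by each output
% node it reaches. Notation here is an assumption.
\frac{\partial E_{\text{total}}}{\partial w_5}
  = \sum_{z} \Delta_z \cdot \frac{\partial \mathrm{net}_z}{\partial w_5}
```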
one-hot encoding.
One-hot encoding is a method that transforms categorical features (such as our man and chicken) into a new format of 1s and 0s. This format is popular because it has proven to work very well with classification and regression algorithms.
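A minimal sketch in plain Python; "man" and "chicken" echo the book's example categories:

```python
# Each category becomes its own column: 1 where the value matches,
# 0 everywhere else.
categories = ["man", "chicken"]

def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

print(one_hot("man", categories))      # [1, 0]
print(one_hot("chicken", categories))  # [0, 1]
```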
Definition: An activation function defines the output of a node given a set of inputs. There are many types of activation functions. The two most common types of activation functions used in neural networks are the Logistic (also called Sigmoid) and Hyperbolic Tangent.
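In standard form, the two functions named above are:

```latex
% Logistic (sigmoid):
\sigma(x) = \frac{1}{1 + e^{-x}}
% Hyperbolic tangent:
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
```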
Definition: Neural Networks (NN) are one type of learning algorithm used within machine learning. They are modeled on the human brain and nervous system, and the goal of NNs is to process information in a similar way to the human brain.
Definition: Inside of an artificial neural network, nodes represent the neurons within a human brain. A neural network consists of multiple layers (input, hidden, and output), and each of these layers consists of either one or multiple nodes. These nodes are all connected and work together to solve a problem.
Definition: A neural network consists of multiple layers (input, hidden, and output), and each of these layers consists of either one or multiple nodes. These nodes are all connected via edges, which mimic the synapse connections found within the human brain.
Definition: Neural networks are supervised learning algorithms, which means that the network is provided with a training set. This training set provides targets that the network aims to achieve. Technically speaking, the target is the desired output for the given input.
A neural network is successfully trained once it has minimized (to an acceptable level) the difference between its real or actual output and its target output. This difference that the network strives to minimize is technically called an error or total error.
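The highlight doesn't give the formula; one common definition of total error, squared error summed over the output nodes, is shown below. Whether this exact form matches the book is an assumption:

```latex
% Sum of per-output squared differences between target and actual output.
E_{\text{total}} = \sum_{o} \frac{1}{2}\left(\text{target}_o - \text{output}_o\right)^2
```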
Definition: Total input refers to the sum of all inputs into a hidden or output node. It is calculated by multiplying each input by its respective weight and summing the products. This calculation is usually performed using the summation operator.
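In symbols, with the bias term included on the assumption the layer has a bias node, per the earlier highlight:

```latex
% Total input to node j: each input x_i times its weight w_{ij}, summed.
\mathrm{net}_j = \sum_{i} w_{ij}\, x_i + b_j
```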
Dropout: Dropout is a form of regularization that helps a network generalize its fit and increase accuracy.
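A minimal sketch of (inverted) dropout during training; the mechanics below are a common formulation, not necessarily the book's:

```python
import numpy as np

# Zero each hidden activation with probability p, then scale the
# survivors by 1/(1-p) so the expected total input downstream is
# unchanged. Values are illustrative.
def dropout(activations, p=0.5, rng=np.random.default_rng(0)):
    mask = rng.random(activations.shape) >= p  # keep with probability 1-p
    return activations * mask / (1.0 - p)

hidden = np.array([0.8, 0.1, 0.6, 0.4])
print(dropout(hidden))  # some nodes silenced, the rest scaled up
```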
An epoch refers to one forward pass and one backward pass of all training examples in a neural network.
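A runnable sketch of epochs in action; the toy task (a single logistic node learning OR) and all values are illustrative, not the book's:

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 1, 1, 1], dtype=float)
weights, bias, lr = np.zeros(2), 0.0, 0.5

# Each epoch = one forward pass and one backward pass over every
# training example.
for epoch in range(1000):
    for x, t in zip(X, targets):
        out = logistic(np.dot(weights, x) + bias)  # forward pass
        delta = (out - t) * out * (1 - out)        # backward pass
        weights -= lr * delta * x
        bias -= lr * delta

print(np.round(logistic(X @ weights + bias)))  # approx [0, 1, 1, 1]
```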