Kindle Notes & Highlights
Read between January 7 and February 11, 2018
The binary cross-entropy is just a technical term for the cost function in logistic regression, and the categorical cross-entropy is its generalization for multiclass predictions via softmax.
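As a quick reference (the notation here is assumed, not from the excerpt: y is the true label, ŷ the predicted probability, and the multiclass form uses a one-hot target over K classes):

```latex
% binary cross-entropy for a single example
\mathcal{L}_{\mathrm{BCE}} = -\left[ y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right]

% categorical cross-entropy, the multiclass generalization via softmax
\mathcal{L}_{\mathrm{CCE}} = -\sum_{i=1}^{K} y_i \log \hat{y}_i
```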
Although we referred to this activation function as a sigmoid function (as it is commonly called in the literature), the more precise name is the logistic function; "sigmoid" describes a whole family of S-shaped functions, of which the logistic function is one member.
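Concretely, the logistic function is

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}
```

which maps any real net input z to the open interval (0, 1).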
The softmax function is in fact a soft form of the argmax function; instead of giving a single class index, it provides the probability of each class.
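A minimal NumPy sketch of this idea (the function name and example values are illustrative, not from the excerpt):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
print(softmax(z))    # [0.659 0.242 0.099] -- a probability for each class
print(np.argmax(z))  # 0 -- the "hard" version returns only a class index
```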
Another sigmoid function that is often used in the hidden layers of artificial neural networks is the hyperbolic tangent (commonly known as tanh).
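tanh is closely related to the logistic function; a small sketch verifying the standard identity tanh(z) = 2σ(2z) − 1 (the identity is a well-known fact, not from the excerpt):

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-3.0, 3.0, 7)
# tanh is a rescaled logistic: output range (-1, 1) instead of (0, 1)
print(np.allclose(np.tanh(z), 2.0 * logistic(2.0 * z) - 1.0))  # True
```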
TensorFlow has strong growth drivers. Its development is funded and supported by Google, so a large team of software engineers works on improvements continuously. TensorFlow also has strong support from open source developers, who avidly contribute and provide user feedback. This has made the TensorFlow library more useful to both academic researchers and developers in industry. A further consequence of these factors is that TensorFlow has extensive documentation and tutorials to help new users. Last but not least among these key features, TensorFlow supports mobile deployment, which…
Tensors are a generalizable mathematical notation for multidimensional arrays holding data values, where the dimensionality of a tensor is typically referred to as its rank. So far, we've worked mostly with tensors of rank zero to two. For instance, a scalar (a single number such as an integer or float) is a tensor of rank 0, a vector is a tensor of rank 1, and a matrix is a tensor of rank 2. But it doesn't stop here: the tensor notation generalizes to higher dimensions, as we'll see in the next chapter, when we work with an input of rank 3 and weight tensors of rank 4 to support…
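A quick NumPy illustration of rank (the array shapes below are arbitrary examples):

```python
import numpy as np

scalar = np.array(3.14)              # rank 0: a single number
vector = np.array([1.0, 2.0, 3.0])   # rank 1
matrix = np.eye(3)                   # rank 2
batch  = np.zeros((64, 28, 28))      # rank 3, e.g. a batch of 28x28 images

for t in (scalar, vector, matrix, batch):
    print(t.ndim, t.shape)
```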
With variable scopes, we can organize variables into separate subparts. When we create a variable scope, the names of operations and tensors created within that scope are prefixed with the scope name, and scopes can further be nested.
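A minimal sketch, assuming the TensorFlow 1.x API this book uses (the scope and variable names are made up for illustration):

```python
import tensorflow as tf  # TensorFlow 1.x assumed

with tf.variable_scope('net'):
    with tf.variable_scope('layer1'):            # scopes can be nested
        w1 = tf.get_variable('weights', shape=(10, 4))

print(w1.name)  # net/layer1/weights:0 -- the scope names become a prefix
```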
TensorFlow uses Protocol Buffers (https://developers.google.com/protocol-buffers/), a language-agnostic way of serializing structured data.
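For example, a TensorFlow graph can be serialized to disk as a protocol buffer; a sketch assuming the TF 1.x API (the file and directory names are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x assumed

g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, shape=(None, 2), name='x')

# writes the GraphDef protocol buffer; as_text=True gives a readable .pbtxt
tf.train.write_graph(g.as_graph_def(), './', 'graph.pbtxt', as_text=True)
```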
A great feature of TensorFlow is TensorBoard, a module for visualizing both the computation graph and the learning progress of a model. Visualizing the graph allows us to see the connections between nodes, explore their dependencies, and debug the model if needed.
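A minimal sketch of exporting a graph for TensorBoard, again assuming TF 1.x (the log directory and tensor names are arbitrary choices):

```python
import tensorflow as tf  # TensorFlow 1.x assumed

g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, shape=(None, 2), name='inputs')
    w = tf.Variable(tf.random_normal((2, 1)), name='weights')
    y = tf.matmul(x, w, name='outputs')

with tf.Session(graph=g) as sess:
    sess.run(tf.global_variables_initializer())
    # writes an event file that TensorBoard can render as a graph
    tf.summary.FileWriter(logdir='./logs', graph=g).close()

# then run: tensorboard --logdir=./logs
```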
Convolutional neural networks, or CNNs, are a family of models that were inspired by how the visual cortex of the human brain works when recognizing objects.
Due to their outstanding performance on image classification tasks, CNNs have gained a lot of attention, and this has led to tremendous improvements in machine learning and computer vision applications.
Successfully extracting salient (relevant) features is key to the performance of any machine learning algorithm. Traditional machine learning models rely on input features that may come from a domain expert or may be based on computational feature extraction techniques. Neural networks, in contrast, can automatically learn the features from raw data that are most useful for a particular task. For this reason, it's common to consider a neural network as a feature extraction engine: the early layers (those right after the input layer) extract low-level features.
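A sketch of this layered feature extraction, assuming the TF 1.x layers API (the filter counts and kernel sizes are arbitrary):

```python
import tensorflow as tf  # TensorFlow 1.x assumed

images = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))

# early layer: learns low-level features such as edges and blobs
h1 = tf.layers.conv2d(images, filters=32, kernel_size=5,
                      activation=tf.nn.relu)
# deeper layer: combines them into higher-level features
h2 = tf.layers.conv2d(h1, filters=64, kernel_size=5,
                      activation=tf.nn.relu)
```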
In recent years, another popular regularization technique called dropout has emerged that works amazingly well for regularizing (deep) neural networks.
Intuitively, dropout can be considered as the consensus (averaging) of an ensemble of models.
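A minimal sketch of applying dropout, assuming the TF 1.x layers API (the layer sizes and dropout rate are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x assumed

x = tf.placeholder(tf.float32, shape=(None, 100))
is_train = tf.placeholder(tf.bool)

h = tf.layers.dense(x, units=128, activation=tf.nn.relu)
# each hidden unit is dropped with probability 0.5 during training;
# surviving activations are scaled up so no rescaling is needed at test time
h_drop = tf.layers.dropout(h, rate=0.5, training=is_train)
```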
Sequence modeling has many fascinating applications, such as language translation (for example, from English to German), image captioning, and text generation.
The flow of information across adjacent time steps in the hidden layer allows the network to have a memory of past events. This flow of information is usually displayed as a loop, also known as a recurrent edge in graph notation, which is how this general architecture got its name.
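A minimal NumPy sketch of this recurrent update (the function name, weight matrices, and sizes are hypothetical, for illustration only):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # the recurrent edge: h_prev feeds back into the new hidden state
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.RandomState(0)
W_xh, W_hh, b_h = rng.randn(3, 5), rng.randn(5, 5), np.zeros(5)

h = np.zeros(5)                 # initial hidden state
for x_t in rng.randn(4, 3):     # a sequence of 4 time steps
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # carries memory of past inputs
```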

