Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
1%
Machine Learning is the science (and art) of programming computers so they can learn from data.
2%
Another area where Machine Learning shines is for problems that either are too complex for traditional approaches or have no known algorithm. For example, consider speech recognition:
2%
Finally, Machine Learning can help humans learn (Figure 1-4): ML algorithms can be inspected to see what they have learned
2%
To summarize, Machine Learning is great for:
- Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better.
- Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution.
- Fluctuating environments: a Machine Learning system can adapt to new data.
- Getting insights about complex problems and large amounts of data.
2%
Machine Learning systems can be classified according to the amount and type of supervision they get during training. There are four major categories: supervised learning, unsupervised learning, semisupervised learning, and Reinforcement Learning.
2%
In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels
2%
A typical supervised learning task is classification.
2%
Another typical task is to predict a target numeric value, such as the price of a car, given a set of features (mileage, age, brand, etc.) called predictors. This sort of task is called regression
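A minimal sketch of such a regression task, assuming scikit-learn is installed; the car data (mileage, age, price) is made up for illustration:

    # Supervised regression: labeled examples in, a price predictor out.
    from sklearn.linear_model import LinearRegression

    X = [[60_000, 5], [30_000, 2], [120_000, 9], [15_000, 1]]  # predictors: mileage, age
    y = [8_000, 15_000, 3_500, 19_000]                         # labels: price in dollars

    model = LinearRegression()
    model.fit(X, y)                      # learn from the labeled examples
    print(model.predict([[45_000, 3]]))  # predict the price of an unseen car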
2%
In Machine Learning an attribute is a data type (e.g., “Mileage”), while a feature has several meanings depending on the context, but generally means an attribute plus its value
2%
In unsupervised learning, as you might guess, the training data is unlabeled (Figure 1-7). The system tries to learn without a teacher.
2%
If you use a hierarchical clustering algorithm, it may also subdivide each group into smaller groups. This may help you target your posts for each group.
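A minimal sketch of hierarchical clustering with scikit-learn's AgglomerativeClustering; the visitor feature matrix is random stand-in data:

    # Unsupervised clustering: no labels are ever provided.
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    X = np.random.rand(100, 5)                  # 100 unlabeled visitors, 5 features
    clusterer = AgglomerativeClustering(n_clusters=3)
    groups = clusterer.fit_predict(X)           # cluster index assigned to each visitor
    print(groups[:10])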
2%
Visualization algorithms are also good examples of unsupervised learning algorithms: you feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted
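A sketch of one such visualization algorithm, t-SNE, which projects high-dimensional data down to 2D for plotting; the input data here is random and purely illustrative:

    # t-SNE: complex unlabeled data in, a plottable 2D representation out.
    import numpy as np
    from sklearn.manifold import TSNE

    X = np.random.rand(200, 50)             # 200 instances, 50 features each
    X_2d = TSNE(n_components=2).fit_transform(X)
    print(X_2d.shape)                       # (200, 2), ready for a scatter plot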
2%
A related task is dimensionality reduction, in which the goal is to simplify the data without losing too much information.
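A minimal sketch of dimensionality reduction with PCA, assuming scikit-learn and random stand-in data; asking for a float keeps enough components to preserve that fraction of the variance:

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(500, 30)
    pca = PCA(n_components=0.95)      # keep 95% of the variance
    X_reduced = pca.fit_transform(X)  # far fewer columns, most information kept
    print(X.shape, "->", X_reduced.shape)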
2%
Yet another important unsupervised task is anomaly detection — for example, detecting unusual credit card transactions to prevent fraud, catching manufacturing defects, or automatically removing outliers from a dataset before feeding it to another learning algorithm. The system is trained with normal instances, and when it sees a new instance it can tell whether it looks like a normal one or whether it is likely an anomaly
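A sketch of this idea using scikit-learn's IsolationForest (one of several possible detectors); the transaction features are synthetic:

    # Train on (mostly) normal instances, then flag instances that look unusual.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    normal = np.random.normal(0, 1, size=(1000, 4))   # normal transactions
    detector = IsolationForest(contamination=0.01, random_state=42)
    detector.fit(normal)

    new = np.array([[0.1, -0.2, 0.0, 0.3],            # looks normal
                    [8.0, 9.5, -7.2, 11.0]])          # looks anomalous
    print(detector.predict(new))                      # +1 = normal, -1 = anomaly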
2%
Finally, another common unsupervised task is association rule learning, in which the goal is to dig into large amounts of data and discover interesting relations between attributes.
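A hand-rolled sketch of the idea behind association rule learning: count how often pairs of items appear together in purchase baskets; frequently co-occurring pairs suggest rules such as "people who buy barbecue sauce also tend to buy steak". The baskets are invented:

    from itertools import combinations
    from collections import Counter

    baskets = [{"bbq sauce", "chips", "steak"},
               {"chips", "soda"},
               {"bbq sauce", "steak", "soda"},
               {"bbq sauce", "steak"}]

    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1

    print(pair_counts.most_common(3))   # most frequently associated item pairs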
2%
Most semisupervised learning algorithms are combinations of unsupervised and supervised algorithms.
2%
Reinforcement Learning is a very different beast. The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards, as in Figure 1-12).
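A toy sketch of this loop, not from the book: an agent in a made-up five-state chain world learns from rewards via tabular Q-learning, updating a table of action values as it acts:

    import random

    n_states, n_actions = 5, 2          # actions: 0 = step left, 1 = step right
    Q = [[0.0] * n_actions for _ in range(n_states)]
    alpha, gamma = 0.1, 0.9             # learning rate, discount factor

    for episode in range(500):
        state = 0
        while state != n_states - 1:    # rightmost state is the goal
            action = random.randrange(n_actions)   # explore randomly
            next_state = max(0, min(n_states - 1, state + (1 if action else -1)))
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Move Q toward reward plus discounted best future value.
            Q[state][action] += alpha * (reward + gamma * max(Q[next_state])
                                         - Q[state][action])
            state = next_state

    print(Q)  # "always go right" should end up with the higher values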
2%
Another criterion used to classify Machine Learning systems is whether or not the system can learn incrementally from a stream of incoming data.
2%
In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data. This will generally take a lot of time and computing resources, so it is typically done offline.
3%
In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.
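A minimal sketch of online learning in scikit-learn: SGDRegressor supports incremental training via partial_fit(); the mini-batches here are synthetic:

    import numpy as np
    from sklearn.linear_model import SGDRegressor

    model = SGDRegressor()
    rng = np.random.RandomState(42)
    for _ in range(100):                        # a stream of mini-batches
        X_batch = rng.rand(32, 3)               # 32 instances per mini-batch
        y_batch = X_batch @ np.array([1.0, 2.0, 3.0])
        model.partial_fit(X_batch, y_batch)     # one fast, cheap learning step

    print(model.coef_)                          # should approach [1, 2, 3]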
3%
Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine’s main memory (this is called out-of-core learning).
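A sketch of the out-of-core pattern: stream a file too big for memory in chunks with pandas and keep calling partial_fit(). The file "huge.csv" and its "label" column are hypothetical:

    import pandas as pd
    from sklearn.linear_model import SGDClassifier

    model = SGDClassifier()
    classes = [0, 1]  # partial_fit needs the full class list up front
    for chunk in pd.read_csv("huge.csv", chunksize=10_000):
        X = chunk.drop(columns=["label"]).values
        y = chunk["label"].values
        model.partial_fit(X, y, classes=classes)  # train on one chunk at a time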
3%
One important parameter of online learning systems is how fast they should adapt to changing data: this is called the learning rate.
3%
One more way to categorize Machine Learning systems is by how they generalize.
3%
There are two main approaches to generalization: instance-based learning and model-based learning.
3%
This is called instance-based learning: the system learns the examples by heart, then generalizes to new cases using a similarity measure
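A minimal sketch of instance-based learning with k-Nearest Neighbors, which literally stores the training examples and classifies new cases by distance to them; the points are made up:

    from sklearn.neighbors import KNeighborsClassifier

    X_train = [[1, 1], [1, 2], [8, 8], [9, 8]]
    y_train = [0, 0, 1, 1]

    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)              # "learns the examples by heart"
    print(knn.predict([[2, 1], [8, 9]]))   # generalizes via similarity to neighbors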
3%
Another way to generalize from a set of examples is to build a model of these examples, then use that model to make predictions. This is called model-based learning
3%
For linear regression problems, people typically use a cost function that measures the distance between the linear model’s predictions and the training examples; the objective is to minimize this distance. This is where the Linear Regression algorithm comes in: you feed it your training examples and it finds the parameters that make the linear model fit best to your data. This is called training the model.
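A minimal sketch of this, assuming scikit-learn and toy data: fit a linear model, then compute the mean squared error cost that the fit minimizes:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([2.1, 3.9, 6.2, 7.8])          # roughly y = 2x

    model = LinearRegression().fit(X, y)         # "training the model"
    mse = np.mean((model.predict(X) - y) ** 2)   # the cost being minimized
    print(model.coef_, model.intercept_, mse)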
4%
Overgeneralizing is something that we humans do all too often, and unfortunately machines can fall into the same trap if we are not careful. In Machine Learning this is called overfitting: it means that the model performs well on the training data, but it does not generalize well.
4%
The amount of regularization to apply during learning can be controlled by a hyperparameter. A hyperparameter is a parameter of a learning algorithm (not of the model).
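A sketch using Ridge regression, where the hyperparameter alpha controls the amount of regularization; it is set before training rather than learned from the data:

    from sklearn.linear_model import Ridge

    X = [[1.0], [2.0], [3.0], [4.0]]
    y = [2.0, 4.1, 5.9, 8.2]

    weak = Ridge(alpha=0.01).fit(X, y)     # little regularization
    strong = Ridge(alpha=100.0).fit(X, y)  # heavy regularization shrinks the slope
    print(weak.coef_, strong.coef_)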
4%
underfitting is the opposite of overfitting: it occurs when your model is too simple to learn the underlying structure of the data.
4%
To avoid “wasting” too much training data in validation sets, a common technique is to use cross-validation: the training set is split into complementary subsets, and each model is trained against a different combination of these subsets and validated against the remaining parts. Once the model type and hyperparameters have been selected, a final model is trained using these hyperparameters on the full training set, and the generalization error is measured on the test set.
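A minimal sketch of K-fold cross-validation with scikit-learn; the data is synthetic, and each of the 5 folds serves once as the validation set:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import Ridge

    rng = np.random.RandomState(0)
    X = rng.rand(100, 3)
    y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.1, 100)

    scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5)  # 5 folds
    print(scores.mean(), scores.std())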
9%
Estimators. Any object that can estimate some parameters based on a dataset is called an estimator
9%
The estimation itself is performed by the fit() method, and it takes only a dataset as a parameter (or two for supervised learning algorithms; the second dataset contains the labels). Any other parameter needed to guide the estimation process is considered a hyperparameter (such as an imputer’s strategy), and it must be set as an instance variable
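A minimal sketch of this estimator API using SimpleImputer (scikit-learn's imputer): the strategy is a hyperparameter set in the constructor, and fit() performs the estimation:

    import numpy as np
    from sklearn.impute import SimpleImputer

    X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])
    imputer = SimpleImputer(strategy="median")  # hyperparameter, set up front
    imputer.fit(X)                              # estimation: learn the medians
    print(imputer.statistics_)                  # the estimated parameters, per column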
9%
Transformers. Some estimators (such as an imputer) can also transform a dataset; these are called transformers.
9%
the API is quite simple: the transformation is performed by the transform() method with the dataset to transform as a parameter; it returns the transformed dataset.
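A sketch continuing the imputer example; fit_transform() is the standard shortcut for fit() followed by transform():

    import numpy as np
    from sklearn.impute import SimpleImputer

    X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])
    imputer = SimpleImputer(strategy="median")
    X_filled = imputer.fit_transform(X)   # fit, then transform, in one call
    print(X_filled)                       # NaNs replaced by column medians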
9%
Predictors. Finally, some estimators are capable of making predictions given a dataset; they are called predictors.
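A minimal sketch of the predictor API: after fit(), predict() maps new instances to outputs, and score() measures prediction quality; the data is made up:

    from sklearn.linear_model import LinearRegression

    X, y = [[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0]
    predictor = LinearRegression().fit(X, y)
    print(predictor.predict([[4.0]]))   # -> roughly [8.0]
    print(predictor.score(X, y))        # R^2 on the given dataset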
11%
so a typical prediction error of $68,628 is not very satisfying.
Note (Kenneth R. Lewis, Jr.): The lower the RMSE (error), the better.
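A sketch of computing RMSE with scikit-learn; the true and predicted prices here are invented:

    import numpy as np
    from sklearn.metrics import mean_squared_error

    y_true = np.array([200_000, 320_000, 150_000])
    y_pred = np.array([210_000, 290_000, 180_000])
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # root of the MSE
    print(rmse)                                         # lower is better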
13%
the most common supervised learning tasks are regression (predicting values) and classification (predicting classes).
14%
To compute the confusion matrix, you first need to have a set of predictions, so they can be compared to the actual targets.
14%
cross_val_predict() performs K-fold cross-validation, but instead of returning the evaluation scores, it returns the predictions made on each test fold.
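A minimal sketch of both steps; scikit-learn's digits dataset stands in for the book's MNIST example, with a binary "is it a 5?" task:

    from sklearn.datasets import load_digits
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import confusion_matrix

    X, y = load_digits(return_X_y=True)
    y_5 = (y == 5)                       # binary target
    y_pred = cross_val_predict(SGDClassifier(random_state=42), X, y_5, cv=3)
    print(confusion_matrix(y_5, y_pred)) # rows: actual class, columns: predicted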
14%
Each row in a confusion matrix represents an actual class, while each column represents a predicted class.
14%
A perfect classifier would have only true positives and true negatives, so its confusion matrix would have nonzero values only on its main diagonal (top left to bottom right):
14%
The confusion matrix gives you a lot of information, but sometimes you may prefer a more concise metric. An interesting one to look at is the accuracy of the positive predictions; this is called the precision of the classifier (Equation 3-1).
14%
TP is the number of true positives, and FP is the number of false positives.
14%
precision is typically used along with another metric named recall, also called sensitivity or true positive rate (TPR): this is the ratio of positive instances that are correctly detected by the classifier (Equation 3-2).
14%
FN is of course the number of false negatives.
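A sketch of both metrics on tiny made-up labels, where precision = TP / (TP + FP) and recall = TP / (TP + FN):

    from sklearn.metrics import precision_score, recall_score

    y_true = [1, 1, 1, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 1]
    print(precision_score(y_true, y_pred))  # 3 TP / (3 TP + 1 FP) = 0.75
    print(recall_score(y_true, y_pred))     # 3 TP / (3 TP + 1 FN) = 0.75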
14%
To understand this tradeoff, let’s look at how the SGDClassifier makes its classification decisions. For each instance, it computes a score based on a decision function, and if that score is greater than a threshold, it assigns the instance to the positive class, or else it assigns it to the negative class.
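A sketch of inspecting those scores directly; decision_function() exposes them so you can pick your own threshold instead of the default of 0 (digits again stands in for MNIST):

    from sklearn.datasets import load_digits
    from sklearn.linear_model import SGDClassifier

    X, y = load_digits(return_X_y=True)
    y_5 = (y == 5)
    clf = SGDClassifier(random_state=42).fit(X, y_5)

    scores = clf.decision_function(X[:5])  # one raw score per instance
    threshold = 0                          # raising this trades recall for precision
    print(scores > threshold)              # True -> assigned to the positive class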