Machine Learning
Read between December 18 - December 26, 2017
37%
Generally more data means more information, and hence more data tends to decrease uncertainty.
38%
Bayesian Methods
39%
The Bayesian approach allows us to incorporate our prior beliefs in training.
39%
Bayesian estimation is especially interesting because we are no longer constrained by some parametric model class; the model complexity also changes dynamically to match the complexity of the task in the data.
40%
Bayesian estimation uses Bayes’ rule in probability theory (which we saw before), named after Thomas Bayes (1702–1761).
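As a concrete illustration (my own sketch in Python, not from the book), here is Bayes’ rule at work on the simplest case: estimating a coin’s probability of heads with a conjugate Beta prior, so that the prior belief and the observed sample combine in closed form. The prior parameters and the counts are made up for the example.

```python
# Minimal sketch (not from the book): Bayesian estimation of a coin's
# head probability using Bayes' rule with a conjugate Beta prior.
# The prior Beta(a, b) encodes our belief before seeing data; the
# posterior is again a Beta, updated by the observed counts.

def posterior_beta(heads, tails, prior_a=2.0, prior_b=2.0):
    """Return the (a, b) parameters of the Beta posterior."""
    return prior_a + heads, prior_b + tails

# Observe 7 heads and 3 tails.
a, b = posterior_beta(heads=7, tails=3)

# The posterior mean shrinks the raw frequency 0.7 toward the prior mean 0.5.
posterior_mean = a / (a + b)        # (2+7) / (2+7+2+3) = 9/14 ≈ 0.64
print(posterior_mean)
```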
40%
Artificial Neural Networks
40%
Our brains make us intelligent; we see or hear, learn and remember, plan and act thanks to our brains. In trying to build machines to have such abilities then, our immediate source of inspiration is the human brain, just as birds were the source of inspiration in our early attempts to fly.
44%
The basic idea in connectionist models is that intelligence is an emergent property, and high-level tasks, such as recognition or association between patterns, arise automatically as a result of the propagation of activity through the rather elemental operations of interconnected simple processing units. Similarly, learning is done at the connection level through simple operations, for instance, according to the Hebbian rule, without any need for a higher-level programmer.
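A minimal sketch (not from the book) of what learning at the connection level can look like: a Hebbian update in which a weight grows in proportion to the product of the activities of the two units it connects. The toy inputs and the learning rate are arbitrary.

```python
import numpy as np

# Minimal sketch (illustrative, not from the book): Hebbian learning.
# "Units that fire together wire together": the weight between an input
# unit and an output unit grows in proportion to the product of their
# activities, with no higher-level programmer involved.

rng = np.random.default_rng(0)
eta = 0.1                      # learning rate
w = np.zeros(3)                # weights from 3 input units to 1 output unit

for _ in range(100):
    x = rng.integers(0, 2, size=3).astype(float)   # input activities (0 or 1)
    y = float(x[0])            # output unit happens to co-fire with input 0
    w += eta * x * y           # Hebbian update: delta_w = eta * x * y

print(w)   # the weight from input 0 grows largest; correlated inputs strengthen
```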
44%
Neural Networks as a Paradigm for Parallel Processing
45%
Since the 1980s, computer systems with thousands of processors have been commercially available. The software for such parallel architectures, however, has not advanced as quickly as hardware. The reason for this is that almost all our theory of computation up to that point was based on serial, single-processor machines. We are not able to use the parallel machines in their full capacity because we cannot program them efficiently.
45%
Thus, artificial neural networks are a way to make use of the parallel hardware we can build with current technology and—thanks to learning—they need not be programmed.
47%
This multilayered network is an example of a hierarchical cone where features get more complex, abstract, and fewer in number as we go up the network until we get to classes.
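A rough sketch of such a cone in Python (my own illustration; the layer sizes are made up): each successive layer has fewer units than the one below, ending in a handful of class scores. This only shows the shrinking-widths structure, not a trained network.

```python
import numpy as np

# Minimal sketch (illustrative, not from the book): a "hierarchical cone"
# multilayer network where each layer has fewer, more abstract units than
# the one below, ending in a small number of class outputs.

rng = np.random.default_rng(0)
layer_sizes = [784, 256, 64, 10]     # pixels -> features -> features -> classes
weights = [rng.normal(0, 0.01, (m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    """Propagate an input up the cone; ReLU hidden layers, linear output."""
    for W in weights[:-1]:
        x = np.maximum(0.0, x @ W)   # each layer: fewer, more abstract features
    return x @ weights[-1]           # 10 class scores

scores = forward(rng.random(784))
print(scores.shape)                  # (10,)
```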
47%
Deep Learning
47%
Though these approaches have had some success, learning algorithms have recently been achieving higher accuracy with big data and powerful computers. With few assumptions and little manual interference, structures similar to the hierarchical cone are being automatically learned from large amounts of data. These learning approaches are especially interesting in that, because they learn, they are not fixed for any specific task, and they can be used in a variety of applications.
48%
Successive layers correspond to more abstract representations until we get to the final layer where the outputs are learned in terms of these most abstract concepts. We saw an example of this in the convolutional neural network where starting from pixels, we get to edges, and then to corners, and so on, until we get to a digit.
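To make the pixels-to-edges step concrete, here is a small hand-written convolution (my own illustration, not the book’s code): a vertical-edge filter applied to a toy image. In an actual convolutional network such filters are learned, and later layers combine their responses into corners, parts, and finally digit classes.

```python
import numpy as np

# Minimal sketch (illustrative, not from the book): the first step of the
# pixels -> edges hierarchy, using a hand-written vertical-edge filter.

image = np.zeros((6, 6))
image[:, 3:] = 1.0                     # toy image: dark left half, bright right half

kernel = np.array([[-1.0, 0.0, 1.0],   # responds to left-to-right brightness change
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

def conv2d_valid(img, k):
    """Plain 'valid' 2D convolution (really cross-correlation) in numpy."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

edges = conv2d_valid(image, kernel)
print(edges)     # strong responses only where windows straddle the vertical boundary
```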
48%
In deep learning, the idea is to learn feature levels of increasing abstraction with minimum human contribution.
48%
This is because in most applications, we do not know what structure there is in the input, especially as we go up.
49%
Deep learning methods are attractive mainly because they need less manual interference. We do not need to craft the right features or the suitable transformations. Once we have data—and nowadays we have “big” data—and sufficient computation available—and nowadays we have data centers with thousands of processors—we just wait and let the learning algorithm discover all that is necessary by itself.
49%
The idea of multiple layers of increasing abstraction that underlies deep learning is intuitive.
50%
One method for unsupervised learning is clustering, where the aim is to find clusters or groupings of input.
50%
A clustering model allocates customers similar in their attributes to the same group, providing the company with natural groupings of its customers.
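A minimal clustering sketch (assuming scikit-learn is available; the customer attributes and group structure are invented for the example): k-means groups customers by age and yearly spending without any labels being given.

```python
import numpy as np
from sklearn.cluster import KMeans   # assumes scikit-learn is installed

# Minimal sketch (illustrative, not from the book): clustering customers
# by two hypothetical attributes, age and yearly spending, so that
# customers with similar attributes end up in the same group.

rng = np.random.default_rng(0)
young_big_spenders = rng.normal([25, 900], [3, 80], size=(50, 2))
older_small_spenders = rng.normal([60, 200], [5, 50], size=(50, 2))
customers = np.vstack([young_big_spenders, older_small_spenders])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(customers)
print(labels[:5], labels[-5:])   # the two natural groups receive different labels
```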
52%
The aim in clustering in particular, or unsupervised learning in general, is to find structure in the data. In the case of supervised learning (for example, in classification), this structure is imposed by the supervisor who defines the different classes and labels the instances in the training data by these classes.
53%
Instead of learning association rules between pairs or triples of these items, if we can estimate the hidden baby factor based on past purchases, this will trigger an estimation of whatever it is that has not been bought yet.
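The hidden-factor idea can be sketched with a rank-1 decomposition of a toy customer-by-item purchase matrix (my own illustration; the items and purchases are made up). The single latent factor recovered from the data plays the role of the "baby" factor and scores items a customer has not bought yet.

```python
import numpy as np

# Minimal sketch (illustrative, not from the book): estimating a hidden
# "baby" factor from a customer-by-item purchase matrix with a rank-1 SVD,
# then using it to score items a customer has not bought yet.

items = ["diapers", "baby food", "baby wipes", "beer", "chips"]
purchases = np.array([
    [1, 1, 1, 0, 0],   # customer 0: clearly has a baby
    [1, 1, 0, 0, 0],   # customer 1: has a baby, never bought wipes
    [0, 0, 0, 1, 1],   # customer 2: no baby-related purchases
], dtype=float)

U, s, Vt = np.linalg.svd(purchases, full_matrices=False)
scores = s[0] * np.outer(U[:, 0], Vt[0])      # rank-1 reconstruction

# Customer 1 gets a clearly nonzero score for "baby wipes" even though it
# was never bought, while beer and chips stay near zero.
print(dict(zip(items, np.round(scores[1], 2))))
```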
64%
With a data-driven analysis, as is done by machine learning algorithms, we can use any of the methods we discussed in previous chapters, for classification, regression, clustering, and so on, to build a model from the data.
65%
Visualization is one of the best tools for data analysis, and sometimes just visualizing the data in a smart way is enough to understand the characteristics of a process that underlies a complicated data set, without any need for further complex and costly statistical processing.
66%
Data Science
67%
Most of the time data does not obey the parametric assumptions, such as the bell-shaped Gaussian curve, that we use in statistics to make estimation easier. Instead, with the new data, we need to resort to more flexible nonparametric models whose complexity can adjust automatically to the complexity of the task underlying the data. All these requirements make machine learning more challenging than statistics as we used to know and practice it.
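A small illustration (not from the book) of why the parametric assumption can fail: a bimodal sample fitted by a single Gaussian versus a nonparametric kernel density estimate whose complexity grows with the sample. The data and bandwidth are made up.

```python
import numpy as np

# Minimal sketch (illustrative, not from the book): a bimodal sample that
# violates the single-Gaussian assumption. A nonparametric kernel density
# estimate, with one kernel per training point, follows both modes.

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 0.5, 200), rng.normal(3, 0.5, 200)])

def gaussian_pdf(t, mu, sigma):
    return np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Parametric fit: one Gaussian with mean ~0 and a large variance,
# so its peak falls where there is actually no data.
parametric = gaussian_pdf(0.0, x.mean(), x.std())

# Nonparametric fit: kernel density estimate over all training points.
def kde(t, data, h=0.3):
    return gaussian_pdf(t, data, h).mean()

print(parametric, kde(0.0, x))   # the KDE assigns near-zero density at 0; the Gaussian does not
```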
67%
In real-world applications, how efficiently the data is stored and manipulated may be as critical as the prediction accuracy.
67%
One important point is that intelligence is a vague term and its applicability to assess the performance of computer systems may be misleading. For example, evaluating computers on tasks that are difficult for humans, such as playing chess, is not a good idea for assessing their intelligence.
67%
For a computer, it is much more difficult to recognize the face of its opponent than to play chess.
67%
In real life, all sorts of randomness occurs, and for its survival every species is slowly evolving to be a better cheater than the rest.
68%
This makes trained software less predictable than programmed software.
68%
There is an important risk in basing recommendations too much on past use and preferences. If a person only listens to songs similar to the ones they listened to and enjoyed before, or watches movies similar to those they watched and enjoyed before, or reads books similar to the books they read and enjoyed before, then there will be no new experience.
69%
Current deep networks are not deep enough; they can learn enough abstraction in some limited context to recognize handwritten digits or a subset of objects, but they are far from having the capability of our visual cortex to recognize a scene.
70%
Bayesian estimation: A method for parameter estimation where we use not only the sample, but also the prior information about the unknown parameters given by a prior distribution.
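In standard notation (not the book’s), for a sample X and parameters θ with prior p(θ), Bayes’ rule gives the posterior:

```latex
p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)} \propto p(X \mid \theta)\, p(\theta)
```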
71%
Data mining: Machine learning and statistical methods for extracting information from large amounts of data. For example, in basket analysis, by analyzing a large number of transactions, we find association rules.
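A minimal sketch (not from the book) of how one association rule is scored over toy transactions: the rule "diapers → beer" with its support (fraction of all baskets containing both items) and confidence (fraction of diaper baskets that also contain beer). The baskets are invented.

```python
# Minimal sketch (illustrative, not from the book): support and confidence
# of one association rule, "diapers -> beer", over toy basket transactions.

transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction also containing the consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"diapers", "beer"}))          # 0.5: half of all baskets
print(confidence({"diapers"}, {"beer"}))     # 2/3 of diaper buyers also buy beer
```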
71%
Data warehouse: A subset of data selected, extracted, and organized for a specific data analysis task. The original data may be very detailed and may lie in several different operational databases. The warehouse merges and summarizes them. The warehouse is read-only; it is used to get a high-level overview of the process that underlies the data, either through OLAP and visualization tools, or by data mining software.
72%
Deep learning: Methods that are used to train models with several levels of abstraction from the raw input to the output. For example, in visual recognition, the lowest level is an image composed of pixels. As we go up the layers, a deep learner combines them to form strokes and edges of different orientations, which can then be combined to detect longer lines, arcs, corners, and junctions, which in turn can be combined to form rectangles, circles, and so on. The units of each layer may be thought of as a set of primitives at a different level of abstraction.
73%
If-then rules
73%
A model that can be written as a set of if-then rules is easy to understand and hence rule bases allow knowledge extraction.
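For instance, a hypothetical rule base for a loan decision might read as follows (my own sketch; the attribute names and thresholds are invented). Each branch is a directly readable if-then rule, which is what makes knowledge extraction from such models easy.

```python
# Minimal sketch (illustrative, not from the book): a tiny rule base for a
# hypothetical loan decision, written as readable if-then rules.

def approve_loan(income, years_employed, has_default):
    if has_default:
        return False                      # IF past default THEN reject
    if income > 50_000:
        return True                       # IF high income THEN approve
    if income > 30_000 and years_employed >= 2:
        return True                       # IF medium income AND stable job THEN approve
    return False                          # otherwise reject

print(approve_loan(income=40_000, years_employed=3, has_default=False))  # True
```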
74%
Neural network: A model composed of a network of simple processing units called neurons and connections between neurons called synapses. Each synapse has a direction and a weight, and the weight defines the effect of the neuron before the synapse on the neuron after it.
74%
Occam’s razor: A philosophical heuristic that advises us to prefer simple explanations to complicated ones.
74%
Online analytical processing (OLAP): Data analysis software used to extract information from a data warehouse. OLAP is user-driven, in the sense that the user thinks of some hypotheses about the process and, using OLAP tools, checks whether the data supports those hypotheses. Machine learning is more data-driven, in the sense that automatic data analysis may find dependencies not previously thought of by users.