Machine Learning
Read between December 18 - December 26, 2017
Before, data was what the programs processed and spit out—data was passive. With this question, data starts to drive the operation; it is not the programmers anymore but the data itself that defines what to do next.
customer behavior is not completely random. People do not go to supermarkets and buy things at random. When they buy beer, they buy chips; they buy ice cream in summer and spices for Glühwein in winter. There are certain patterns in customer behavior, and that is where data comes into play.
This is called data mining. The analogy is that a large volume of earth and raw material is extracted from the mine, which when processed yields a small amount of very precious material.
Data mining is one type of machine learning. We do not know the rules (of customer behavior), so we cannot write the program, but the machine—that is, the computer—“learns” by extracting such rules from (customer transaction) data.
Learning models are used in pattern recognition,
we use learning algorithms to make sense of the bigger and bigger data.
Learning versus Programming
An algorithm is a sequence of instructions that are carried out to transform the input to the output.
we would like the computer (the machine) to extract automatically the algorithm for this task.
Artificial Intelligence
A system that is in a changing environment should have the ability to learn; otherwise, we would hardly call it intelligent. If the system can learn and adapt to such changes, the system designer need not foresee and provide solutions for all possible situations.
Each of us, actually every animal, is a data scientist. We collect data from our sensors, and then we process the data to get abstract rules to perceive our environment and control our actions in that environment to minimize pain and/or maximize pleasure. We have memory to store those rules in our brains, and then we recall and use them when needed. Learning is lifelong; we forget rules when they no longer apply or revise them when the environment changes.
Whereas a computer generally has one or few processors, the brain is composed of a very large number of processing units, namely, neurons, operating in parallel. Though the details are not completely known, the processing units are believed to be much simpler and slower than a typical processor in a computer.
Just as the initial attempts to build flying machines looked a lot like birds until we discovered the theory of aerodynamics, it is also expected that the first attempts to build structures possessing the brain’s abilities will look like the brain with networks of large numbers of processing units.
Pattern Recognition
we should keep in mind that just because we have a lot of data, it does not mean that there are underlying rules that can be learned. We should make sure that there are dependencies in the underlying process and that the collected data provides enough information for them to be learned with acceptable accuracy.
The main theory underlying machine learning comes from statistics, where going from particular observations to general descriptions is called inference and learning is called estimation
Julian M Drault
What was once called STATISTICS is now re-baptized as Machine Learning
Machine Learning, Statistics, and Data Analytics
we use machine learning when we believe there is a relationship between observations of interest but do not know exactly how.
no matter how many properties we list as input, there are always other factors that affect the output; we cannot possibly record and take all of them as input, and all these other factors that we neglect introduce uncertainty.
expect customers in general to follow certain patterns in their purchases depending on factors such as the composition of their household, their tastes, their income, and so on. Still, there are always additional random factors that introduce variance: vacation, change in weather, some catchy advertisement, and so on.
Model Selection
if we believe that we can write the output as a weighted sum of the attributes, we can use a linear model where attributes have an additive effect—for example, each additional seat increases the value of the car by X dollars and each additional thousand miles driven decreases the value by Y dollars, and so on.
If a weight is estimated to be very close to zero, we can conclude that the corresponding attribute is not important and eliminate it from the model. These weights are the parameters of the model and are fine-tuned using data. The model is always fixed; it is the parameters that are adjustable, and it is this process of adjustment to better match the data that we call learning.
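A minimal sketch of that adjustment process in Python. The car data, the units, and the learning rate below are all hypothetical, and the fit uses plain per-example gradient descent; this is only an illustration of "the model is fixed, the parameters are adjusted":

```python
# Linear model: price = w0 + w1 * seats + w2 * miles
# (price in thousands of dollars, miles in hundreds of thousands).
# The model form is fixed; learning adjusts the weights w to fit the data.

def predict(w, x):
    # w[0] is the intercept; the rest are per-attribute weights
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def fit(data, lr=0.01, epochs=20000):
    # data: list of (attributes, observed price) pairs
    w = [0.0] * (len(data[0][0]) + 1)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y           # prediction error
            w[0] -= lr * err                  # adjust the intercept
            for i, xi in enumerate(x):
                w[i + 1] -= lr * err * xi     # adjust each attribute weight
    return w

# Hypothetical used cars: (seats, miles in 100k) -> price in $1000s
cars = [((4, 0.5), 9.0), ((4, 1.0), 7.0),
        ((7, 0.5), 12.0), ((7, 1.0), 10.0)]
w = fit(cars)
```

On this toy data the weights settle near w = (7, 1, -4): each extra seat adds about $1000 and each extra hundred thousand miles subtracts about $4000. A weight that came out near zero would flag an attribute that could be eliminated, as the highlight above describes.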
Supervised Learning
Each model corresponds to a certain type of dependency assumption between the inputs and the output.
Learning corresponds to adjusting the parameters so that the model makes the most accurate predictions on the data.
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55

You probably noticed that this is the Fibonacci sequence.
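The inferred rule (each term is the sum of the previous two) takes only a few lines of Python to state:

```python
def fibonacci(n):
    """First n terms of the rule f(k) = f(k-1) + f(k-2), with f(0)=0, f(1)=1."""
    seq = [0, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    return seq[:n]

print(fibonacci(11))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```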
In philosophy, Occam’s razor tells us to prefer simpler explanations, eliminating unnecessary complexity.
Human behavior is sometimes as much Dionysian as it is Apollonian.
Classification is another type of supervised learning where the output is a class code, as opposed to the numeric value we have in regression.
Expert Systems
Another way to represent uncertainty is to use probability theory, as we do in this book.
Pattern Recognition
In machine learning, the aim is to fit a model to the data.
diagnostics is the inference of hidden factors from observed variables.
conditional probability
Bayes’ rule,
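A worked instance of Bayes' rule for such a diagnostic inference. The prior and test accuracies below are hypothetical numbers chosen only to make the arithmetic concrete:

```python
# Bayes' rule: P(cause | observation) =
#   P(observation | cause) * P(cause) / P(observation)
# Diagnostic setting: the disease is the hidden factor,
# a test result is the observed variable.

p_disease = 0.01            # prior P(disease); hypothetical
p_pos_given_disease = 0.95  # test sensitivity; hypothetical
p_pos_given_healthy = 0.05  # false-positive rate; hypothetical

# Marginal probability of observing a positive test
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of the hidden cause given the observation
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.161
```

Even a fairly accurate test yields a posterior of only about 16 percent here, because the hidden cause is rare a priori.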
Face Recognition
Outlier Detection
find instances that do not obey the general rule—those
Dimensionality Reduction
when an input is deemed unnecessary, we save the cost of measuring it.
simpler models are more robust on small data sets; that is, they can be trained with fewer data; or when trained with the same amount of data, they have smaller variance (uncertainty).
when data can be explained with fewer features, we have a simpler model that is easier to interpret.
if we can find a good way to display the data, our visual cortex can do the rest, without any need for model fitting calculation.
Decision Trees
if-then rules
Trees are used successfully in various machine learning applications, and together with the linear model, the decision tree should be taken as one of the basic benchmark methods before any more complex learning algorithm is tried.
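The tree/if-then correspondence mentioned in the highlights can be sketched directly; the attributes and thresholds below are invented for illustration:

```python
# Each root-to-leaf path of a decision tree reads as one if-then rule.
# Hypothetical tree: approve a loan from two input attributes.

def decide(income, years_employed):
    if income > 50_000:
        return "approve"      # rule 1: if income > 50k then approve
    if years_employed > 3:
        return "approve"      # rule 2: else if employed > 3 years then approve
    return "reject"           # rule 3: otherwise reject

print(decide(30_000, 5))  # approve
```

Because every prediction is just a short chain of tests, a tree is easy to interpret, which is part of what makes it a good benchmark method.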