Mark Gerstein

In the first method, given data, the ML algorithm figures out the best θ, for some choice of distribution type (Bernoulli, Gaussian, or something else), that maximizes the likelihood of seeing the data D. In other words, you are estimating the best underlying distribution, with parameter θ, such that if you were to sample from that distribution, you would maximize the likelihood of observing the labeled data you already have in hand. Not surprisingly, this method is called maximum likelihood estimation (MLE). It maximizes P(D | θ), the probability of observing D given θ, and is loosely …
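A minimal sketch of this idea for the Bernoulli case (all names and data here are hypothetical, not from the book): the MLE for a Bernoulli parameter has the closed form θ̂ = sample mean, and a coarse grid search over the log-likelihood log P(D | θ) recovers the same answer.

```python
import math

def bernoulli_log_likelihood(theta, data):
    # log P(D | theta) for i.i.d. Bernoulli observations (each x is 0 or 1)
    return sum(math.log(theta) if x == 1 else math.log(1 - theta) for x in data)

def mle_bernoulli(data):
    # Closed form: the MLE of a Bernoulli parameter is the sample mean.
    return sum(data) / len(data)

data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical labeled observations
theta_hat = mle_bernoulli(data)        # 7 successes out of 10 -> 0.7

# Sanity check: scanning candidate thetas and keeping the one with the
# highest log-likelihood agrees with the closed-form estimate.
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=lambda t: bernoulli_log_likelihood(t, data))
```

The grid search is only illustrative; in practice one maximizes the log-likelihood analytically or with a numerical optimizer.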
Why Machines Learn: The Elegant Math Behind Modern AI