Of course, the Bayes optimal classifier is an idealization, in that one assumes access to the underlying probability distributions of the data, or our best estimates of such distributions. The NN algorithm functions at the other extreme. All one has is data, and the algorithm makes barely any assumptions about, and indeed has little knowledge of, the underlying distributions. There’s no assumption, for example, that the data follows a Gaussian (bell-shaped) distribution with some mean and variance.

