Deep learning uses layers of recognizers. Before you can recognize a dog, you have to be able to recognize shapes. Before you can recognize shapes, you have to be able to recognize edges, so that you can distinguish a shape from its background. These successive stages of recognition each produce a compressed mathematical representation that is passed up to the next layer. Getting the compression right is key. If you try to compress too much, you can’t represent the richness of what is going on, and you get errors. If you try to compress too little, the network will memorize the training
  




