In 2017, in an attempt to improve machine translation, a group at Google published a new NN model for the task. They called it the transformer, and the paper was titled “Attention Is All You Need” [104]; see figure 10.1. While transformers were initially created for machine translation, they were quickly adapted to language modeling, and that is where they have had the largest impact. Thus, we present the simplified version that was adopted early on by the group at OpenAI for LMs.

