Transformers are becoming a core part of many neural network architectures, employed in a wide range of applications such as NLP, speech recognition, time series, and computer vision. Transformers have gone through many adaptations and alterations, resulting in newer techniques and methods. Transformers for Machine Learning: A Deep Dive is the first comprehensive book on transformers. Key features: the theoretical explanations of the state-of-the-art transformer architectures will appeal to postgraduate students and researchers (academic and industry), as they provide a single entry point with deep discussions of a quickly moving field. The practical hands-on case studies and code will appeal to undergraduate students, practitioners, and professionals, as they allow for quick experimentation and lower the barrier to entry into the field.
The book provides a holistic perspective on the theoretical underpinnings of Transformers and attention (up to 2021):

- a clear explanation of how queries, keys, and values map to different Transformer architectures (see the JAX sketch after this list)
- an understanding of BERTology, whereby fine-tunable BERT-family encoders can combine (a) different pre-training regimens, (b) comparative single and parallel multi-sequence dataset encoding, and (c) task-specific loss functions
- my favourite part: the hows and whys of (1) the different Transformer architecture block variants (lightweight, connected, adaptive, recurrent, and hierarchical) and (2) combinations of sparse-attention and multi-attention complexity-optimization techniques such as prototype queries, compressed KV memory, low-rank approximations, biased attention with priors, and clustered attention. All compared and contrasted! Some Transformer architectures are unified and even Turing complete!
- modifications for training-task efficiency and MoE switch transformers
- on top of that, some very interesting task-specific transformers: the workings of ViT, Graphormer, the Decision Transformer for RL, and HuBERT for ASR (this one confused me)
- XAI techniques for Transformers (more complex than I thought it would be): traits plus a taxonomy of explainable methods
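Since I'm more of a JAX person anyway, here is my own minimal sketch (not from the book or its repo) of how queries, keys, and values interact in scaled dot-product attention; the function name, toy shapes, and single-head setup are all my assumptions:

```python
import jax.numpy as jnp
from jax import random
from jax.nn import softmax

def scaled_dot_product_attention(q, k, v):
    # q: (n_q, d_k), k: (n_kv, d_k), v: (n_kv, d_v)
    d_k = q.shape[-1]
    scores = q @ k.T / jnp.sqrt(d_k)    # (n_q, n_kv) similarity logits
    weights = softmax(scores, axis=-1)  # each query's distribution over keys
    return weights @ v                  # (n_q, d_v) weighted sum of values

kq, kk, kv = random.split(random.PRNGKey(0), 3)
q = random.normal(kq, (4, 8))    # 4 queries of dim 8
k = random.normal(kk, (6, 8))    # 6 keys of dim 8
v = random.normal(kv, (6, 16))   # 6 values of dim 16
print(scaled_dot_product_attention(q, k, v).shape)  # (4, 16)
```

The different architectures the book walks through mostly vary where q, k, and v come from (self-attention, cross-attention, and so on) while keeping this core computation fixed.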
There are some PyTorch code samples available on GitHub at CRCTransformers/deepdive-book (I'm more inclined towards JAX myself). I'd like to see regularly updated editions, given how rapidly this AI field is moving, plus more contrastive side-by-side math-to-code architecture walkthroughs (along the lines of the low-rank sketch below).
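To illustrate the kind of math-to-code pairing I mean, here is my own rough sketch (again mine, not the book's) of a Linformer-style low-rank approximation, one of the complexity-optimization techniques the book covers: Attn(Q, K, V) ≈ softmax(Q(EK)^T / sqrt(d_k)) (FV), where E and F are (r × n) projections along the sequence axis, cutting the score matrix from O(n²) to O(nr). Shapes and names are assumed for the example:

```python
import jax.numpy as jnp
from jax import random
from jax.nn import softmax

def low_rank_attention(q, k, v, E, F):
    # E, F: (r, n) projections along the sequence axis, so the
    # score matrix is (n, r) instead of the full (n, n).
    d_k = q.shape[-1]
    k_proj = E @ k                            # (r, d_k)
    v_proj = F @ v                            # (r, d_v)
    scores = q @ k_proj.T / jnp.sqrt(d_k)     # (n, r)
    return softmax(scores, axis=-1) @ v_proj  # (n, d_v)

n, r, d = 128, 16, 64
keys = random.split(random.PRNGKey(1), 5)
q, k, v = (random.normal(keys[i], (n, d)) for i in range(3))
E = random.normal(keys[3], (r, n)) / jnp.sqrt(n)  # simple scaled init
F = random.normal(keys[4], (r, n)) / jnp.sqrt(n)
print(low_rank_attention(q, k, v, E, F).shape)  # (128, 64)
```

Seeing each formula next to its few lines of code like this, for every variant, is exactly what I'd want from a future edition.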
All in all, it was great and will take me to the next level!