Transformers are becoming a core part of many neural network architectures, employed in a wide range of applications such as NLP, speech recognition, time series, and computer vision. Transformers have gone through many adaptations and alterations, resulting in newer techniques and methods. Transformers for Machine Learning: A Deep Dive is the first comprehensive book on transformers. Key features: the theoretical explanations of the state-of-the-art transformer architectures will appeal to postgraduate students and researchers (academic and industry), as they provide a single entry point with deep discussions of a quickly moving field. The practical hands-on case studies and code will appeal to undergraduate students, practitioners, and professionals, as they allow for quick experimentation and lower the barrier to entry into the field.
The book provides a holistic perspective on the theoretical underpinnings of Transformers and attention (up to 2021):

- a clear explanation of how queries, keys, and values map to different Transformer architectures (see the JAX sketch after this list)
- an understanding of BERTology, whereby fine-tunable BERT-family encoders can combine (a) different pre-training regimens, (b) comparative single and parallel multi-sequence dataset encoding, and (c) task-specific loss functions
- my favourite part: the hows and whys of (1) the different Transformer architecture block variants (lightweight, connected, adaptive, recurrent, and hierarchical) and (2) combinations of sparse-attention and multi-attention complexity-optimization techniques such as prototype queries, compressed KV memory, low-rank approximations, biased attention with priors, and clustered attention. All compared and contrasted! Some Transformer architectures are unified and even Turing complete!
- modifications for training-task efficiency and MoE switch transformers
- on top of that, some very interesting task-specific transformers: the workings of ViT, Graphormer, the Decision Transformer for RL, and HuBERT for ASR (this one confused me)
- XAI techniques for Transformers (more complex than I thought it would be): traits plus a taxonomy of explainable methods
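Since I'm more of a JAX person anyway, here is my own minimal sketch (not from the book or its repo) of how queries, keys, and values interact in scaled dot-product attention; the function name, toy shapes, and single-head setup are all my assumptions:

```python
import jax.numpy as jnp
from jax import random
from jax.nn import softmax

def scaled_dot_product_attention(q, k, v):
    # q: (n_q, d_k), k: (n_kv, d_k), v: (n_kv, d_v)
    d_k = q.shape[-1]
    scores = q @ k.T / jnp.sqrt(d_k)    # (n_q, n_kv) similarity logits
    weights = softmax(scores, axis=-1)  # each query's distribution over keys
    return weights @ v                  # (n_q, d_v) weighted sum of values

kq, kk, kv = random.split(random.PRNGKey(0), 3)
q = random.normal(kq, (4, 8))    # 4 queries of dim 8
k = random.normal(kk, (6, 8))    # 6 keys of dim 8
v = random.normal(kv, (6, 16))   # 6 values of dim 16
print(scaled_dot_product_attention(q, k, v).shape)  # (4, 16)
```

The different architectures the book walks through mostly vary where q, k, and v come from (self-attention, cross-attention, and so on) while keeping this core computation fixed.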
There are some PyTorch code samples available on GitHub at CRCTransformers/deepdive-book (I'm more inclined towards JAX myself). I'd like to see regularly updated editions, given how rapidly this AI field is moving, plus more contrastive side-by-side math-to-code architecture walkthroughs (along the lines of the low-rank sketch below).
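To illustrate the kind of math-to-code pairing I mean, here is my own rough sketch (again mine, not the book's) of a Linformer-style low-rank approximation, one of the complexity-optimization techniques the book covers: Attn(Q, K, V) ≈ softmax(Q(EK)^T / sqrt(d_k)) (FV), where E and F are (r × n) projections along the sequence axis, cutting the score matrix from O(n²) to O(nr). Shapes and names are assumed for the example:

```python
import jax.numpy as jnp
from jax import random
from jax.nn import softmax

def low_rank_attention(q, k, v, E, F):
    # E, F: (r, n) projections along the sequence axis, so the
    # score matrix is (n, r) instead of the full (n, n).
    d_k = q.shape[-1]
    k_proj = E @ k                            # (r, d_k)
    v_proj = F @ v                            # (r, d_v)
    scores = q @ k_proj.T / jnp.sqrt(d_k)     # (n, r)
    return softmax(scores, axis=-1) @ v_proj  # (n, d_v)

n, r, d = 128, 16, 64
keys = random.split(random.PRNGKey(1), 5)
q, k, v = (random.normal(keys[i], (n, d)) for i in range(3))
E = random.normal(keys[3], (r, n)) / jnp.sqrt(n)  # simple scaled init
F = random.normal(keys[4], (r, n)) / jnp.sqrt(n)
print(low_rank_attention(q, k, v, E, F).shape)  # (128, 64)
```

Seeing each formula next to its few lines of code like this, for every variant, is exactly what I'd want from a future edition.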
All in all, it was great and will take me to the next level!