Transformers for Machine Learning

Transformers are becoming a core part of many neural network architectures, employed in a wide range of applications such as NLP, Speech Recognition, Time Series, and Computer Vision. Transformers have gone through many adaptations and alterations, resulting in newer techniques and methods. Transformers for Machine Learning: A Deep Dive is the first comprehensive book on transformers.

Key features:
- The theoretical explanations of the state-of-the-art transformer architectures will appeal to postgraduate students and researchers (academic and industry), as they provide a single entry point with deep discussions of a quickly moving field.
- The practical, hands-on case studies and code will appeal to undergraduate students, practitioners, and professionals, as they allow for quick experimentation and lower the barrier to entry into the field.

257 pages, Hardcover

Published May 25, 2022

About the author

Uday Kamath

8 books · 3 followers

Ratings & Reviews

Community Reviews

5 stars: 3 (42%)
4 stars: 3 (42%)
3 stars: 0 (0%)
2 stars: 0 (0%)
1 star: 1 (14%)
Joshua Reuben
18 reviews · 6 followers
April 18, 2025
Provides a holistic perspective on the theoretical underpinnings of Transformers and Attention (up to 2021):
- clear explanation of how queries, keys, and values map to different Transformer architectures (see the attention sketch after this list)
- an understanding of BERTology, whereby combinations of (a) different pre-training regimens, (b) comparative single and parallel multi-sequence dataset encoding, and (c) task-specific loss functions can be used by fine-tunable BERT-family encoders
- my favourite part: the hows and whys of
1. different Transformer architecture block variants (lightweight, connected, adaptive, recurrent and hierarchical)
2. combinations of sparse-attention and multi-attention complexity-optimization techniques such as prototype queries, compressed KV memory (sketched further down in this review), low-rank approximations, biased attention with priors, and clustered attention
All compared and contrasted!
Some Transformer architectures are unified and even Turing complete!
- also modifications for training and task efficiency, and MoE Switch Transformers
- on top of that, some very interesting task-specific transformers: the workings of ViT, Graphormer, Decision Transformer for RL, and HuBERT for ASR (this one confused me);
- XAI techniques for Transformers (more complex than I thought it would be) - traits plus a taxonomy of explainability methods.
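
To make the queries/keys/values point concrete, here is a minimal scaled dot-product attention sketch in plain PyTorch (my own illustration, not code from the book or its repo; shapes and names are assumptions):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_head)
    d_head = q.size(-1)
    # Score every query against every key, scaled so the softmax stays well-behaved.
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5       # (batch, heads, q_len, k_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)                     # attention distribution over keys
    return weights @ v                                      # weighted sum of values

# Toy usage: 1 sequence, 2 heads, 5 tokens, 16-dim heads.
q = torch.randn(1, 2, 5, 16)
k = torch.randn(1, 2, 5, 16)
v = torch.randn(1, 2, 5, 16)
print(scaled_dot_product_attention(q, k, v).shape)          # torch.Size([1, 2, 5, 16])
```

The architecture-specific part is where q, k, and v come from: in encoder self-attention all three are projections of the same sequence, while in encoder-decoder cross-attention the queries come from decoder states and the keys/values from encoder outputs.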

There are some PyTorch code samples available on GitHub at CRCTransformers/deepdive-book (I'm more inclined towards JAX).
Would like to see regularly updated editions for this rapidly moving AI field, plus more contrastive side-by-side math-to-code architecture walkthroughs.
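
As one example of the side-by-side math-to-code walkthrough I'd like more of: the compressed-KV-memory idea from the list above (in the spirit of a Linformer-style low-rank projection) replaces the n-by-n score matrix softmax(QK^T / sqrt(d)) with an n-by-k one by projecting keys and values along the sequence dimension. A rough single-head sketch, my own and not from CRCTransformers/deepdive-book; layer names and sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompressedKVAttention(nn.Module):
    """Single-head attention with a Linformer-style compressed KV memory.

    Keys and values are projected from sequence length n down to a fixed k,
    so the score matrix is (n x k) instead of (n x n).
    Illustrative sketch only -- not the book's code.
    """
    def __init__(self, d_model: int, seq_len: int, k: int = 64):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Learned projections that compress the sequence (length) dimension: n -> k.
        self.e_proj = nn.Linear(seq_len, k, bias=False)
        self.f_proj = nn.Linear(seq_len, k, bias=False)

    def forward(self, x):                                   # x: (batch, n, d_model)
        q = self.q_proj(x)                                  # (batch, n, d)
        k = self.e_proj(self.k_proj(x).transpose(1, 2)).transpose(1, 2)  # (batch, k, d) compressed keys
        v = self.f_proj(self.v_proj(x).transpose(1, 2)).transpose(1, 2)  # (batch, k, d) compressed values
        scores = q @ k.transpose(1, 2) / q.size(-1) ** 0.5  # (batch, n, k), not (batch, n, n)
        return F.softmax(scores, dim=-1) @ v                # (batch, n, d)

x = torch.randn(2, 512, 128)                                # 2 sequences, 512 tokens, d_model = 128
attn = CompressedKVAttention(d_model=128, seq_len=512, k=64)
print(attn(x).shape)                                        # torch.Size([2, 512, 128])
```

The quadratic O(n^2 * d) attention cost drops to O(n * k * d), at the price of a fixed-length bottleneck k over the compressed keys and values.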

In all, it was great and will take me to the next level!
