Transformers for Machine Learning: A Deep Dive

Transformers are becoming a core part of many neural network architectures, employed in a wide range of applications such as NLP, speech recognition, time series, and computer vision. Transformers have gone through many adaptations and alterations, resulting in newer techniques and methods. Transformers for Machine Learning: A Deep Dive is the first comprehensive book on transformers.

Key Features

- A comprehensive reference book with detailed explanations of every algorithm and technique related to transformers.
- 60+ transformer architectures covered in a comprehensive manner.
- A book for understanding how to apply transformer techniques in speech, text, time series, and computer vision.
- Practical tips and tricks for each architecture and how to use it in the real world.
- Hands-on case studies and code snippets for theory and practical real-world analysis using the tools and libraries, all ready to run in Google Colab.
- The theoretical explanations of state-of-the-art transformer architectures will appeal to postgraduate students and researchers (academic and industry), as they provide a single entry point with deep discussions of a quickly moving field.
- The practical hands-on case studies and code will appeal to undergraduate students, practitioners, and professionals, as they allow for quick experimentation and lower the barrier to entry into the field.

283 pages, Kindle Edition

Published May 24, 2022

5 people are currently reading
32 people want to read

About the author

Uday Kamath

8 books · 3 followers

Community Reviews

5 stars: 3 (42%)
4 stars: 3 (42%)
3 stars: 0 (0%)
2 stars: 0 (0%)
1 star: 1 (14%)
Displaying 1 of 1 review
Joshua Reuben
18 reviews · 6 followers
April 18, 2025
Imbues a holistic perspective on the theoretical underpinnings of Transformers and Attention (up to 2021).
- clear explanation of how queries, keys and values map to different Transformer architectures (a minimal Q/K/V sketch follows after this list)
- an understanding of BERTology, whereby combinations of
a. different pre-training regimens,
b. comparative single- and parallel multi-sequence dataset encoding,
and c. task-specific loss functions
can be used by fine-tunable BERT-family encoders
- my favourite part: the hows and whys of
1. different Transformer architecture block variants (lightweight, connected, adaptive, recurrent and hierarchical)
2. sparse attention and multi-attention complexity-optimization technique combinations such as prototype queries, compressed KV memory, low-rank approximations, biased attention with priors, and clustered attention (see the compressed-KV sketch further below)
All compared and contrasted!
Some Transformer architectures are unified and even Turing-complete!
- also modifications for training task efficiency and MoE switch transformers
- on top of that, some very interesting task-specific transformers: the workings of ViT, Graphormer, Decision Transformer for RL, and HuBERT for ASR (this one confused me);
- XAI techniques for Transformers (more complex than I thought it would be) - traits + a taxonomy of explainable methods.
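
To make the queries/keys/values mapping mentioned above concrete, here is a minimal scaled dot-product attention sketch in PyTorch. It is an illustrative toy under my own assumptions (function name, tensor shapes), not code from the book or its repo.

import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k); mask: optional boolean (seq_len, seq_len)
    d_k = q.size(-1)
    # similarity of every query against every key, scaled by sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, seq_len, seq_len)
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))
    # softmax over keys turns scores into attention weights
    weights = torch.softmax(scores, dim=-1)
    # each output token is a weighted mix of the value vectors
    return weights @ v, weights

# toy usage: 2 sequences, 5 tokens, 8-dim head
q = torch.randn(2, 5, 8)
k = torch.randn(2, 5, 8)
v = torch.randn(2, 5, 8)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)   # torch.Size([2, 5, 8]) torch.Size([2, 5, 5])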

There are some PyTorch code samples available on GitHub at CRCTransformers/deepdive-book (I'm more inclined towards JAX).
Would like to see regularly updated editions for this rapidly moving AI field, plus more contrastive side-by-side math-to-code architecture walkthroughs.
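
As a rough example of the kind of math-to-code walkthrough I mean, here is a minimal single-head sketch of the "compressed KV memory" / low-rank idea (Linformer-style) mentioned in the list above, in PyTorch. The class name and sizes are my own assumptions, not code from the CRCTransformers repo.

import math
import torch
import torch.nn as nn

class CompressedKVAttention(nn.Module):
    # Single-head attention with Linformer-style sequence compression:
    # keys/values are projected from n positions down to k, so the score
    # matrix is (n x k) instead of (n x n).
    def __init__(self, d_model, seq_len, k=64):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.e_proj = nn.Linear(seq_len, k, bias=False)  # compress keys:   n -> k
        self.f_proj = nn.Linear(seq_len, k, bias=False)  # compress values: n -> k

    def forward(self, x):                                 # x: (batch, n, d_model)
        q = self.q_proj(x)                                # (batch, n, d)
        k = self.e_proj(self.k_proj(x).transpose(1, 2)).transpose(1, 2)  # (batch, k, d)
        v = self.f_proj(self.v_proj(x).transpose(1, 2)).transpose(1, 2)  # (batch, k, d)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))         # (batch, n, k)
        return torch.softmax(scores, dim=-1) @ v                         # (batch, n, d)

x = torch.randn(2, 512, 256)
print(CompressedKVAttention(256, seq_len=512, k=64)(x).shape)   # torch.Size([2, 512, 256])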

In all, it was great and will take me to the next level!