For comparison, AlexNet debuted with a network of roughly sixty million parameters, just enough to make reasonable sense of the ImageNet data set, at least in part, while transformers big enough to be trained on a world of text, photos, video, and more have grown well into the hundreds of billions of parameters.

