Tag: Transformers
All articles tagged "Transformers".
Flash Attention in a Flash
Published at 02:49 AM · 8 min read
Flash Attention is an IO-aware, exact attention algorithm that speeds up training and inference in large-scale AI models. This article provides an overview of Flash Attention and its applications in AI research and development.
Estimating Transformer Model Properties: A Deep Dive
Published at 05:44 AM · 8 min read
In this post, we'll explore how to estimate the size of a Transformer model, including the number of parameters, FLOPs, peak memory footprint, and checkpoint size.
All you need (to know) about attention
Published at 02:35 AM · 34 min read
A highly compressed guide to quickly refresh (mostly) all you need to know about attention mechanisms in deep learning.