Tag: Transformers
All articles tagged "Transformers".
Flash Attention in a Flash
Published at 02:49 AM · 8 min read
Flash Attention is an IO-aware, exact attention algorithm that speeds up training and inference in large-scale AI models. This article provides an overview of Flash Attention and its applications in AI research and development.
Estimating Transformer Model Properties: A Deep Dive
Published at 05:44 AM · 8 min read
In this post, we'll explore how to estimate the size of a Transformer model, including the number of parameters, FLOPs, peak memory footprint, and checkpoint size.
All you need (to know) about attention
Published at 02:35 AM · 34 min read
A highly compressed guide to quickly refresh (mostly) all you need to know about attention mechanisms in deep learning.