Tag: Deep Learning
All the articles with the tag "Deep Learning".
Softmax to the Max
Published: at 10:02 AM · 10 min read
A deep dive into the softmax, logsoftmax, and logsumexp functions, their gradients, and how they relate to each other.
Flash Attention in a Flash
Published: at 02:49 AM · 8 min read
Flash Attention is an IO-aware, exact attention algorithm that speeds up training and inference in large-scale AI models. This article provides an overview of Flash Attention and its applications in AI research and development.
Back to Backprop
Published: at 06:42 AM · 29 min read
A review of backpropagation, the workhorse of deep learning.
Estimating Transformer Model Properties: A Deep Dive
Published: at 05:44 AM · 8 min read
In this post, we'll explore how to estimate the size of a Transformer model, including the number of parameters, FLOPs, peak memory footprint, and checkpoint size.
Neural Scaling Laws (Then and Now)
Published: at 04:07 AM · 12 min read
Scaling laws have been a cornerstone of deep learning research for decades. In this post, we'll explore the history of scaling laws in deep learning, and how they've evolved over time.
Large-Scale Neural Network Training
Published: at 01:58 AM · 23 min read
Large-scale neural network training is a critical component of modern machine learning workflows. This post provides an overview of the challenges and solutions for training deep learning models at scale.