Tag: Deep Learning
All the articles with the tag "Deep Learning".
- Softmax to the Max (published at 10:02 AM, 10 min read): A deep dive into the softmax, log-softmax, and logsumexp functions, their gradients, and how they relate to each other (a quick sketch of these identities appears after this list).
- Flash Attention in a Flash (published at 02:49 AM, 8 min read): Flash Attention is an IO-aware algorithm that computes exact attention faster, speeding up training and inference in large-scale AI models. This article gives an overview of the algorithm and its applications in AI research and development.
- Back to Backprop (published at 06:42 AM, 29 min read): A review of backpropagation, the workhorse of deep learning.
- Estimating Transformer Model Properties: A Deep Dive (published at 05:44 AM, 8 min read): In this post, we'll explore how to estimate the size of a Transformer model, including the number of parameters, FLOPs, peak memory footprint, and checkpoint size (a back-of-the-envelope sketch follows this list).
- Neural Scaling Laws (Then and Now) (published at 04:07 AM, 12 min read): Scaling laws have been a cornerstone of deep learning research for decades. In this post, we'll explore the history of scaling laws in deep learning, and how they've evolved over time.
- Large-Scale Neural Network Training (published at 01:58 AM, 23 min read): Large-scale neural network training is a critical component of modern machine learning workflows. This post provides an overview of the challenges and solutions for training deep learning models at scale.
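
As a quick taste of the identities the first post covers (this sketch is mine, not taken from the post itself): log-softmax and logsumexp are related by `logsoftmax(x) = x - logsumexp(x)`, and the gradient of logsumexp is the softmax. A minimal NumPy sketch:

```python
import numpy as np

def logsumexp(x):
    # Subtract the max before exponentiating for numerical stability.
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def softmax(x):
    # exp(x - logsumexp(x)) is the stable form of exp(x) / sum(exp(x)).
    return np.exp(x - logsumexp(x))

def log_softmax(x):
    # The identity log(softmax(x)) = x - logsumexp(x).
    return x - logsumexp(x)

x = np.array([1.0, 2.0, 3.0])
assert np.allclose(np.log(softmax(x)), log_softmax(x))
assert np.isclose(softmax(x).sum(), 1.0)
```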
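And for the post on estimating Transformer properties, a rough back-of-the-envelope sketch (the constants and the `estimate_transformer` helper are my own illustration, not the post's method): a standard decoder block has about `12 * d_model**2` parameters (4d² for the attention projections, 8d² for a 4x-wide MLP), and training costs roughly `6 * N` FLOPs per token for `N` parameters.

```python
def estimate_transformer(n_layers, d_model, vocab_size):
    block_params = 12 * d_model**2           # attention + MLP weights, biases ignored
    embedding_params = vocab_size * d_model  # token embedding (often tied with the output head)
    n_params = n_layers * block_params + embedding_params
    train_flops_per_token = 6 * n_params     # forward (~2N) + backward (~4N)
    return n_params, train_flops_per_token

# Example: a GPT-2-small-like shape (12 layers, d_model=768, ~50k vocab)
# lands near the familiar ~124M parameter count.
params, flops = estimate_transformer(12, 768, 50257)
print(f"~{params / 1e6:.0f}M params, ~{flops / 1e9:.2f} GFLOPs per training token")
```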