Posts
All the articles I've posted.
AutoDiff Puzzles
Published: at 05:32 AM in 19 min readMy solutions to srush's AutoDiff Puzzles. This is useful as a quick refresher for computing gradients.
Back to Backprop
Published: at 06:42 AM in 29 min readA review of backpropagation, the workhorse of deep learning.
Estimating Transformer Model Properties: A Deep Dive
Published: at 05:44 AM in 8 min readIn this post, we'll explore how to estimate the size of a Transformer model, including the number of parameters, FLOPs, peak memory footprint, and checkpoint size.
Neural Scaling Laws (Then and Now)
Published: at 04:07 AM in 12 min readScaling laws have been a cornerstone of deep learning research for decades. In this post, we'll explore the history of scaling laws in deep learning, and how they've evolved over time.
Feeding the Beast - Data Loading Secrets for Hungry Neural Networks
Published: at 02:53 AM in 11 min readData loading is a critical part of training deep learning models. In this post, we'll explore the best practices for loading data into neural networks, with a focus on PyTorch.
Large-Scale Neural Network Training
Published: at 01:58 AM in 23 min readLarge-scale neural network training is a critical component of modern machine learning workflows. This post provides an overview of the challenges and solutions for training deep learning models at scale.