Posts
All the articles I've posted.
GPU Puzzles
Published: at 05:32 AM (10 min read)
My solutions to srush's GPU Puzzles.
Computing the Jacobian of a Matrix Product
Published: at 03:31 AM (6 min read)
A step-by-step illustration of how you can visualize and derive the Jacobian of a matrix product, using a concrete example.
AutoDiff Puzzles
Published: at 05:32 AM (19 min read)
My solutions to srush's AutoDiff Puzzles. This is useful as a quick refresher for computing gradients.
Back to Backprop
Published: at 06:42 AM (29 min read)
A review of backpropagation, the workhorse of deep learning.
Estimating Transformer Model Properties: A Deep Dive
Published: at 05:44 AM (8 min read)
In this post, we'll explore how to estimate the size of a Transformer model, including the number of parameters, FLOPs, peak memory footprint, and checkpoint size.
Neural Scaling Laws (Then and Now)
Published: at 04:07 AM (12 min read)
Scaling laws have been a cornerstone of deep learning research for decades. In this post, we'll explore the history of scaling laws in deep learning, and how they've evolved over time.