Archives
All the articles I've archived.
Python and Pandas and P-values, Oh My! A Statistical Journey
Published: at 01:15 AM in 26 min read
A rapid refresher on the basics of statistics and how to apply them in Python.
A Straight Line Through Linear Algebra
Published: at 07:27 PM in 9 min read
A brief review of core concepts in linear algebra, with a focus on applications in machine learning.
Pandas Primer
Published: at 01:15 AM in 5 min read
A rapid review of the Pandas library for the impatient data scientist.
Feeding the Beast - Data Loading Secrets for Hungry Neural Networks
Published: at 02:53 AM in 11 min read
Data loading is a critical part of training deep learning models. In this post, we'll explore the best practices for loading data into neural networks, with a focus on PyTorch.
Flash Attention in a Flash
Published: at 02:49 AM in 8 min read
Flash Attention is an IO-aware, exact attention algorithm that speeds up training and inference in large-scale AI models. This article provides an overview of Flash Attention and its applications in AI research and development.
Estimating Transformer Model Properties: A Deep Dive
Published: at 05:44 AM in 8 min read
In this post, we'll explore how to estimate the size of a Transformer model, including the number of parameters, FLOPs, peak memory footprint, and checkpoint size.
AutoDiff Puzzles
Published: at 05:32 AM in 19 min read
My solutions to srush's AutoDiff Puzzles. This is useful as a quick refresher for computing gradients.
Back to Backprop
Published: at 06:42 AM in 29 min read
A review of backpropagation, the workhorse of deep learning.
Back to the Basics
Published: at 09:32 AM in 1 min read
A refresher on the fundamentals of computer science.
GPU Puzzles
Published: at 05:32 AM in 10 min read
My solutions to srush's GPU Puzzles.
Large-Scale Neural Network Training
Published: at 01:58 AM in 23 min read
Large-scale neural network training is a critical component of modern machine learning workflows. This post provides an overview of the challenges and solutions for training deep learning models at scale.
Computing the Jacobian of a Matrix Product
Published: at 03:31 AM in 6 min read
A step-by-step illustration of how you can visualize and derive the Jacobian of a matrix product, using a concrete example.
Neural Scaling Laws (Then and Now)
Published: at 04:07 AM in 12 min read
Scaling laws have been a cornerstone of deep learning research for decades. In this post, we'll explore the history of scaling laws in deep learning, and how they've evolved over time.
Softmax to the Max
Published: at 10:02 AM in 10 min read
A deep dive into the softmax, logsoftmax, and logsumexp functions, their gradients, and how they relate to each other.
Can AI Write about AI Personality With Personality?
Published: at 07:33 AM in 31 min read
Should AI have "Personality", or will "Bland and Mid" suffice?
All you need (to know) about attention
Published: at 02:35 AM in 34 min read
A highly compressed guide to quickly refresh (mostly) all you need to know about attention mechanisms in deep learning.
o1 and Reasoning
Published: at 12:53 AM in 62 min read
Organizing research presumably related to OpenAI's o1 reasoning model.
Byte Pair Encoding Tokenizer
Published: at 01:37 AM in 6 min read
An in-depth exploration of the Byte Pair Encoding (BPE) tokenizer, its mechanics, and its applications in natural language processing.
Automating App Deployment on a VPS with GitHub Actions
Published: at 03:47 AM in 8 min read
Learn how to automate your VPS application deployment process using GitHub Actions for a seamless CI/CD experience.
Tensor Puzzles
Published: at 05:32 AM in 13 min read
My solutions to srush’s Tensor Puzzles. This is useful as a quick refresher for things you can do with vanilla tensor operations.