Archives

All the articles I've archived.

2025 (3)
January (3)
2024 (17)
December (11)
  • Feeding the Beast - Data Loading Secrets for Hungry Neural Networks

    Published at 02:53 AM · 11 min read

    Data loading is a critical part of training deep learning models. In this post, we'll explore the best practices for loading data into neural networks, with a focus on PyTorch.

  • Flash Attention in a Flash

    Published at 02:49 AM · 8 min read

    Flash Attention is an IO-aware attention algorithm that computes exact attention while speeding up training and inference in large-scale AI models. This article provides an overview of Flash Attention and its applications in AI research and development.

  • Estimating Transformer Model Properties: A Deep Dive

    Published at 05:44 AM · 8 min read

    In this post, we'll explore how to estimate the size of a Transformer model, including the number of parameters, FLOPs, peak memory footprint, and checkpoint size.

  • AutoDiff Puzzles

    Published at 05:32 AM · 19 min read

    My solutions to srush's AutoDiff Puzzles. This is useful as a quick refresher for computing gradients.

  • Back to Backprop

    Published at 06:42 AM · 29 min read

    A review of backpropagation, the workhorse of deep learning.

  • Back to the Basics

    Published at 09:32 AM · 1 min read

    A refresher on the fundamentals of computer science.

  • GPU Puzzles

    Published at 05:32 AM · 10 min read

    My solutions to srush's GPU Puzzles.

  • Large-Scale Neural Network Training

    Published at 01:58 AM · 23 min read

    Large-scale neural network training is a critical component of modern machine learning workflows. This post provides an overview of the challenges and solutions for training deep learning models at scale.

  • Computing the Jacobian of a Matrix Product

    Published at 03:31 AM · 6 min read

    A step-by-step illustration of how you can visualize and derive the Jacobian of a matrix product, using a concrete example.

  • Neural Scaling Laws (Then and Now)

    Published at 04:07 AM · 12 min read

    Scaling laws have been a cornerstone of deep learning research for decades. In this post, we'll explore the history of scaling laws in deep learning, and how they've evolved over time.

  • Softmax to the Max

    Published at 10:02 AM · 10 min read

    A deep dive into the softmax, logsoftmax, and logsumexp functions, their gradients, and how they relate to each other.

November (3)
  • Can AI Write about AI Personality With Personality?

    Published at 07:33 AM · 31 min read

    Should AI have "Personality" or will "Bland and Mid" suffice?

  • All you need (to know) about attention

    Published at 02:35 AM · 34 min read

    A highly compressed guide to quickly refresh (mostly) all you need to know about attention mechanisms in deep learning.

  • o1 and Reasoning

    Published at 12:53 AM · 62 min read

    Organizing research presumably related to OpenAI's o1 reasoning model.

October (3)
  • Byte Pair Encoding Tokenizer

    Published at 01:37 AM · 6 min read

    An in-depth exploration of the Byte Pair Encoding (BPE) tokenizer, its mechanics, and its applications in natural language processing.

  • Automating App Deployment on a VPS with GitHub Actions

    Published at 03:47 AM · 8 min read

    Learn how to automate your VPS application deployment process using GitHub Actions for a seamless CI/CD experience.

  • Tensor Puzzles

    Published at 05:32 AM · 13 min read

    My solutions to srush’s tensor puzzles. This is useful as a quick refresher for things you can do with vanilla tensor operations.