Archives
All the articles I've archived.
What Work Feels Like in 2026
Published: at 05:00 PM in 10 min read
A personal reflection on how AI has fundamentally changed the way I approach work, ideas, and ambition.
Swirling Thoughts on AI, LLMs, Novel Hypothesis Generation and Crypto?
Published: at 09:45 PM in 14 min read
In this post, I swirl my own thoughts on the very topic of swirling thoughts in the context of AI, LLMs, and their potential for novel hypothesis generation. I ground the discussion in other parallel lines of thought and investigation, including GRPO, reinforcement learning, the future of science and scientific peer review, and the notion of being able to "purchase" scientific innovation.
Bridging the Knowledge Gap in Multimodal AI with DeepResearch
Published: at 04:27 AM in 19 min read
How OpenAI's DeepResearch reveals what industry labs aren't sharing about next-gen AI assistants
Multistep Reasoning Agents (with GRPO & RLEF) - Project Euler Edition
Published: at 02:38 AM in 21 min read
Converting chat LLMs into reasoning agents through GRPO and reinforcement learning from execution feedback, achieving performance improvements on Project Euler's algorithmic challenges with multi-step reasoning, tool use, and code execution.
A Straight Line Through Linear Algebra
Published: at 07:27 PM in 9 min read
A brief review of core concepts in linear algebra, with a focus on applications in machine learning.
Python and Pandas and P-values, Oh My! A Statistical Journey
Published: at 01:15 AM in 26 min read
A rapid refresher on the basics of statistics and how to apply them in Python.
Pandas Primer
Published: at 01:15 AM in 5 min read
A rapid review of the Pandas library for the impatient data scientist.
Feeding the Beast - Data Loading Secrets for Hungry Neural Networks
Published: at 02:53 AM in 11 min read
Data loading is a critical part of training deep learning models. In this post, we'll explore the best practices for loading data into neural networks, with a focus on PyTorch.
Estimating Transformer Model Properties: A Deep Dive
Published: at 05:44 AM in 8 min read
In this post, we'll explore how to estimate the size of a Transformer model, including the number of parameters, FLOPs, peak memory footprint, and checkpoint size.
Large-Scale Neural Network Training
Published: at 01:58 AM in 23 min read
Large-scale neural network training is a critical component of modern machine learning workflows. This post provides an overview of the challenges and solutions for training deep learning models at scale.
Neural Scaling Laws (Then and Now)
Published: at 04:07 AM in 12 min read
Scaling laws have been a cornerstone of deep learning research for decades. In this post, we'll explore the history of scaling laws in deep learning, and how they've evolved over time.
Softmax to the Max
Published: at 10:02 AM in 10 min read
A deep dive into the softmax, logsoftmax, and logsumexp functions, their gradients, and how they relate to each other.
Computing the Jacobian of a Matrix Product
Published: at 03:31 AM in 6 min read
A step-by-step illustration of how you can visualize and derive the Jacobian of a matrix product, using a concrete example.
Flash Attention in a Flash
Published: at 02:49 AM in 8 min read
Flash Attention is an exact, IO-aware implementation of attention that speeds up training and inference in large-scale AI models. This article provides an overview of Flash Attention and its applications in AI research and development.
AutoDiff Puzzles
Published: at 05:32 AM in 19 min read
My solutions to srush's AutoDiff Puzzles. This is useful as a quick refresher for computing gradients.
Back to Backprop
Published: at 06:42 AM in 29 min read
A review of backpropagation, the workhorse of deep learning.
Back to the Basics
Published: at 09:32 AM in 1 min read
A refresher on the fundamentals of computer science.
GPU Puzzles
Published: at 05:32 AM in 10 min read
My solutions to srush's GPU Puzzles.
o3 and the Future of Science
Published: at 10:04 PM in 23 min read
o3 and Human-AI scientific collaboration
Can AI Write about AI Personality With Personality?
Published: at 07:33 AM in 30 min read
Should AI have "Personality" or will "Bland and Mid" suffice?
All you need (to know) about attention
Published: at 02:35 AM in 35 min read
A highly compressed guide to quickly refresh (mostly) all you need to know about attention mechanisms in deep learning.
o1 and Reasoning
Published: at 12:53 AM in 62 min read
Organizing research presumably related to OpenAI's o1 reasoning model.
Tensor Puzzles
Published: at 05:32 AM in 13 min read
My solutions to srush's Tensor Puzzles. This is useful as a quick refresher for things you can do with vanilla tensor operations.
Byte Pair Encoding Tokenizer
Published: at 01:37 AM in 6 min read
An in-depth exploration of the Byte Pair Encoding (BPE) tokenizer, its mechanics, and its applications in natural language processing.
Automating App Deployment on a VPS with GitHub Actions
Published: at 03:47 AM in 8 min read
Learn how to automate your VPS application deployment process using GitHub Actions for a seamless CI/CD experience.