Archives

All the articles I've archived.

2025 (3)
January (3)
2024 (17)
December (11)
  • Feeding the Beast - Data Loading Secrets for Hungry Neural Networks

    Published at 02:53 AM · 11 min read

    Data loading is a critical part of training deep learning models. In this post, we'll explore the best practices for loading data into neural networks, with a focus on PyTorch.

  • Flash Attention in a Flash

    Published at 02:49 AM · 8 min read

    Flash Attention is an IO-aware attention algorithm that computes exact attention while speeding up training and inference in large-scale AI models. This article provides an overview of Flash Attention and its applications in AI research and development.

  • Estimating Transformer Model Properties: A Deep Dive

    Published at 05:44 AM · 8 min read

    In this post, we'll explore how to estimate the size of a Transformer model, including the number of parameters, FLOPs, peak memory footprint, and checkpoint size.

  • AutoDiff Puzzles

    Published at 05:32 AM · 19 min read

    My solutions to srush's AutoDiff Puzzles. This is useful as a quick refresher for computing gradients.

  • Back to Backprop

    Published at 06:42 AM · 29 min read

    A review of backpropagation, the workhorse of deep learning.

  • Back to the Basics

    Published at 09:32 AM · 1 min read

    A refresher on the fundamentals of computer science.

  • GPU Puzzles

    Published at 05:32 AM · 10 min read

    My solutions to srush's GPU Puzzles.

  • Large-Scale Neural Network Training

    Published at 01:58 AM · 23 min read

    Large-scale neural network training is a critical component of modern machine learning workflows. This post provides an overview of the challenges and solutions for training deep learning models at scale.

  • Computing the Jacobian of a Matrix Product

    Published at 03:31 AM · 6 min read

    A step-by-step illustration of how you can visualize and derive the Jacobian of a matrix product, using a concrete example.

  • Neural Scaling Laws (Then and Now)

    Published at 04:07 AM · 12 min read

    Scaling laws have been a cornerstone of deep learning research for decades. In this post, we'll explore the history of scaling laws in deep learning, and how they've evolved over time.

  • Softmax to the Max

    Published at 10:02 AM · 10 min read

    A deep dive into the softmax, logsoftmax, and logsumexp functions, their gradients, and how they relate to each other.

November (3)
  • Can AI Write about AI Personality With Personality?

    Published at 07:33 AM · 31 min read

    Should AI have "Personality" or will "Bland and Mid" suffice?

  • All you need (to know) about attention

    Published at 02:35 AM · 34 min read

    A highly compressed guide to quickly refresh (mostly) all you need to know about attention mechanisms in deep learning.

  • o1 and Reasoning

    Published at 12:53 AM · 62 min read

    Organizing research presumably related to OpenAI's o1 reasoning model.

October (3)
  • Byte Pair Encoding Tokenizer

    Published at 01:37 AM · 6 min read

    An in-depth exploration of the Byte Pair Encoding (BPE) tokenizer, its mechanics, and its applications in natural language processing.

  • Automating App Deployment on a VPS with GitHub Actions

    Published at 03:47 AM · 8 min read

    Learn how to automate your VPS application deployment process using GitHub Actions for a seamless CI/CD experience.

  • Tensor Puzzles

    Published at 05:32 AM · 13 min read

    My solutions to srush’s tensor puzzles. This is useful as a quick refresher for things you can do with vanilla tensor operations.