Posts
All the articles I've posted.
Back to the Basics
Published: at 09:32 AM · 1 min read
A refresher on the fundamentals of computer science.
Softmax to the Max
Published: at 10:02 AM · 10 min read
A deep dive into the softmax, logsoftmax, and logsumexp functions, their gradients, and how they relate to each other.
Flash Attention in a Flash
Published: at 02:49 AM · 8 min read
Flash Attention is an attention algorithm that speeds up training and inference in large-scale AI models. This article provides an overview of Flash Attention and its applications in AI research and development.
o3 and the Future of Science
Published: at 10:04 PM · 23 min read
On o3 and human-AI scientific collaboration.
GPU Puzzles
Published: at 05:32 AM · 10 min read
My solutions to srush's GPU Puzzles.
Computing the Jacobian of a Matrix Product
Published: at 03:31 AM · 6 min read
A step-by-step illustration of how you can visualize and derive the Jacobian of a matrix product, using a concrete example.