Tag: Deep Learning
All the articles with the tag "Deep Learning".
- Softmax to the Max (published at 10:02 AM, 10 min read): A deep dive into the softmax, log-softmax, and logsumexp functions, their gradients, and how they relate to each other (a quick sketch of these identities appears after this list).
- Flash Attention in a Flash (published at 02:49 AM, 8 min read): Flash Attention is an IO-aware algorithm that computes exact attention faster, speeding up training and inference in large-scale AI models. This article gives an overview of the algorithm and its applications in AI research and development.
- Back to Backprop (published at 06:42 AM, 29 min read): A review of backpropagation, the workhorse of deep learning.
- Estimating Transformer Model Properties: A Deep Dive (published at 05:44 AM, 8 min read): In this post, we'll explore how to estimate the size of a Transformer model, including the number of parameters, FLOPs, peak memory footprint, and checkpoint size (a back-of-the-envelope sketch follows this list).
- Neural Scaling Laws (Then and Now) (published at 04:07 AM, 12 min read): Scaling laws have been a cornerstone of deep learning research for decades. In this post, we'll explore the history of scaling laws in deep learning, and how they've evolved over time.
- Large-Scale Neural Network Training (published at 01:58 AM, 23 min read): Large-scale neural network training is a critical component of modern machine learning workflows. This post provides an overview of the challenges and solutions for training deep learning models at scale.
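
As a quick taste of the identities the first post covers (this sketch is mine, not taken from the post itself): log-softmax and logsumexp are related by `logsoftmax(x) = x - logsumexp(x)`, and the gradient of logsumexp is the softmax. A minimal NumPy sketch:

```python
import numpy as np

def logsumexp(x):
    # Subtract the max before exponentiating for numerical stability.
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def softmax(x):
    # exp(x - logsumexp(x)) is the stable form of exp(x) / sum(exp(x)).
    return np.exp(x - logsumexp(x))

def log_softmax(x):
    # The identity log(softmax(x)) = x - logsumexp(x).
    return x - logsumexp(x)

x = np.array([1.0, 2.0, 3.0])
assert np.allclose(np.log(softmax(x)), log_softmax(x))
assert np.isclose(softmax(x).sum(), 1.0)
```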
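And for the post on estimating Transformer properties, a rough back-of-the-envelope sketch (the constants and the `estimate_transformer` helper are my own illustration, not the post's method): a standard decoder block has about `12 * d_model**2` parameters (4d² for the attention projections, 8d² for a 4x-wide MLP), and training costs roughly `6 * N` FLOPs per token for `N` parameters.

```python
def estimate_transformer(n_layers, d_model, vocab_size):
    block_params = 12 * d_model**2           # attention + MLP weights, biases ignored
    embedding_params = vocab_size * d_model  # token embedding (often tied with the output head)
    n_params = n_layers * block_params + embedding_params
    train_flops_per_token = 6 * n_params     # forward (~2N) + backward (~4N)
    return n_params, train_flops_per_token

# Example: a GPT-2-small-like shape (12 layers, d_model=768, ~50k vocab)
# lands near the familiar ~124M parameter count.
params, flops = estimate_transformer(12, 768, 50257)
print(f"~{params / 1e6:.0f}M params, ~{flops / 1e9:.2f} GFLOPs per training token")
```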