- CS336 - Assignment 1: Tokenizer & Transformer Basics
CS336 Assignment 1: implement a BPE tokenizer, the Transformer architecture (RMSNorm, RoPE, SwiGLU), and AdamW from scratch, then train on TinyStories.
3 min read en
- CS336 - Assignment 4: Data Processing & Filtering
CS336 Assignment 4: turn raw Common Crawl into pretraining data — HTML-to-text extraction, quality and safety filtering, PII removal, and deduplication.
3 min read en
- CS336 - Assignment 3: Scaling Laws
CS336 Assignment 3: fit neural scaling laws using the IsoFLOP method and a training API to predict compute-optimal model size and data.
2 min read en
- CS336 - Assignment 2: Systems (Triton & Distributed)
CS336 Assignment 2: profile and benchmark the model, implement FlashAttention-2 in Triton, and build distributed data parallel with optimizer sharding.
3 min read en
- CS336 - Assignment 5: Alignment & Reasoning RL
CS336 Assignment 5: align language models with supervised fine-tuning and reinforcement learning (expert iteration, GRPO) to improve math reasoning.
3 min read en
- CS336 Language Modeling from Scratch - Introduction
Overview of the Stanford CS336 series: building a language model from scratch through five hands-on assignments, from tokenizer to alignment.
3 min read en
- Diary #10 min read en
- P1131 [ZJOI2007] Time Synchronization - Luogu Solution
Tree DP: align leaf arrival times by only increasing edge weights. Compute the longest root-to-leaf path per subtree and pay the gap on each child branch.
9 min read en
Blog