Blog

June 5, 2026

CS336 - Assignment 1: Tokenizer & Transformer Basics

CS336 Assignment 1: implement a BPE tokenizer, the Transformer architecture (RMSNorm, RoPE, SwiGLU), and AdamW from scratch, then train on TinyStories.

3 min read en
June 5, 2026

CS336 - Assignment 4: Data Processing & Filtering

CS336 Assignment 4: turn raw Common Crawl into pretraining data — HTML-to-text extraction, quality and safety filtering, PII removal, and deduplication.

3 min read en
- language models
- cs336
- data
- notes
June 5, 2026

CS336 - Assignment 3: Scaling Laws

CS336 Assignment 3: fit neural scaling laws using the IsoFLOP method and a training API to predict compute-optimal model size and data.

2 min read en
June 5, 2026

CS336 - Assignment 2: Systems (Triton & Distributed)

CS336 Assignment 2: profile and benchmark the model, implement FlashAttention-2 in Triton, and build distributed data parallel with optimizer sharding.

3 min read en
June 5, 2026

CS336 - Assignment 5: Alignment & Reasoning RL

CS336 Assignment 5: align language models with supervised fine-tuning and reinforcement learning (expert iteration, GRPO) to improve math reasoning.

3 min read en
June 5, 2026

CS336 Language Modeling from Scratch - Introduction

Overview of the Stanford CS336 series: building a language model from scratch through five hands-on assignments, from tokenizer to alignment.

3 min read en
May 21, 2026

Diary #1

0 min read en
- diary
May 6, 2026

P1131 [ZJOI2007] Time Synchronization - Luogu Solution

Tree DP: align leaf arrival times by only increasing edge weights. Compute the longest root-to-leaf path per subtree and pay the gap on each child branch.

9 min read en
- algorithm
- luogu
- tree dp
- tree