CS336 Assignment 1: implement a BPE tokenizer, the Transformer architecture (RMSNorm, RoPE, SwiGLU), and AdamW from scratch, then train on TinyStories.
language models
cs336
transformer
tokenization
notes