LLM System Design Interview #24 - Why Backprop Is 3× Harder Than You Think
Why intern engineers underestimate training FLOPs by a factor of three - and how the two distinct gradient calculations in backprop make the backward pass twice as expensive as the forward pass.
You’re in a Machine Learning Systems interview at Google DeepMind and the interviewer asks:
“You’re asked to budget a training run. An intern engineer estimates the total FLOPs as 2 * num_params * num_tokens, arguing that the backward pass is roughly symmetrical to the forward pass. Why does this estimate understate the true cost by a factor of three, and what two distinct gradient calculations make the backward pass twice as expensive as the forward pass?”
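Before answering, it helps to make the accounting concrete. A minimal sketch of the standard "6ND" FLOPs rule (the function name and the 7B/1T example are illustrative, not from any specific training run): the forward pass costs about 2 FLOPs per parameter per token (one multiply plus one add per weight), while the backward pass performs two matmuls of that same size per layer - one for gradients with respect to activations, one for gradients with respect to weights.

```python
def training_flops(num_params: int, num_tokens: int) -> int:
    """Standard transformer training-FLOPs estimate: 6 * N * D."""
    forward = 2 * num_params * num_tokens               # y = W x  (multiply + add per weight)
    backward_wrt_inputs = 2 * num_params * num_tokens   # dL/dx = W^T @ dL/dy
    backward_wrt_weights = 2 * num_params * num_tokens  # dL/dW = dL/dy @ x^T
    return forward + backward_wrt_inputs + backward_wrt_weights

# Intern's 2ND estimate vs. the 6ND rule, e.g. a hypothetical 7B model on 1T tokens
n_params, n_tokens = 7 * 10**9, 10**12
intern_estimate = 2 * n_params * n_tokens
actual = training_flops(n_params, n_tokens)
print(actual / intern_estimate)  # → 3.0
```

The ratio is exactly 3: the intern counted only the forward pass's 2ND, while the two backward-pass matmuls contribute another 4ND.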


