LLM System Design Interview #18 - The Throughput–Latency Paradox

The hidden hump where bigger batches stop saving money and start destroying user experience.

Nov 15, 2025

You’re in a Senior ML Engineer interview at Anthropic and the interviewer asks:

“Our ops team wants to 8x our batch size to cut costs and improve throughput. Why is this a dangerous move for user experience, and at what point does this strategy stop making sense... before you run out of memory?”

The common answer: “It will increase latency because the bat…

Continue reading this post for free, courtesy of Hao Hoang.

AI Interview Prep
