LLM System Design Interview #18 - The Throughput–Latency Paradox
The hidden hump where bigger batches stop saving money and start destroying user experience.
You’re in a Senior ML Engineer interview at Anthropic, and the interviewer asks:
“Our ops team wants to 8x our batch size to cut costs and improve throughput. Why is this a dangerous move for user experience, and at what point does this strategy stop making sense... before you run out of memory?”
The common answer: “It will increase latency because the bat…


