AI Interview Prep

LLM System Design Interview #18 - The Throughput–Latency Paradox

The hidden hump where bigger batches stop saving money and start destroying user experience.

Hao Hoang
Nov 15, 2025

You’re in a Senior ML Engineer interview at Anthropic and the interviewer asks:

“Our ops team wants to 8x our batch size to cut costs and improve throughput. Why is this a dangerous move for user experience, and at what point does this strategy stop making sense... before you run out of memory?”

The common answer: “It will increase latency because the bat…

© 2026 Hao Hoang