Machine Learning System Design Interview #35 - The Weighted Cross-Entropy Trap
Why scaling loss by class frequency silently swamps your gradients with easy background noise, and how to dynamically shift optimization focus to hard production edge cases.
You’re in a Senior ML Engineer interview at Meta. The interviewer sets a trap:
“You’re training a fraud detection model on an extremely imbalanced production stream: 1 fraud sample for every 10,000 legitimate transactions. How do you construct the loss function to ensure the model actually learns the rare class without collapsing?”
95% of candidates walk…


