AI Interview Prep

Machine Learning System Design Interview #42 - The Base-Rate F1 Trap

Why a phenomenal 0.90 F1-score can quietly mask a completely untrained dummy model, and how to decouple aggregate metrics before they cause a silent production crash.

Hao Hoang
May 30, 2026

You’re in a Senior ML Engineer interview at Meta. The interviewer sets a trap:

“An engineer shows you a binary classification model boasting a phenomenal 0.90 F1-score on a newly curated validation set, claiming it’s ready for production deployment. Before even looking at the architecture, you flag this metric as a potential illusion. What hidden data profile characteristic are you suspecting, and how do you prove it?”

95% of candidates walk right into it.

Most candidates say: “F1 is robust to class imbalance, unlike accuracy, so a 0.90 F1-score means the model is fundamentally solid. To be safe, I’ll just check the confusion matrix, plot the ROC curve and compute its AUC, and tune the classification threshold.”


𝐓𝐡𝐞 𝐑𝐞𝐚𝐥𝐢𝐭𝐲:

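They forgot how easily an aggregate metric can hide a skewed base rate, and how, as in Simpson’s Paradox, a global number can contradict what is happening inside individual subgroups. If your newly curated validation set is 90% positive, an untrained dummy model that randomly outputs the positive class 90% of the time will score roughly 0.90 F1 in expectation. Because its guesses are independent of the labels, its precision equals the base rate (0.90) and its recall equals its positive-prediction rate (0.90), so their harmonic mean is also 0.90. A model that always predicts positive does even better, at about 0.95.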
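The base-rate claim is easy to check with a quick simulation. The snippet below is a minimal sketch, not the author's code: it assumes a synthetic validation set with a 90% positive rate, and the helper `f1_score` and the variable names are made up for illustration. It uses only the Python standard library.

```python
import random

def f1_score(y_true, y_pred, positive=1):
    """Plain F1 for the positive class, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 100_000
base_rate = 0.90         # assumed positive-class share of the validation set

# Synthetic labels: about 90% positives.
y_true = [1 if rng.random() < base_rate else 0 for _ in range(n)]

# Dummy 1: guesses "positive" 90% of the time, ignoring the input entirely.
y_random = [1 if rng.random() < base_rate else 0 for _ in range(n)]

# Dummy 2: always predicts "positive".
y_always = [1] * n

f1_random = f1_score(y_true, y_random)
f1_always = f1_score(y_true, y_always)

print(f"random 90% dummy F1:   {f1_random:.3f}")  # close to 0.90
print(f"always-positive F1:    {f1_always:.3f}")  # close to 0.95
```

Neither dummy has learned anything, yet both land at or above the 0.90 F1 the engineer reported. Any model under review should be compared against baselines like these before its headline number is taken seriously.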

You aren’t looking at a production-ready model; you are looking at a baseline illusion. Relying on a single global metric over the whole validation set blinds you to systematic failures inside critical data slices and the minority class, and it sets you up for a silent failure the moment the model meets a real-world distribution that differs from the validation set.
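One way to expose the illusion is to break the aggregate number down by class. The sketch below again assumes a synthetic 90%-positive validation set and a random dummy model; `class_f1`, `recall`, and the variable names are hypothetical helpers, and this is an illustration rather than the author's method. It reports F1 for each class, the macro average, and balanced accuracy.

```python
import random

def class_f1(y_true, y_pred, label):
    """F1 computed with `label` treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def recall(y_true, y_pred, label):
    """Share of true `label` examples that were predicted as `label`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total if total else 0.0

rng = random.Random(0)
n = 100_000

# Assumed setup: 90% positive labels, and a dummy that guesses positive 90% of the time.
y_true = [1 if rng.random() < 0.90 else 0 for _ in range(n)]
y_dummy = [1 if rng.random() < 0.90 else 0 for _ in range(n)]

f1_pos = class_f1(y_true, y_dummy, label=1)
f1_neg = class_f1(y_true, y_dummy, label=0)
macro_f1 = (f1_pos + f1_neg) / 2
balanced_acc = (recall(y_true, y_dummy, 1) + recall(y_true, y_dummy, 0)) / 2

print(f"positive-class F1: {f1_pos:.3f}")        # approximately 0.90
print(f"negative-class F1: {f1_neg:.3f}")        # approximately 0.10
print(f"macro F1:          {macro_f1:.3f}")      # approximately 0.50
print(f"balanced accuracy: {balanced_acc:.3f}")  # approximately 0.50
```

The positive-class F1 looks excellent while the minority-class F1 collapses to about 0.10, and both macro F1 and balanced accuracy sit at chance level. The same breakdown can be repeated per data slice to find subgroups the global number hides.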


𝐓𝐡𝐞 𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧:
