Machine Learning System Design Interview #18 - The Semantic Imbalance Trap
Why rotating the same 45 deer won’t save your classifier, and how generative synthesis actually fixes class imbalance.
You’re in a Senior ML Interview at OpenAI. The interviewer sets a trap:
“We have 50,000 images of ‘city streets’ but only 45 images of ‘deer at night.’ How do we fix this class imbalance to prevent the model from ignoring the deer?”
90% of candidates walk right into the trap.
They say, “I will ramp up the data augmentation pipeline.” Then they start listing standard torchvision transforms (RandomHorizontalFlip, RandomRotation(30), ColorJitter), maybe with Mosaic augmentation on top.
It feels like the correct, robust MLOps answer.
The interviewer nods, notes, “You just trained the model to recognize those same 45 deer upside-down and slightly greener,” and moves on.
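To see why the interviewer is unimpressed, here is a minimal bookkeeping sketch in plain Python. It does not touch real images. The `augment` helper is a hypothetical stand-in that only records which original deer image a sample came from and which random parameters were drawn. The flip probability and rotation range mirror the transforms named above; the brightness range is an assumption made for illustration.

```python
import random

N_STREET = 50_000  # majority class size from the interview question
N_DEER = 45        # minority class size from the interview question


def augment(source_id, rng):
    """Hypothetical stand-in for a geometric/photometric augmentation.

    Returns the random parameters that would be applied, tagged with the
    index of the original image the sample is derived from.
    """
    return {
        "source_id": source_id,
        "flipped": rng.random() < 0.5,        # in the spirit of RandomHorizontalFlip
        "angle": rng.uniform(-30.0, 30.0),    # in the spirit of RandomRotation(30)
        "brightness": rng.uniform(0.8, 1.2),  # in the spirit of ColorJitter (assumed range)
    }


def oversample_deer(target, seed=0):
    """Draw `target` augmented deer samples from the 45 originals."""
    rng = random.Random(seed)
    return [augment(rng.randrange(N_DEER), rng) for _ in range(target)]


# Oversample the deer class until it matches the street class in size.
deer_samples = oversample_deer(N_STREET)
unique_sources = {sample["source_id"] for sample in deer_samples}

print(f"imbalance ratio: {N_STREET / N_DEER:.0f}:1")
print(f"augmented deer samples: {len(deer_samples)}")
print(f"distinct deer images behind them: {len(unique_sources)}")
```

The class counts end up balanced, but the number of distinct source images can never exceed 45. Every augmented sample is a re-rendering of a scene the model has already seen, which is exactly the point of the interviewer's remark.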


