Computer Vision Interview Questions #13 – The Generalization Gap Trap
Why disabling data augmentation during evaluation is the only way to measure real generalization.
You’re in a Senior Computer Vision interview at Google DeepMind. The lead engineer sets a trap:
“We use heavy data augmentation (Color Jitter, 30° Rotations) during training to improve robustness. Why do we strictly disable these during validation? Doesn’t that break the rule that 𝘛𝘳𝘢𝘪𝘯 𝘢𝘯𝘥 𝘛𝘦𝘴𝘵 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯𝘴 𝘴𝘩𝘰𝘶𝘭𝘥 𝘮𝘢𝘵𝘤𝘩?”
90% of candidates hesitate. They sense the trap.
Most candidates say: “We disable them because real users won’t upload jittered or rotated images. We want to test on real data.”
The interviewer nods politely, writes “No” in their notes, and moves on.
Why? Because the candidates answered “what” we do but missed the “why”: they failed to address the mathematical distribution shift ( 𝐏_𝐭𝐫𝐚𝐢𝐧 ≠ 𝐏_𝐯𝐚𝐥 ) that disabling augmentation deliberately introduces.
You aren’t just “𝘵𝘶𝘳𝘯𝘪𝘯𝘨 𝘰𝘧𝘧 𝘯𝘰𝘪𝘴𝘦”. You are managing 𝐈𝐧𝐯𝐚𝐫𝐢𝐚𝐧𝐜𝐞.
In Deep Learning, 𝘛𝘳𝘢𝘪𝘯𝘪𝘯𝘨 𝘢𝘯𝘥 𝘐𝘯𝘧𝘦𝘳𝘦𝘯𝘤𝘦 have two fundamentally different mathematical goals regarding the “𝘚𝘦𝘮𝘢𝘯𝘵𝘪𝘤 𝘎𝘢𝘱”.
1️⃣ 𝘛𝘳𝘢𝘪𝘯𝘪𝘯𝘨 𝘪𝘴 𝘧𝘰𝘳 𝘍𝘰𝘳𝘤𝘪𝘯𝘨 𝘐𝘯𝘷𝘢𝘳𝘪𝘢𝘯𝘤𝘦:
When you apply Color Jitter, you aren’t trying to show the model “more data.” You are penalizing the model for relying on color. You are forcing the loss function to be invariant to specific transformations. You are artificially widening the input distribution to teach the model what doesn’t matter.
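A minimal sketch of this split, assuming NumPy, with a toy brightness jitter standing in for a full Color Jitter pipeline (the function names `color_jitter` and `get_input` are illustrative, not from any specific library):

```python
import numpy as np

def color_jitter(img, rng, strength=0.3):
    # Randomly rescale brightness. The label stays the same, so the
    # model is penalized whenever it relies on absolute color values:
    # this is how the loss is forced to become invariant to the transform.
    factor = 1.0 + rng.uniform(-strength, strength)
    return np.clip(img * factor, 0.0, 1.0)

def get_input(img, training, rng):
    # Training: artificially widen the input distribution.
    # Validation: identity pipeline -- evaluate on clean inputs only,
    # accepting that P_train != P_val by design.
    return color_jitter(img, rng) if training else img
```

Note the asymmetry: the random transform lives only in the training branch, so validation measures the model on the clean distribution it will actually see at inference time.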