AI Interview Prep

Computer Vision Interview Questions #6 – The Model Capacity Trap

Why shrinking an overfitting network makes optimization harder, and why over-parameterization is the safer bet.

Hao Hoang
Jan 07, 2026

You’re in a Senior MLE interview at Google DeepMind. The interviewer sets a trap:

“Our new foundation model is overfitting severely on the training set. Should we cut the hidden dimension size from 4096 to 1024 to limit its capacity?”

90% of candidates walk right into it.

They say: “Yes. Overfitting means the model has too much capacity; it’s memorizing noise instead of learning patterns. We should reduce the number of parameters (neurons/layers) to force generalization.”

As a textbook answer, it feels logical. It’s also the wrong architectural move.

The reality is that senior engineers aren’t optimizing for parameter efficiency; they are optimizing for the loss landscape.

When you starve a network by reducing its size, you aren’t just preventing overfitting; you are creating a harder optimization problem. Smaller networks have complex, non-convex loss landscapes riddled with bad local minima, and they can struggle to converge at all.
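To make the proposed change concrete, here is a minimal sketch (a hypothetical SimpleMLP, with made-up input/output sizes, not anything from the post) of what the interviewer is suggesting: cutting the hidden dimension is a change to the architecture itself, not to the training procedure.

import torch
import torch.nn as nn

class SimpleMLP(nn.Module):
    """Toy MLP; hidden_dim is the capacity knob the interviewer wants to cut."""
    def __init__(self, in_dim=512, hidden_dim=4096, out_dim=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)

wide = SimpleMLP(hidden_dim=4096)    # the original, over-parameterized model
narrow = SimpleMLP(hidden_dim=1024)  # the proposed "fix": ~4x fewer hidden units

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"wide:   {count(wide):,} params")
print(f"narrow: {count(narrow):,} params")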

-----

The Solution: The senior engineer knows the real rule of thumb: never use model size as a regularizer.
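As a rough illustration of that rule (a sketch under assumed dimensions, not the post's implementation), the usual alternative is to keep the wide, over-parameterized model and lean on explicit regularization such as weight decay and dropout, rather than cutting the hidden dimension:

import torch
import torch.nn as nn

# Keep the over-parameterized width; regularize explicitly instead of shrinking it.
model = nn.Sequential(
    nn.Linear(512, 4096),
    nn.ReLU(),
    nn.Dropout(p=0.1),   # explicit regularization on activations
    nn.Linear(4096, 10),
)

# Weight decay regularizes the parameters directly through the optimizer.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

This keeps the easier-to-optimize loss landscape of the wide network while still constraining what it can memorize.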


