Advanced Reinforcement Learning Interview Questions #2 - The Mean Collapse Trap

You can spend infinite compute learning better parameters, but a Gaussian head still cannot represent forks in the action space.

Jan 28, 2026


You’re in a Machine Learning interview at Tesla, and the interviewer asks:

“We have an imitation learning agent that is underfitting complex human driving data. A junior engineer suggests scaling the backbone network size by 10x to 𝘪𝘯𝘤𝘳𝘦𝘢𝘴𝘦 𝘤𝘢𝘱𝘢𝘤𝘪𝘵𝘺. We are currently using a simple Gaussian output head. Why will scaling the network fail to solve the problem, no matter how much compute you throw at it?”

Most candidates say: “The model is underfitting, so the hypothesis space is too small. Increasing the parameter count will allow the network to learn more complex features and better map states to actions.”

They just failed the interview, because they would burn a million dollars in compute and get the exact same failure.

The bottleneck here isn’t 𝐅𝐮𝐧𝐜𝐭𝐢𝐨𝐧 𝐄𝐱𝐩𝐫𝐞𝐬𝐬𝐢𝐯𝐢𝐭𝐲 (𝘩𝘰𝘸 𝘴𝘮𝘢𝘳𝘵 𝘵𝘩𝘦 𝘯𝘦𝘶𝘳𝘢𝘭 𝘯𝘦𝘵 𝘪𝘴), it’s 𝐃𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧 𝐄𝐱𝐩𝐫𝐞𝐬𝐬𝐢𝐯𝐢𝐭𝐲 (𝘸𝘩𝘢𝘵 𝘵𝘩𝘦 𝘯𝘦𝘶𝘳𝘢𝘭 𝘯𝘦𝘵 𝘪𝘴 𝘢𝘭𝘭𝘰𝘸𝘦𝘥 𝘵𝘰 𝘴𝘢𝘺).

If your output head is 𝐚 𝐬𝐢𝐧𝐠𝐥𝐞 𝐆𝐚𝐮𝐬𝐬𝐢𝐚𝐧 (μ, σ), you are mathematically forcing the model to be 𝘶𝘯𝘪𝘮𝘰𝘥𝘢𝘭.
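To make that constraint concrete, here is the density a single Gaussian head parameterizes, written for a one-dimensional action a in state s. The notation (π_θ, μ_θ, σ_θ) is an assumption for illustration and does not come from the interview question itself.

```latex
\pi_\theta(a \mid s)
  = \frac{1}{\sqrt{2\pi}\,\sigma_\theta(s)}
    \exp\!\left( -\frac{\bigl(a - \mu_\theta(s)\bigr)^2}{2\,\sigma_\theta(s)^2} \right)
```

For any fixed state s, this density has exactly one peak, at a = μ_θ(s). A bigger backbone changes which values μ_θ(s) and σ_θ(s) take, but not the shape of the distribution they describe.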

Here is the reality of production robotics:

- 𝘛𝘩𝘦 𝘋𝘢𝘵𝘢: In a specific scenario, 50% of human drivers go Left. 50% go Right. This is a multimodal distribution.

- 𝘛𝘩𝘦 𝘔𝘰𝘥𝘦𝘭: A Gaussian head cannot represent two peaks. It must find a single mean.

- 𝘛𝘩𝘦 𝘙𝘦𝘴𝘶𝘭𝘵: (Left + Right) / 2 = Straight.
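The Left/Right example above can be checked numerically. What follows is a minimal sketch on made-up toy data, not the setup from the interview: the action encoding (-1.0 for left, +1.0 for right), the noise level, and the helper names `gaussian_logpdf` and `mixture_logpdf` are all assumptions for illustration. It fits a single Gaussian by maximum likelihood, then compares it against a two-component mixture whose parameters are set by hand rather than learned.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical toy data for ONE state: half the demonstrators steer left (-1.0),
# half steer right (+1.0), with a little noise around each choice.
actions = [random.choice([-1.0, 1.0]) + random.gauss(0.0, 0.05)
           for _ in range(10_000)]

# Maximum-likelihood fit of a single Gaussian is just the sample mean and std.
mu = statistics.fmean(actions)
sigma = statistics.pstdev(actions)


def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def mixture_logpdf(x, means, std):
    """Log-density of an equal-weight Gaussian mixture (log-sum-exp for stability)."""
    comps = [gaussian_logpdf(x, m, std) for m in means]
    top = max(comps)
    return top + math.log(sum(math.exp(c - top) for c in comps) / len(comps))


# Average negative log-likelihood of the data under each model (lower is better).
single_nll = -statistics.fmean([gaussian_logpdf(a, mu, sigma) for a in actions])
# Mixture parameters are hand-set to the two known modes, not fitted.
mixture_nll = -statistics.fmean(
    [mixture_logpdf(a, (-1.0, 1.0), 0.05) for a in actions]
)

print(f"single Gaussian: mu={mu:.3f}, sigma={sigma:.3f}, NLL={single_nll:.3f}")
print(f"two-mode mixture: NLL={mixture_nll:.3f}")
print(f"closest any demonstrator came to 0.0: {min(abs(a) for a in actions):.3f}")
```

The fitted mean lands near 0.0, an action no demonstrator in this toy dataset ever took, and the fitted standard deviation stretches wide to cover both modes. No amount of extra capacity upstream of μ and σ changes that outcome, because the data for this state is the same regardless of how the network computes those two numbers.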
