Advanced Reinforcement Learning Interview Questions #2 - The Mean Collapse Trap
You can spend infinite compute learning better parameters, but a Gaussian head still cannot represent forks in the action space.
You're in a Machine Learning interview at Tesla. The interviewer asks:
"We have an imitation learning agent that is underfitting complex human driving data. A junior engineer suggests scaling the backbone network size by 10x to increase capacity. We are currently using a simple Gaussian output head. Why will scaling the network fail to solve the problem, no matter how much compute you throw at it?"
Most candidates say: "The model is underfitting, so the hypothesis space is too small. Increasing the parameter count will allow the network to learn more complex features and better map states to actions."
They just failed the interview. That answer burns a million dollars in compute to reproduce the exact same failure.
The bottleneck here isn't Function Expressivity (how smart the neural net is), it's Distribution Expressivity (what the neural net is allowed to say).
If your output head is a single Gaussian (μ, σ), you are mathematically forcing the model to be unimodal.
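To make that concrete, here is a minimal sketch of such a head in PyTorch. The class name GaussianPolicyHead and the dimensions are illustrative, not from any particular codebase:

```python
import torch
import torch.nn as nn

class GaussianPolicyHead(nn.Module):
    """Illustrative Gaussian output head: maps backbone features to (mu, sigma).

    No matter how large the backbone that feeds it, the predicted action
    distribution is a single Normal -- one peak, by construction.
    """
    def __init__(self, feature_dim: int, action_dim: int):
        super().__init__()
        self.mu = nn.Linear(feature_dim, action_dim)
        self.log_sigma = nn.Linear(feature_dim, action_dim)

    def forward(self, features: torch.Tensor) -> torch.distributions.Normal:
        mu = self.mu(features)
        sigma = self.log_sigma(features).exp()  # keep sigma strictly positive
        return torch.distributions.Normal(mu, sigma)

# Imitation learning trains this head by maximizing the log-likelihood of
# human actions. For a Normal, the MLE of mu is the sample mean -- so
# multimodal expert actions get averaged, not represented.
head = GaussianPolicyHead(feature_dim=256, action_dim=2)
features = torch.randn(8, 256)                    # backbone output for a batch
dist = head(features)
loss = -dist.log_prob(torch.randn(8, 2)).mean()   # NLL on expert actions
```

Scaling the backbone 10x only changes how `features` are computed; the output distribution family stays a single-peak Normal.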
Here is the reality of production robotics:
- The Data: In a specific scenario, 50% of human drivers go Left. 50% go Right. This is a multimodal distribution.
- The Model: A Gaussian head cannot represent two peaks. It must find a single mean.
- The Result: (Left + Right) / 2 = Straight. Maximum-likelihood fitting averages the two modes into the one action no human in the dataset ever took (see the sketch below).
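As a quick sanity check, here is what maximum-likelihood fitting of a single Gaussian actually does to that fork. The steering convention (-1.0 = hard left, +1.0 = hard right) is made up for illustration:

```python
import numpy as np

# Hypothetical steering angles from human demonstrations at a fork:
# half the drivers steer hard left (-1.0), half hard right (+1.0).
actions = np.concatenate([np.full(500, -1.0), np.full(500, +1.0)])

# MLE for a single Gaussian: mu is the sample mean, sigma the sample std.
mu_hat = actions.mean()    # -> 0.0  ("go straight")
sigma_hat = actions.std()  # -> 1.0  (huge variance smears over both modes)

print(mu_hat, sigma_hat)
# The mode of the fitted Gaussian is mu_hat = 0.0 -- driving straight,
# an action that appears exactly zero times in the data.
```

No amount of extra backbone capacity changes this arithmetic; the averaging is a property of the Gaussian likelihood itself.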