AI Interview Prep

Advanced Reinforcement Learning Interview Questions #2 - The Mean Collapse Trap

You can spend infinite compute learning better parameters, but a Gaussian head still cannot represent forks in the action space.

Hao Hoang
Jan 28, 2026

You're in a Machine Learning interview at Tesla, and the interviewer asks:

“We have an imitation learning agent that is underfitting complex human driving data. A junior engineer suggests scaling the backbone network by 10x to increase capacity. We are currently using a simple Gaussian output head. Why will scaling the network fail to solve the problem, no matter how much compute you throw at it?”

Most candidates say: “The model is underfitting, so the hypothesis space is too small. Increasing the parameter count will allow the network to learn more complex features and better map states to actions.”

They just failed the interview. Their fix burns a million dollars in compute to reproduce exactly the same failure.

The bottleneck here isn't Function Expressivity (how smart the neural net is); it's Distribution Expressivity (what the neural net is allowed to say).
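One way to check this claim is to push function expressivity to its limit. The sketch below is a minimal illustration under stated assumptions: a hypothetical toy dataset with a single "fork" state, steering actions encoded as -1.0 (Left) and +1.0 (Right), and the backbone replaced by a lookup table that stores a free (μ, σ) for every state. On the training data no finite network can fit better than that table, yet it still lands on Straight. The helper names are illustrative, not part of any library.

```python
import math
from collections import defaultdict


def fit_lookup_table_policy(dataset):
    """Idealized 'infinite-capacity' Gaussian policy: a free (mu, sigma) per state.

    With no parameter sharing, each state's NLL-optimal Gaussian is the sample
    mean and the maximum-likelihood (biased) standard deviation of the actions
    demonstrated in that state.
    """
    by_state = defaultdict(list)
    for state, action in dataset:
        by_state[state].append(action)
    table = {}
    for state, actions in by_state.items():
        mu = sum(actions) / len(actions)
        var = sum((a - mu) ** 2 for a in actions) / len(actions)
        table[state] = (mu, math.sqrt(var))
    return table


def gaussian_nll(actions, mu, sigma):
    """Average negative log-likelihood of the actions under N(mu, sigma^2)."""
    return sum(
        0.5 * math.log(2 * math.pi) + math.log(sigma) + (a - mu) ** 2 / (2 * sigma**2)
        for a in actions
    ) / len(actions)


# Hypothetical toy data: a steering value, -1.0 = Left, +1.0 = Right.
# At the "fork" state, half of the demonstrations go Left and half go Right.
fork_actions = [-1.0] * 50 + [1.0] * 50
dataset = [("fork", a) for a in fork_actions]

policy = fit_lookup_table_policy(dataset)
mu, sigma = policy["fork"]
print(f"fork: mu={mu:.2f}, sigma={sigma:.2f}")  # fork: mu=0.00, sigma=1.00

# Committing to a real mode (Left) scores worse under the Gaussian objective,
# even after re-fitting sigma for that mean.
sigma_left = math.sqrt(sum((a + 1.0) ** 2 for a in fork_actions) / len(fork_actions))
print(gaussian_nll(fork_actions, mu, sigma) < gaussian_nll(fork_actions, -1.0, sigma_left))  # True
```

Because the Gaussian objective itself rewards the averaged answer, the outcome does not depend on which backbone produces μ and σ.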

If your output head is a single Gaussian (μ, σ), you are mathematically forcing the model to be unimodal.
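The word "mathematically" can be made concrete. For one fixed state, write the demonstrated actions as a_1, …, a_N (notation introduced here for illustration) and minimize the average Gaussian negative log-likelihood over the head's outputs μ and σ:

```latex
\begin{aligned}
\mathcal{L}(\mu,\sigma) &= \frac{1}{N}\sum_{i=1}^{N}\left[\frac{(a_i-\mu)^2}{2\sigma^2} + \log\sigma\right] + \frac{1}{2}\log 2\pi \\
\frac{\partial\mathcal{L}}{\partial\mu} &= -\frac{1}{N\sigma^2}\sum_{i=1}^{N}(a_i-\mu) = 0
  \;\Longrightarrow\; \mu^{*} = \frac{1}{N}\sum_{i=1}^{N} a_i \\
\frac{\partial\mathcal{L}}{\partial\sigma} &= \frac{1}{N}\sum_{i=1}^{N}\left[\frac{1}{\sigma} - \frac{(a_i-\mu)^2}{\sigma^3}\right] = 0
  \;\Longrightarrow\; \sigma^{*2} = \frac{1}{N}\sum_{i=1}^{N}(a_i-\mu^{*})^2 \\
a_i \in \{-1,+1\}\ \text{(50\% each)} &\;\Longrightarrow\; \mu^{*} = 0\ (\text{Straight}),\quad \sigma^{*} = 1
\end{aligned}
```

With enough data per state, μ* becomes the conditional mean E[a | s]. Whatever backbone produces μ_θ(s), the optimum for each state is that average, so a larger network can only approximate the average more precisely; it cannot move the optimum onto either mode.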

Here is the reality of production robotics:

- ๐˜›๐˜ฉ๐˜ฆ ๐˜‹๐˜ข๐˜ต๐˜ข: In a specific scenario, 50% of human drivers go Left. 50% go Right. This is a multimodal distribution.

- ๐˜›๐˜ฉ๐˜ฆ ๐˜”๐˜ฐ๐˜ฅ๐˜ฆ๐˜ญ: A Gaussian head cannot represent two peaks. It must find a single mean.

- ๐˜›๐˜ฉ๐˜ฆ ๐˜™๐˜ฆ๐˜ด๐˜ถ๐˜ญ๐˜ต: (Left + Right) / 2 = Straight.

© 2026 Hao Hoang