Advanced Reinforcement Learning Interview Questions #16 - The Bootstrapping Bias Trap
Bootstrapping doesn’t just reduce variance; it injects your model’s current errors directly into the learning target, turning a bad initialization into self-reinforcing policy collapse.
You’re in a Senior RL Engineer interview at OpenAI and the interviewer drops this scenario:
“We accidentally initialized our value network to output -1000 for every state. We run one update step using Monte Carlo and one using bootstrapping (TD learning). Which algorithm breaks immediately, and which one survives?”
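The scenario can be made concrete with a minimal sketch. The setup below is hypothetical (a 5-state chain with zero rewards, so the true value of every state is 0), but it isolates the key difference: the Monte Carlo target is the actual observed return and does not depend on the value estimates, while the TD(0) target bootstraps from the network's own (badly initialized) predictions.

```python
import numpy as np

# Hypothetical 5-state chain: reward 0 at every step, episode ends after 5 steps.
# True value of every state is 0, but V is initialized to -1000 everywhere.
n_states, gamma, alpha = 5, 0.99, 0.1

V_mc = np.full(n_states, -1000.0)  # copy updated with Monte Carlo targets
V_td = np.full(n_states, -1000.0)  # copy updated with TD(0) targets

# One episode: visit states 0..4 in order, reward 0 each step, then terminate.
states = list(range(n_states))
rewards = [0.0] * n_states

# Monte Carlo: the target is the actual return G (here 0), independent of V.
G = 0.0
for s, r in zip(reversed(states), reversed(rewards)):
    G = r + gamma * G
    V_mc[s] += alpha * (G - V_mc[s])  # pulled straight toward the true return

# TD(0): the target r + gamma * V(s') bootstraps from V itself,
# so it inherits the -1000 bias (~ -990 for every non-terminal state).
for t, (s, r) in enumerate(zip(states, rewards)):
    v_next = 0.0 if t == n_states - 1 else V_td[states[t + 1]]
    V_td[s] += alpha * (r + gamma * v_next - V_td[s])

print(V_mc)  # every entry jumped 10% of the way to the truth: -900.0
print(V_td)  # non-terminal entries barely moved: -999.0 (terminal state: -900.0)
```

After a single episode, the Monte Carlo copy has moved every state 10% of the way toward the correct value, while the TD copy has corrected almost nothing except the terminal state, whose target does not bootstrap. The bias will eventually propagate backward from the terminal state, but only over many updates, which is exactly the self-reinforcing trap the question is probing.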


