Advanced Reinforcement Learning Interview Questions #16 - The Bootstrapping Bias Trap
Bootstrapping doesn’t just reduce variance; it injects your model’s current errors directly into the learning target, turning a bad initialization into self-reinforcing policy collapse.
You’re in a Senior RL Engineer interview at OpenAI and the interviewer drops this scenario:
“We accidentally initialized our value network to output -1000 for every state. We run one update step using Monte Carlo and one using bootstrapping (TD learning). Which algorithm breaks immediately, and which one survives?”
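The scenario can be made concrete with a minimal sketch. The setup below is hypothetical (a 5-state chain with zero rewards, so the true value of every state is 0), but it isolates the key difference: the Monte Carlo target is the actual observed return and does not depend on the value estimates, while the TD(0) target bootstraps from the network's own (badly initialized) predictions.

```python
import numpy as np

# Hypothetical 5-state chain: reward 0 at every step, episode ends after 5 steps.
# True value of every state is 0, but V is initialized to -1000 everywhere.
n_states, gamma, alpha = 5, 0.99, 0.1

V_mc = np.full(n_states, -1000.0)  # copy updated with Monte Carlo targets
V_td = np.full(n_states, -1000.0)  # copy updated with TD(0) targets

# One episode: visit states 0..4 in order, reward 0 each step, then terminate.
states = list(range(n_states))
rewards = [0.0] * n_states

# Monte Carlo: the target is the actual return G (here 0), independent of V.
G = 0.0
for s, r in zip(reversed(states), reversed(rewards)):
    G = r + gamma * G
    V_mc[s] += alpha * (G - V_mc[s])  # pulled straight toward the true return

# TD(0): the target r + gamma * V(s') bootstraps from V itself,
# so it inherits the -1000 bias (~ -990 for every non-terminal state).
for t, (s, r) in enumerate(zip(states, rewards)):
    v_next = 0.0 if t == n_states - 1 else V_td[states[t + 1]]
    V_td[s] += alpha * (r + gamma * v_next - V_td[s])

print(V_mc)  # every entry jumped 10% of the way to the truth: -900.0
print(V_td)  # non-terminal entries barely moved: -999.0 (terminal state: -900.0)
```

After a single episode, the Monte Carlo copy has moved every state 10% of the way toward the correct value, while the TD copy has corrected almost nothing except the terminal state, whose target does not bootstrap. The bias will eventually propagate backward from the terminal state, but only over many updates, which is exactly the self-reinforcing trap the question is probing.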


