Advanced Reinforcement Learning Interview Questions #6 - The Initialization Gap Trap
A policy isn't done when it succeeds at its task; it's done when its final state is compatible with whatever comes next.
You’re in a final-round interview for a Senior AI Engineer role at NVIDIA Robotics.
The VP of Engineering draws a simple diagram on the whiteboard and sets the trap:
“We trained Policy A (Boil Water) to 99% accuracy. W…


