Computer Vision Interview Questions #5 – The Dead ReLU Trap
Why lowering the learning rate can't resurrect dead neurons - and how architectural gradient flow actually fixes it.
You’re in a Senior ML Interview at OpenAI. The interviewer sets a trap.
They show you a TensorBoard graph: 40% of your hidden-layer neurons output exactly zero for every input, and their weights have stopped updating entirely.
The question: “How do you fix this?”
90% of candidates walk right into the trap.
They say: “It’s a learning rate issue. I would lower the learning rate to stop the weights from jumping too far.”
This answer is technically “safe,” but it fails the production test. Why? Because lowering the learning rate is preventative, not curative. It doesn’t solve the structural failure that has already occurred.
The reality: you aren’t fighting a hyperparameter issue. You’re fighting 𝐓𝐡𝐞 𝐇𝐚𝐫𝐝-𝐙𝐞𝐫𝐨 𝐋𝐨𝐜𝐤𝐨𝐮𝐭. Here is how it happens:
1️⃣ A large gradient update pushes a neuron’s weights such that w*x + b becomes negative for all inputs in your dataset.
2️⃣ Standard ReLU is max(0, x). If the input is negative, the output is 0.
3️⃣ Crucially: in that flat region, ReLU’s gradient is exactly 0. Zero gradient means zero weight update, so the neuron can never climb back into the positive region. It is locked out permanently (see the sketch below).
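
To make step 3 concrete, here is a minimal sketch in PyTorch (the framework is my assumption; the post doesn’t name one). It forces a single linear unit into the all-negative regime and shows that standard ReLU passes back an exactly-zero gradient, while a leaky variant keeps a small gradient flowing, which is the “architectural gradient flow” fix the subtitle points at.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A single "neuron": one linear unit followed by an activation.
neuron = nn.Linear(in_features=4, out_features=1)

# Simulate the aftermath of a huge gradient update:
# force w*x + b < 0 for every input in the batch.
with torch.no_grad():
    neuron.weight.fill_(-1.0)
    neuron.bias.fill_(-5.0)

x = torch.rand(8, 4)          # inputs in [0, 1), so pre-activation is always negative
target = torch.ones(8, 1)

# --- Standard ReLU: the hard-zero lockout ---
out = torch.relu(neuron(x))   # every activation is exactly 0
loss = ((out - target) ** 2).mean()
loss.backward()
print(neuron.weight.grad)     # all zeros: no signal reaches the weights
print(neuron.bias.grad)       # zero: no learning rate, however small, can revive this neuron

# --- Leaky ReLU: a small negative-side slope keeps the gradient alive ---
neuron.zero_grad()
out = nn.functional.leaky_relu(neuron(x), negative_slope=0.01)
loss = ((out - target) ** 2).mean()
loss.backward()
print(neuron.weight.grad)     # small but nonzero: updates can resume
```

Running it prints zero gradients for the ReLU case and small nonzero gradients for the leaky case, which is exactly why the fix is architectural rather than a learning-rate tweak.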


