Computer Vision Interview Questions #5 – The Dead ReLU Trap
Why lowering the learning rate can't resurrect dead neurons - and how architectural gradient flow actually fixes it.
You’re in a Senior ML Interview at OpenAI. The interviewer sets a trap.
They show you a TensorBoard graph: 40% of your hidden-layer neurons output exactly zero for every input, and their weights have stopped updating entirely.
The question: “How do you fix this?”
90% of candidates walk right into the trap.
They say: “It’s a learning rate issue. I would lower the learning rate to stop the weights from jumping too far.”
This answer is technically “safe,” but it fails the production test. Why? Because lowering the learning rate is preventative, not curative. It doesn’t solve the structural failure that has already occurred.
The reality: you aren’t fighting a hyperparameter issue. You’re fighting 𝐓𝐡𝐞 𝐇𝐚𝐫𝐝-𝐙𝐞𝐫𝐨 𝐋𝐨𝐜𝐤𝐨𝐮𝐭. Here is how it happens:
1️⃣ A large gradient update pushes a neuron’s weights such that w*x + b becomes negative for all inputs in your dataset.
2️⃣ Standard ReLU is max(0, x). If the input is negative, the output is 0.
3️⃣ Crucially: in that flat region, ReLU’s gradient is exactly 0. Zero gradient means zero weight update, so the neuron can never climb back into the positive region. It is locked out permanently (see the sketch below).
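
To make step 3 concrete, here is a minimal sketch in PyTorch (the framework is my assumption; the post doesn’t name one). It forces a single linear unit into the all-negative regime and shows that standard ReLU passes back an exactly-zero gradient, while a leaky variant keeps a small gradient flowing, which is the “architectural gradient flow” fix the subtitle points at.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A single "neuron": one linear unit followed by an activation.
neuron = nn.Linear(in_features=4, out_features=1)

# Simulate the aftermath of a huge gradient update:
# force w*x + b < 0 for every input in the batch.
with torch.no_grad():
    neuron.weight.fill_(-1.0)
    neuron.bias.fill_(-5.0)

x = torch.rand(8, 4)          # inputs in [0, 1), so pre-activation is always negative
target = torch.ones(8, 1)

# --- Standard ReLU: the hard-zero lockout ---
out = torch.relu(neuron(x))   # every activation is exactly 0
loss = ((out - target) ** 2).mean()
loss.backward()
print(neuron.weight.grad)     # all zeros: no signal reaches the weights
print(neuron.bias.grad)       # zero: no learning rate, however small, can revive this neuron

# --- Leaky ReLU: a small negative-side slope keeps the gradient alive ---
neuron.zero_grad()
out = nn.functional.leaky_relu(neuron(x), negative_slope=0.01)
loss = ((out - target) ** 2).mean()
loss.backward()
print(neuron.weight.grad)     # small but nonzero: updates can resume
```

Running it prints zero gradients for the ReLU case and small nonzero gradients for the leaky case, which is exactly why the fix is architectural rather than a learning-rate tweak.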


