Machine Learning System Design Interview #47 - The EWC Rigidity Trap

The hidden failure mode where retaining old accuracy quietly fires your network like clay in a kiln, leaving no room to sculpt anything new.

Jun 04, 2026

You’re in a Senior ML Engineer interview at DeepMind and the interviewer asks:

“You deploy Elastic Weight Consolidation (EWC) to fix catastrophic forgetting during continual fine-tuning. The model successfully retains its historical accuracy, but its adaptation to the new domain completely stalls. Why?”

Most candidates say: “The model is underfitting the new data. We just need to increase the learning rate or run it for a few more epochs.”

Wrong approach.

The reality is: you’ve slammed face-first into the Stability-Plasticity Dilemma.

EWC works by estimating the Fisher Information Matrix (in practice, usually just its diagonal) to identify which weights mattered most for the old data, then adding a quadratic penalty that grows as those weights move away from their old values.
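
For reference, this penalty is usually written as below, using the diagonal Fisher approximation from the original EWC paper. The symbols are notation picked here for illustration: θ* for the weights after training on the old data, F_i for the diagonal Fisher entry of weight i, and L_new for the new-domain loss.

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\text{new}}(\theta)
  + \frac{\lambda}{2} \sum_i F_i \left(\theta_i - \theta_i^{*}\right)^2,
\qquad
F_i = \mathbb{E}_{x \sim \mathcal{D}_{\text{old}},\; y \sim p_{\theta^{*}}(\cdot \mid x)}
  \left[ \left( \frac{\partial \log p_{\theta}(y \mid x)}{\partial \theta_i} \right)^{2} \Big|_{\theta = \theta^{*}} \right]
```

A large F_i makes weight i stiff on its own, while λ scales the stiffness of every weight at once.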

But when adaptation stalls, your network has become overly rigid. It’s like trying to sculpt a new statue out of clay that has already been fired in a kiln.

Here is exactly what is happening under the hood:

Capacity Lockout: The specific weights that are critical for your old domain are often the exact base feature extractors needed to map the new domain. If they are locked, learning halts.
Gradient Domination: Your EWC regularization multiplier (λ) is too aggressive. The penalty gradients for moving away from the old weights are completely swallowing the loss gradients of your new data.
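
A one-weight toy model makes both failure modes concrete. The sketch below assumes the new-domain loss is locally quadratic around its own optimum with curvature `curvature_new`, which is an illustrative simplification and not how a real network behaves. All function and parameter names are hypothetical. Note that the EWC penalty gradient is zero at the old weights and grows with distance, so the weight settles where the two pulls balance.

```python
def ewc_equilibrium(theta_old, theta_new, curvature_new, fisher, lam):
    """Minimizer of 0.5*c*(t - theta_new)^2 + 0.5*lam*F*(t - theta_old)^2.

    Setting the derivative to zero gives a weighted average of the old
    weight and the new-domain optimum, weighted by each side's stiffness.
    """
    pull_new = curvature_new      # how hard the new data pulls
    pull_old = lam * fisher       # how hard the EWC penalty pulls back
    return (pull_new * theta_new + pull_old * theta_old) / (pull_new + pull_old)


def adaptation_fraction(curvature_new, fisher, lam):
    """Fraction of the distance to the new optimum the weight actually travels."""
    return curvature_new / (curvature_new + lam * fisher)


# Toy numbers (assumptions): a weight that matters to BOTH domains.
for lam in (0.1, 10.0, 1000.0):
    frac = adaptation_fraction(curvature_new=1.0, fisher=5.0, lam=lam)
    theta = ewc_equilibrium(theta_old=0.0, theta_new=1.0,
                            curvature_new=1.0, fisher=5.0, lam=lam)
    print(f"lambda={lam:>7}: settles at {theta:.4f} ({frac:.2%} of the way)")
```

With these toy numbers the weight covers roughly 67%, 2%, and 0.02% of the distance to the new optimum. If that weight is also a shared feature extractor, the last two cases are the lockout described above.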

Seniors know that blindly turning down the λ penalty just invites catastrophic forgetting back into the mix. You need a structural fix.

How to actually balance the trade-off:

Instead of fighting the old weights, you decouple your plasticity. You freeze the critical EWC-protected backbone and inject fresh, isolated capacity for the new domain, typically with Low-Rank Adaptation (LoRA) adapters. Alternatively, you combine a dialed-down EWC penalty with a sparse Experience Replay buffer.
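
The adapter idea can be sketched in a few lines of plain Python. This is a minimal illustration of a LoRA-style layer, not a production implementation: the class name, the `scale` parameter, and the initialization constants are assumptions made here. The frozen weight `W` is stored as an immutable tuple, and only the small matrices `A` and `B` would be trained on the new domain.

```python
import random


def matvec(matrix, vector):
    """Plain matrix-vector product on nested sequences."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LowRankAdaptedLinear:
    """Computes W x + scale * B (A x), with W frozen and A, B trainable."""

    def __init__(self, frozen_weight, rank, scale=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(frozen_weight), len(frozen_weight[0])
        # Tuples make the old-domain weights impossible to update in place.
        self.W = tuple(tuple(row) for row in frozen_weight)
        # A starts small and random; B starts at zero, so the adapter
        # contributes nothing until training moves B.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = scale

    def forward(self, x):
        base = matvec(self.W, x)                     # old-domain behaviour
        delta = matvec(self.B, matvec(self.A, x))    # new-domain correction
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_parameter_count(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


layer = LowRankAdaptedLinear([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]], rank=1)
print(layer.forward([1.0, 2.0, 3.0]))       # identical to the frozen layer at init
print(layer.trainable_parameter_count())    # rank * (in_dim + out_dim)
```

Because `B` starts at zero, the adapted layer reproduces the old model exactly at step zero, and removing the adapter afterwards restores the old-domain behaviour, since `W` never changes.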

The answer that gets you hired: “EWC is aggressively prioritizing stability over plasticity, causing the Fisher penalty to dominate the loss landscape. To unblock adaptation without triggering catastrophic forgetting, we need to dynamically decay the EWC penalty or inject isolated representational capacity via adapter modules specifically for the new domain.”
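
One possible reading of "dynamically decay the EWC penalty" is a schedule that starts stiff and relaxes toward a nonzero floor, so stability is never dropped entirely. The exponential shape, the floor, and every name below are assumptions for illustration. The post does not prescribe a specific schedule.

```python
def decayed_ewc_lambda(step, lam_start, lam_floor, half_life_steps):
    """Decay the EWC multiplier from lam_start toward lam_floor.

    The gap between the current value and the floor halves every
    half_life_steps training steps. Keeping lam_floor > 0 preserves
    some protection against forgetting.
    """
    if half_life_steps <= 0:
        raise ValueError("half_life_steps must be positive")
    if lam_floor > lam_start:
        raise ValueError("lam_floor must not exceed lam_start")
    gap = lam_start - lam_floor
    return lam_floor + gap * 0.5 ** (step / half_life_steps)


# Hypothetical settings: start at 1000, relax toward 10, halve the gap every 500 steps.
for step in (0, 500, 1000, 5000):
    print(step, decayed_ewc_lambda(step, lam_start=1000.0, lam_floor=10.0,
                                   half_life_steps=500))
```

In a training loop, the returned value would multiply the Fisher penalty term at each step. Old-domain accuracy should be monitored while the schedule relaxes.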

#MachineLearning #MLOps #ContinualLearning #DeepLearning #AI #DataScience #TechInterviews

📚 Related Papers:

- Overcoming catastrophic forgetting in neural networks. Available at: https://arxiv.org/abs/1612.00796

- Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence. Available at: https://arxiv.org/abs/1801.10112

- Lifelong Learning with Task-Specific Adaptation: Addressing the Stability-Plasticity Dilemma. Available at: https://arxiv.org/abs/2503.06213

- On Quadratic Penalties in Elastic Weight Consolidation. Available at: https://arxiv.org/abs/1712.03847

