Machine Learning System Design Interview #49 - The Cross-Entropy Trap
Why optimizing for rigid classification boundaries overwrites historical data during continuous updates, and how enforcing stable latent topologies cures catastrophic forgetting at the source.
You’re in a Senior ML Engineer interview at DeepMind and the interviewer asks:
“We’re deploying a stateful neural network backbone that continuously updates on a non-stationary data stream. Our standard cross-entropy loss is massively accelerating catastrophic forgetting. What alternative representational objective do you implement to stop the bleeding, and why?”
Don’t say: “I’d use a regularization penalty like Elastic Weight Consolidation (EWC), lower the learning rate, or just freeze the early layers and retrain the classification head.”
Wrong approach. You’re treating the symptom, not the disease.
The reality is that Cross-Entropy (CE) is fundamentally brittle in a continual learning environment.
Here is why: CE with softmax forces the network to map inputs to rigid, mutually exclusive logits, so raising one class's probability necessarily lowers all the others. When a new data distribution (or class) arrives, every gradient step on the new samples pushes the old classes' logits down and drags the decision boundaries toward the new data. The result? Drifting representations and catastrophic forgetting of historical data.
You’re essentially building rigid concrete walls to separate different types of animals in a zoo. The second a new, unexpected animal arrives, you have to demolish and rebuild the entire layout.
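The coupling between logits can be seen directly in the CE gradient, which for softmax outputs is the predicted probabilities minus the one-hot target. The sketch below is illustrative only: the logit values and the split into two "old" classes and one "new" class are made-up assumptions, not numbers from any real system.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ce_grad_wrt_logits(logits, target):
    """Gradient of softmax cross-entropy w.r.t. the logits: softmax(z) - onehot(target)."""
    probs = softmax(logits)
    return [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]

# Hypothetical logits for one sample: classes 0 and 1 are "old", class 2 is "new".
logits = [2.0, 1.0, -1.0]
grad = ce_grad_wrt_logits(logits, target=2)

# Gradient descent subtracts the gradient, so a positive entry means that
# logit is pushed DOWN by the update, and a negative entry means it is pushed UP.
for k, g in enumerate(grad):
    direction = "down" if g > 0 else "up"
    print(f"class {k}: grad={g:+.3f} -> logit pushed {direction}")
```

Every old-class logit receives a positive gradient, so a stream of new-class samples keeps suppressing the old classes unless old data is replayed alongside it.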
The production-grade fix is moving to a representation learning objective, specifically something like Supervised Contrastive Learning (SupCon), which shapes the embedding space directly instead of the logits.
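For reference, here is a minimal sketch of a SupCon-style loss: each anchor is pulled toward same-label embeddings and pushed away from everything else in the batch, with no fixed set of output logits involved. It is written in plain Python so it runs without dependencies; a real training loop would use a tensor library with autograd. The function name, the default temperature of 0.1, and the toy embeddings are assumptions for illustration, and the code assumes non-zero embedding vectors.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have a positive.

    For each anchor i, the loss is the negative mean log-probability of its
    same-label samples under a softmax over similarities to all other samples.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    anchor_losses = []
    for i in range(n):
        others = [a for a in range(n) if a != i]
        positives = [p for p in others if labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in others}
        # Log-sum-exp over all other samples, stabilized by subtracting the max.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        log_probs = [sims[p] - log_denom for p in positives]
        anchor_losses.append(-sum(log_probs) / len(positives))
    return sum(anchor_losses) / len(anchor_losses) if anchor_losses else 0.0

# Toy batch: two tight clusters in 2-D.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_labels = [0, 0, 1, 1]   # labels agree with the clusters
scrambled_labels = [0, 1, 0, 1]  # labels cut across the clusters

print("aligned:  ", supcon_loss(toy_embeddings, aligned_labels))
print("scrambled:", supcon_loss(toy_embeddings, scrambled_labels))
```

The loss is lower when labels agree with the geometry of the embedding space, which is the property the objective optimizes for. Note that adding a new class only adds new positives and negatives to the batch; it does not require resizing or re-normalizing an output layer.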
Here is exactly how this shift in objective saves your pipeline:


