Advanced Deep Learning Interview Questions #2 - The Memory Fragmentation Trap
OOMs aren’t always about capacity; they’re often allocator fragmentation failures that only surface under peak allocation pressure.
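A fragmentation OOM happens when there is enough free memory in total but no single contiguous free block is large enough. A toy first-fit allocator makes this concrete. The sketch below is hypothetical and deliberately simplified. `ToyAllocator` and its methods are made-up names, and PyTorch's real CUDA caching allocator is more complex, with size-bucketed blocks carved out of cached segments. The failure mode it illustrates is the same.

```python
class ToyAllocator:
    """Hypothetical first-fit allocator over a fixed pool of memory units."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}  # start offset -> size of each live allocation

    def _free_gaps(self):
        """Return (start, size) for every contiguous free region, in address order."""
        gaps, cursor = [], 0
        for start in sorted(self.blocks):
            if start > cursor:
                gaps.append((cursor, start - cursor))
            cursor = start + self.blocks[start]
        if cursor < self.capacity:
            gaps.append((cursor, self.capacity - cursor))
        return gaps

    def malloc(self, size):
        """First fit: use the first gap that is large enough, otherwise raise an OOM."""
        for start, gap in self._free_gaps():
            if gap >= size:
                self.blocks[start] = size
                return start
        raise MemoryError(
            f"OOM: requested {size}, total free {self.total_free()}, "
            f"largest free block {self.largest_free()}"
        )

    def free(self, start):
        del self.blocks[start]

    def total_free(self):
        return sum(size for _, size in self._free_gaps())

    def largest_free(self):
        return max((size for _, size in self._free_gaps()), default=0)


alloc = ToyAllocator(capacity=100)
offsets = [alloc.malloc(10) for _ in range(10)]  # fill the pool with ten 10-unit tensors
for off in offsets[::2]:                         # free every other tensor
    alloc.free(off)
print(alloc.total_free(), alloc.largest_free())  # 50 10
try:
    alloc.malloc(20)  # 50 units are free in total, but only as 10-unit holes
except MemoryError as err:
    print(err)        # OOM: requested 20, total free 50, largest free block 10
```

Half the pool is free, yet a request for only a fifth of it fails. Lowering the batch size would make the request small enough to fit, which hides the symptom without addressing the fragmentation behind it.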
You’re in a Senior ML Engineer interview at Meta and the interviewer asks:
“A junior dev hands you a 500-line PyTorch Out-of-Memory (OOM) stack trace and asks for help. What is your exact debugging workflow before you even think about telling them to ‘just lower the batch size’?”
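A reasonable first triage step is to compare the size of the failed request with the memory PyTorch has reserved but is not currently using. The inputs can come from the real PyTorch functions `torch.cuda.memory_allocated()` (bytes held by live tensors) and `torch.cuda.memory_reserved()` (bytes held by the caching allocator), both read at the point of failure. The `triage_oom` helper and its single threshold are a hypothetical heuristic, not a PyTorch API. It also ignores memory held outside PyTorch, such as the CUDA context or other processes on the same GPU.

```python
GiB = 1024 ** 3


def triage_oom(requested_bytes, allocated_bytes, reserved_bytes):
    """Hypothetical first-pass classification of a CUDA OOM.

    allocated_bytes: memory in live tensors, e.g. torch.cuda.memory_allocated()
    reserved_bytes:  memory cached by the allocator, e.g. torch.cuda.memory_reserved()
    """
    # Memory the caching allocator holds but no tensor is using.
    cached_but_unused = reserved_bytes - allocated_bytes
    if cached_but_unused >= requested_bytes:
        # Enough idle memory exists in total, yet the request still failed,
        # so it is likely split across blocks too small to serve it.
        return "fragmentation"
    # Idle cached memory cannot cover the request: a genuine capacity problem.
    return "capacity"


# Illustrative numbers, not measurements from a real run.
print(triage_oom(2 * GiB, 20 * GiB, 23 * GiB))  # fragmentation
print(triage_oom(2 * GiB, 22 * GiB, 23 * GiB))  # capacity
```

Only the "capacity" verdict points toward reducing the memory footprint, for example with a smaller batch. A "fragmentation" verdict calls for examining allocation patterns first. `torch.cuda.memory_summary()` prints a more detailed breakdown of the allocator's state.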
Most candidates say: “I’d look at the bottom of the trace to find the faili…


