Advanced NLP Interview Questions #6 - The LoRA Initialization Trap
How random initialization injects noise into frozen weights, and why zero is the safest starting point.
You're in a Senior ML Engineer interview at Google DeepMind. The interviewer sets a quiet trap:
"You are implementing LoRA (Low-Rank Adaptation) from scratch. How do you initialize the down-projection matrix A and the up-projection matrix B?"
90% of candidates walk right into a brick wall.
Most candidates reflexively answer: "I'd use standard deep learning best practices: Xavier/Glorot or Kaiming initialization for both matrices to ensure stable variance and healthy gradients."
It feels like the safe, textbook answer.
In reality, you just broke the model before the first training step.
The candidate forgot what LoRA actually is. It isn't a new layer; it's a residual update on top of a frozen weight.
The effective weight formula is:
W_new = W_frozen + (B x A)
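As a concrete sketch of that formula (the dimensions, rank, and values here are illustrative; real implementations also apply a scaling factor alpha / r to the product):

```python
import numpy as np

# A rank-8 adapter on a 64x64 frozen weight (illustrative sizes).
d_out, d_in, r = 64, 64, 8
rng = np.random.default_rng(0)

W_frozen = rng.standard_normal((d_out, d_in))  # pre-trained weight, never updated
A = rng.standard_normal((r, d_in))             # down-projection: d_in -> r
B = rng.standard_normal((d_out, r))            # up-projection:   r -> d_out

# The effective weight is the frozen weight plus the low-rank product.
W_new = W_frozen + B @ A
```

Note that only A and B are trained; W_frozen never changes, so everything that happens to the model at step 0 is determined by what B @ A evaluates to.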
If you initialize both A and B with random weights (Xavier/Kaiming), their product B × A is a matrix of random noise.
At step 0, you are adding this random noise directly to your carefully pre-trained 70B-parameter weights.
You aren't fine-tuning; you are lobotomizing the model's existing knowledge with a "cold start" shock.
The fix is exactly what the LoRA paper does: initialize A randomly and B to zero, so B × A = 0 and step 0 reproduces the frozen model exactly.
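A quick numerical sketch of the two choices (dimensions, rank, and the Kaiming-style fan-in scaling below are illustrative assumptions, not a specific library's defaults):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 8

# Down-projection A: random init is fine here (Kaiming-style fan-in scaling).
A = rng.standard_normal((r, d_in)) * np.sqrt(2.0 / d_in)

# The trap: a random B as well means B @ A is pure noise at step 0,
# and that noise is added straight onto the frozen pre-trained weight.
B_random = rng.standard_normal((d_out, r)) * np.sqrt(2.0 / r)
noise = np.linalg.norm(B_random @ A)

# Standard LoRA init: B = 0, so B @ A = 0 and step 0 is an exact no-op.
# Gradients still flow to both matrices, because dL/dB depends on A (nonzero).
B_zero = np.zeros((d_out, r))
delta = np.linalg.norm(B_zero @ A)

print(noise, delta)
```

The zero-init B gives you the best of both worlds: the model starts as an exact copy of the pre-trained network, yet training is not stuck, since the random A keeps the gradient with respect to B nonzero from the first step.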


