Advanced NLP Interview Questions #6 - The LoRA Initialization Trap
How random initialization injects noise into frozen weights, and why zero is the safest starting point.
You're in a Senior ML Engineer interview at Google DeepMind. The interviewer sets a quiet trap:
"You are implementing LoRA (Low-Rank Adaptation) from scratch. How do you initialize the down-projection matrix A and the up-projection matrix B?"
90% of candidates walk right into a brick wall.
Most candidates reflexively answer: "I'd use standard deep learning best practices: Xavier/Glorot or Kaiming initialization for both matrices to ensure stable variance and healthy gradients."
It feels like the safe, textbook answer.
In reality, you just broke the model before the first training step.
The candidate forgot what LoRA actually is. It isn't a new layer; it's a residual update on top of a frozen weight.
The effective weight formula is:
W_new = W_frozen + (B x A)
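As a concrete sketch of that formula (the dimensions, rank, and values here are illustrative; real implementations also apply a scaling factor alpha / r to the product):

```python
import numpy as np

# A rank-8 adapter on a 64x64 frozen weight (illustrative sizes).
d_out, d_in, r = 64, 64, 8
rng = np.random.default_rng(0)

W_frozen = rng.standard_normal((d_out, d_in))  # pre-trained weight, never updated
A = rng.standard_normal((r, d_in))             # down-projection: d_in -> r
B = rng.standard_normal((d_out, r))            # up-projection:   r -> d_out

# The effective weight is the frozen weight plus the low-rank product.
W_new = W_frozen + B @ A
```

Note that only A and B are trained; W_frozen never changes, so everything that happens to the model at step 0 is determined by what B @ A evaluates to.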
If you initialize both A and B with random weights (Xavier/Kaiming), their product B × A is a matrix of random noise.
At step 0, you are adding this random noise directly to your carefully pre-trained 70B-parameter weights.
You aren't fine-tuning; you are lobotomizing the model's existing knowledge with a "cold start" shock.
The fix is exactly what the LoRA paper does: initialize A randomly and B to zero, so B × A = 0 and step 0 reproduces the frozen model exactly.
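A quick numerical sketch of the two choices (dimensions, rank, and the Kaiming-style fan-in scaling below are illustrative assumptions, not a specific library's defaults):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 8

# Down-projection A: random init is fine here (Kaiming-style fan-in scaling).
A = rng.standard_normal((r, d_in)) * np.sqrt(2.0 / d_in)

# The trap: a random B as well means B @ A is pure noise at step 0,
# and that noise is added straight onto the frozen pre-trained weight.
B_random = rng.standard_normal((d_out, r)) * np.sqrt(2.0 / r)
noise = np.linalg.norm(B_random @ A)

# Standard LoRA init: B = 0, so B @ A = 0 and step 0 is an exact no-op.
# Gradients still flow to both matrices, because dL/dB depends on A (nonzero).
B_zero = np.zeros((d_out, r))
delta = np.linalg.norm(B_zero @ A)

print(noise, delta)
```

The zero-init B gives you the best of both worlds: the model starts as an exact copy of the pre-trained network, yet training is not stuck, since the random A keeps the gradient with respect to B nonzero from the first step.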


