AI Interview Prep

Advanced NLP Interview Questions #6 - The LoRA Initialization Trap

How random initialization injects noise into frozen weights, and why zero is the safest starting point.

Hao Hoang
Dec 13, 2025

You're in a Senior ML Engineer interview at Google DeepMind. The interviewer sets a quiet trap:

"You are implementing LoRA (Low-Rank Adaptation) from scratch. How do you initialize the down-projection matrix A and the up-projection matrix B?"

90% of candidates walk right into a brick wall.

Most candidates reflexively answer: "I'd use standard deep learning best practices: Xavier/Glorot or Kaiming initialization for both matrices to ensure stable variance and healthy gradients."

It feels like the safe, textbook answer.

In reality, you just broke the model before the first training step.

The candidate forgot what LoRA actually is. It isn't a new layer; it is a residual update.

The effective weight formula is:

W_new = W_frozen + (B x A)

If they initialize both A and B with random weights (Xavier/Kaiming), their product (B x A) becomes a matrix of random noise.
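A quick numerical check makes this concrete (a minimal NumPy sketch with illustrative shapes, not code from the post): with Xavier-style init on both matrices, the product B @ A is already a non-zero perturbation before any training happens.

```python
import numpy as np

d, r = 4096, 16                      # hidden size and LoRA rank (illustrative values)
rng = np.random.default_rng(0)

# Xavier-style init for BOTH matrices -- the "textbook" answer
A = rng.normal(0.0, np.sqrt(2.0 / (d + r)), size=(r, d))   # down-projection
B = rng.normal(0.0, np.sqrt(2.0 / (d + r)), size=(d, r))   # up-projection

delta_W = B @ A                      # this is what gets added to the frozen weights
print(f"std of B @ A at step 0: {delta_W.std():.6f}")      # non-zero: pure noise
```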

At Step 0, you are adding this random noise directly to your carefully pre-trained 70B parameter weights.

You aren't fine-tuning; you are lobotomizing the model's existing knowledge with a "cold start" shock.
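Here is a minimal PyTorch sketch of the safe starting point (class name, rank, and alpha are illustrative, not taken from the post): A gets a standard random init, but B is forced to zeros, so the residual B @ A vanishes and step 0 reproduces the frozen model exactly.

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a low-rank residual update B @ A."""

    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # pre-trained weights stay frozen

        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.empty(r, d_in))      # down-projection: random is fine
        self.B = nn.Parameter(torch.zeros(d_out, r))     # up-projection: zeros, so B @ A == 0
        nn.init.kaiming_uniform_(self.A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # At step 0 the LoRA term is exactly zero, so the output is identical
        # to the frozen model's output: no "cold start" shock.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling


# Usage sketch: wrap an existing projection and train only A and B.
layer = LoRALinear(nn.Linear(4096, 4096), r=16)
x = torch.randn(2, 4096)
assert torch.allclose(layer(x), layer.base(x))   # step-0 output matches the frozen layer
```

Because B starts at zero, gradients still flow (A is random, so dL/dB is non-zero), but the model's behavior at step 0 is untouched.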

