Generative Vision Interview Questions #4 - The SNR Collapse Trap

Why skipping continuous float scaling doesn't just risk NaN losses: the raw pixel values mathematically dwarf your scheduled noise and leave your reverse process dead on arrival.

Jun 11, 2026

You’re in a Senior AI Engineer interview at Midjourney. The interviewer sets a trap:

“Images are discrete RGB values from 0 to 255. Diffusion math assumes we sample from a standard normal distribution. If a data engineer feeds raw 0-255 pixel tensors directly into the training pipeline without continuous float scaling, how does this mathematically break the variance-preserving nature of the forward process?”

90% of candidates walk right into it.

The textbook instinct is to blame the neural network’s mechanics.

Most candidates say: “It will cause exploding gradients. The unscaled inputs will saturate the activation functions, and your loss will immediately spike to NaN.”

While that might be true for the U-Net, it completely misses the mathematical foundation of diffusion.

You aren’t debugging a standard image classifier; you are debugging a stochastic Markov chain.

The reality is that the forward process q(xₜ|xₜ₋₁) relies on a strictly calibrated noise schedule to incrementally destroy the image. If you don’t scale the inputs to [-1, 1], you trigger what I call The SNR Collapse.
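To make "strictly calibrated" concrete, here is a short worked derivation. It uses the notation of the DDPM paper (βₜ, ᾱₜ) and treats a single pixel as a scalar, which is a simplifying assumption for illustration rather than the full multivariate statement.

```latex
% One forward step, with \epsilon_t \sim \mathcal{N}(0, 1):
x_t = \sqrt{1 - \beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_t

% Variance recursion (noise is independent of x_{t-1}):
\mathrm{Var}(x_t) = (1 - \beta_t)\, \mathrm{Var}(x_{t-1}) + \beta_t

% Unit variance is the fixed point: \mathrm{Var}(x_{t-1}) = 1 \Rightarrow \mathrm{Var}(x_t) = 1.
% Any excess over 1 only decays by the factor (1 - \beta_t) per step:
\mathrm{Var}(x_t) - 1 = \bar{\alpha}_t \left( \mathrm{Var}(x_0) - 1 \right),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)

% Closed form and signal-to-noise ratio:
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad
\mathrm{SNR}(t) = \frac{\bar{\alpha}_t\, \mathbb{E}[x_0^2]}{1 - \bar{\alpha}_t}
```

The schedule only controls ᾱₜ. The other factor in the SNR, E[x₀²], is set entirely by how you scale the data.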

Here is what is actually happening under the hood:

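The Massive Variance: Pixels uniformly distributed over 0 to 255 have a variance of roughly 5,400 (255²/12), on top of a mean of 127.5. The variance-preserving math only holds the variance steady when the data starts near unit variance.
The Microscopic Noise: A standard variance-preserving schedule injects noise using βₜ values that start very small (e.g., 1e-4 at t=1).
The Math Breakdown: At t=1 the injected noise has a standard deviation of √βₜ = 0.01, while the pixel values span 0 to 255. You are adding a tiny fraction of 𝒩(0, I) noise to a signal with massive magnitude, so the injected noise is effectively a rounding error.
The Convergence Failure: By step T=1000, your noisy image x_T is supposed to be statistically indistinguishable from pure standard Gaussian noise. With the linear DDPM schedule (β from 1e-4 to 0.02), ᾱ_T is about 4e-5, so x_T keeps roughly 0.6% of x₀. For inputs in [-1, 1] that residue is negligible. For a raw pixel value of 255 it is a shift of about 1.6, larger than the noise's own standard deviation of 1.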
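The four points above can be checked numerically. The sketch below is a minimal, standard-library-only illustration, assuming the linear DDPM schedule (β from 1e-4 to 0.02 over T=1000) and treating each pixel as an independent uniform variable, as the variance estimate above does. The helper names (`linear_betas`, `alpha_bar`, `forward_stats`) are made up for this example and do not come from any library.

```python
import math


def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; the endpoints are assumed from the DDPM paper."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def alpha_bar(betas):
    """Running product of (1 - beta_t), one entry per timestep."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def forward_stats(lo, hi, abar_t):
    """Mean, variance and SNR of x_t when x_0 is uniform on [lo, hi]."""
    mean0 = (lo + hi) / 2.0
    var0 = (hi - lo) ** 2 / 12.0
    mean_t = math.sqrt(abar_t) * mean0
    var_t = abar_t * var0 + (1.0 - abar_t)
    snr_t = abar_t * (var0 + mean0 ** 2) / (1.0 - abar_t)
    return mean_t, var_t, snr_t


abars = alpha_bar(linear_betas())

for name, lo, hi in [("raw [0, 255]", 0.0, 255.0), ("scaled [-1, 1]", -1.0, 1.0)]:
    for t in (1, 1000):
        mean_t, var_t, snr_t = forward_stats(lo, hi, abars[t - 1])
        print(f"{name:>14}  t={t:<4}  mean={mean_t:8.3f}  "
              f"var={var_t:10.3f}  SNR={snr_t:.3e}")
```

Under these assumptions, the raw case should finish at t=1000 with an SNR on the order of 1 and a clearly nonzero mean, while the scaled case finishes with zero mean, variance very close to 1, and an SNR several orders of magnitude lower.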

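If x_T doesn’t converge to 𝒩(0, I), your reverse process is dead on arrival. At sampling time you start from pure 𝒩(0, I), but the model was only ever trained on x_T that still carried image signal, so you will be asking it to denoise from a distribution it has never seen.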

The Answer That Gets You Hired:

“Feeding raw 0-255 inputs breaks the boundary conditions of the Markov chain. The data variance dwarfs the scheduled noise variance, meaning x_T never converges to a standard normal distribution. The model will fail to generate anything because it’s expecting to start denoising from pure 𝒩(0, I), but your forward process never actually reached it.”
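For completeness, the fix itself is one line of arithmetic. The sketch below shows it on plain Python numbers so it runs without dependencies. In a real pipeline the same arithmetic would be applied to whole tensors, and the function names here (`to_model_range`, `to_uint8`) are hypothetical.

```python
def to_model_range(pixel):
    """Map an integer pixel in [0, 255] to a float in [-1.0, 1.0]."""
    return pixel / 127.5 - 1.0


def to_uint8(value):
    """Inverse map: clamp to [-1, 1], then round back to an integer in [0, 255]."""
    clamped = max(-1.0, min(1.0, value))
    return int(round((clamped + 1.0) * 127.5))


row = [0, 64, 128, 255]                        # one row of raw pixel values
scaled = [to_model_range(p) for p in row]      # what the forward process should see
restored = [to_uint8(v) for v in scaled]       # what you convert back to after sampling

print(scaled)
print(restored)
```

The clamp in `to_uint8` is a design choice for this sketch: samples from the reverse process are not guaranteed to land exactly inside [-1, 1], so clipping before rounding avoids out-of-range pixel values.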

📚 Related Papers:

- Denoising Diffusion Probabilistic Models (DDPM). Available at: https://arxiv.org/abs/2006.11239

- Improved Denoising Diffusion Probabilistic Models. Available at: https://arxiv.org/abs/2102.09672

- Score-Based Generative Modeling through Stochastic Differential Equations. Available at: https://arxiv.org/abs/2011.13456

- Variational Diffusion Models. Available at: https://arxiv.org/abs/2107.00630

AI Interview Prep
