Machine Learning System Design Interview #17 - The Data Leakage Trap
The train-fit/test-transform discipline every ML engineer must know before touching a real pipeline.
You’re in a Senior Machine Learning Engineer interview at Google DeepMind. The interviewer sets a trap. They hand you a dataset with 15% missing values in the “Age” column and ask a simple question:
“How do you handle these missing values before we start training?”
90% of candidates walk right into the trap.
The candidate immediately grabs the whiteboard marker.
“Easy. I’ll compute the median of the ‘Age’ column, since the median is robust to outliers, and fill the empty cells with that value.”
They might even write the pandas one-liner:
df['Age'] = df['Age'].fillna(df['Age'].median())
The interviewer nods, smiles, and ends the interview 5 minutes later. They didn’t get the job.
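The trap is where the median comes from. df['Age'].median() is computed over every row, including rows that will later become the test set, so test data quietly shapes the values the model trains on. The train-fit/test-transform discipline from the subtitle fixes this: compute (fit) the statistic on the training split only, then apply (transform) that same value to both splits. Here is a minimal sketch, assuming hypothetical toy ages, a simple positional train/test split, and a hypothetical helper named impute_median_leak_free():

```python
import numpy as np
import pandas as pd


def impute_median_leak_free(train: pd.DataFrame, test: pd.DataFrame, column: str):
    """Hypothetical helper: fit the median on train only, then fill NaNs in both splits."""
    train, test = train.copy(), test.copy()       # avoid mutating the caller's frames
    median = train[column].median()               # fit: statistic from training rows only
    train[column] = train[column].fillna(median)  # transform the training split
    test[column] = test[column].fillna(median)    # transform the test split with the SAME value
    return train, test, median


# Hypothetical toy data: the last three rows stand in for the held-out test set.
df = pd.DataFrame({"Age": [22.0, 25.0, np.nan, 30.0, 35.0, 80.0, 90.0, np.nan]})
train_df, test_df = df.iloc[:5], df.iloc[5:]

# Leaky version: the median sees the test rows (80 and 90) and is pulled upward.
leaky_median = df["Age"].median()

# Leak-free version: the test rows never influence the fill value.
train_filled, test_filled, train_median = impute_median_leak_free(train_df, test_df, "Age")

print(leaky_median, train_median)   # 32.5 27.5
print(test_filled["Age"].tolist())  # [80.0, 90.0, 27.5]
```

In a real pipeline the split would be random or time-based rather than positional, and scikit-learn's SimpleImputer(strategy="median"), fitted on the training data (ideally inside a Pipeline so it is refit per cross-validation fold), follows the same fit-on-train, transform-everywhere pattern.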


