AI Interview Prep


Machine Learning System Design Interview #17 - The Data Leakage Trap

The train-fit/test-transform discipline every ML engineer must know before touching a real pipeline.

Hao Hoang
Dec 03, 2025

You’re in a Senior Machine Learning Interview at Google DeepMind. The interviewer sets a trap. They hand you a dataset with 15% missing values in the “Age” column and ask a simple question:

“How do you handle these missing values before we start training?”

90% of candidates walk right into the trap.

The candidate immediately grabs the whiteboard marker.

“Easy. I’ll fill the empty cells with the median of the ‘Age’ column, since the median is robust to outliers.”

They might even write the Pandas equivalent:

df['age'] = df['age'].fillna(df['age'].median())

The interviewer nods, smiles, and ends the interview five minutes later. The candidate doesn’t get the job.
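The trap is that the one-liner computes the median over every row, including rows that will later be held out for testing. The train-fit/test-transform discipline from the subtitle can be sketched roughly as follows. This is a minimal illustration, not a full pipeline: the toy DataFrame, the `label` column, and the 7/3 positional split are all assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'age' has missing values, as in the interview prompt.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 41.0, np.nan, 29.0, 58.0, 63.0, np.nan, 47.0],
    "label": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

# Split FIRST. A positional split keeps the sketch simple; a real project
# would pick a split strategy that suits the data.
train = df.iloc[:7].copy()
test = df.iloc[7:].copy()

# Fit: learn the median from the training rows only.
train_median = train["age"].median()

# Transform: fill both splits with that same training statistic.
train["age"] = train["age"].fillna(train_median)
test["age"] = test["age"].fillna(train_median)

# For contrast: the whiteboard one-liner's median has already seen test rows.
leaky_median = df["age"].median()

print("train-only median:", train_median)
print("full-data median: ", leaky_median)
```

On this toy data the two medians differ, so the fill value changes depending on whether test rows were allowed to influence it. scikit-learn's `SimpleImputer(strategy="median")` follows the same pattern with `fit` on the training split and `transform` on both, if a library implementation is preferred.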
