AI Interview Prep


Computer Vision Interview Questions #9 – The Tiny Object Trap

Why Faster R-CNN still beats YOLO when defects are smaller than your receptive field.

Hao Hoang
Jan 10, 2026

You’re in a Senior Computer Vision Engineer interview at Amazon Fulfillment Technologies & Robotics, and the lead engineer asks:

“We need to detect tiny, 3mm micro-fractures on a fast-moving assembly line. You suggested 𝐅𝐚𝐬𝐭𝐞𝐫 𝐑-𝐂𝐍𝐍 over 𝐘𝐎𝐋𝐎. Why does the 𝐑𝐞𝐠𝐢𝐨𝐧 𝐏𝐫𝐨𝐩𝐨𝐬𝐚𝐥 𝐍𝐞𝐭𝐰𝐨𝐫𝐤 (𝐑𝐏𝐍) specifically help with small objects, even though it kills our inference speed?”

Don’t say: “Because two-stage detectors are generally more accurate than single-stage detectors.”

This is an obvious statement, not an engineering justification. It tells the interviewer you know the reputation of the models, but not the mechanics of why they work.

The reality is that detecting small objects isn’t just a 𝘳𝘦𝘴𝘰𝘭𝘶𝘵𝘪𝘰𝘯 problem; it is a 𝐂𝐥𝐚𝐬𝐬 𝐈𝐦𝐛𝐚𝐥𝐚𝐧𝐜𝐞 problem.

In a typical manufacturing image, 99.9% of the pixels are “background” (the conveyor belt) and 0.1% are the “defect.”

If you use a 𝐒𝐢𝐧𝐠𝐥𝐞-𝐒𝐭𝐚𝐠𝐞 𝐝𝐞𝐭𝐞𝐜𝐭𝐨𝐫 (like standard YOLO or SSD):

- You are forcing the network to classify thousands of dense grid anchors in one pass.

- The overwhelming signal from the “easy background” (thousands of anchors, each contributing a small loss) drowns out the weak signal from the handful of anchors that cover the tiny defect.

- It’s like trying to find a needle in a haystack by scanning the whole stack with a satellite.
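To make the imbalance concrete, here is a minimal back-of-the-envelope sketch in plain Python. It is not taken from any detector's code: the function names, the anchor count (10,000), and the predicted probabilities are all illustrative assumptions. Only the 0.1% defect ratio comes from the numbers above. It uses plain binary cross-entropy summed over anchors, with no focal loss or hard-negative mining, to show how much of the total loss comes from easy background anchors.

```python
import math


def anchor_bce(p_defect: float, is_defect: bool) -> float:
    """Binary cross-entropy for a single anchor (hypothetical helper)."""
    p_correct = p_defect if is_defect else 1.0 - p_defect
    return -math.log(p_correct)


def background_loss_share(n_anchors: int, n_defect: int,
                          p_bg: float, p_fg: float) -> float:
    """Fraction of the summed loss that comes from background anchors.

    p_bg: predicted defect probability on background anchors (assumed value)
    p_fg: predicted defect probability on defect anchors (assumed value)
    """
    bg_loss = (n_anchors - n_defect) * anchor_bce(p_bg, is_defect=False)
    fg_loss = n_defect * anchor_bce(p_fg, is_defect=True)
    return bg_loss / (bg_loss + fg_loss)


# Assumed scenario: 10,000 dense anchors, 0.1% of them on the defect.
# Background is "easy" (95% confident), the defect is "hard" (30% confident).
share = background_loss_share(n_anchors=10_000, n_defect=10,
                              p_bg=0.05, p_fg=0.30)
print(f"Share of total loss from background anchors: {share:.1%}")
```

Under these assumed numbers, well over 90% of the loss comes from background anchors, even though each of them is individually almost correct. The gradient update is therefore dominated by examples the network has already learned.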

𝘞𝘩𝘺 𝘵𝘩𝘦 𝘙𝘗𝘕 (𝘙𝘦𝘨𝘪𝘰𝘯 𝘗𝘳𝘰𝘱𝘰𝘴𝘢𝘭 𝘕𝘦𝘵𝘸𝘰𝘳𝘬) 𝘸𝘪𝘯𝘴:
