AI Interview Prep

Computer Vision Interview Questions #25 – The Contrastive Shortcut Trap

Why CLIP learns just enough to pass, and how a generative decoder forces the encoder to stop being lazy.

Hao Hoang
Jan 26, 2026

You're in a Computer Vision interview at OpenAI. The interviewer sets a trap:

"We are building a Zero-Shot Classifier. We have the budget for a standard CLIP architecture. Why should we burn 25% more VRAM adding a Generative Decoder (CoCa) if we don't need to generate captions?"

90% of candidates walk right into it.

Most candidates say: "You add the decoder for Multi-Task Learning. It allows the model to handle captioning tasks if business requirements change later."

The interviewer nods politely, makes a note, and the candidate never hears back. Why? Because they treated the architecture as a feature list, not a representation engine.

You aren't optimizing for versatility. You are optimizing for signal density.

๐˜Š๐˜ฐ๐˜ฏ๐˜ต๐˜ณ๐˜ข๐˜ด๐˜ต๐˜ช๐˜ท๐˜ฆ ๐˜“๐˜ฐ๐˜ด๐˜ด (the mechanism behind CLIP) is inherently โ€œlazy.โ€ It is a global โ€œvibe check.โ€ To minimize loss, the model only needs to learn the minimum features necessary to distinguish a โ€œDogโ€ from a โ€œTableโ€ in the current batch.

It discards fine-grained details (texture, exact counts, spatial relations) because it doesn't need them to satisfy the contrastive objective.
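
To make the "laziness" concrete, here is a minimal sketch of the symmetric contrastive (InfoNCE) objective CLIP trains on. The function name and temperature value are illustrative choices of mine, not taken from any specific codebase.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """image_emb, text_emb: [batch, dim] outputs of the two encoders."""
    # L2-normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # [batch, batch] similarity matrix: every image vs. every caption in the batch.
    logits = image_emb @ text_emb.t() / temperature

    # The "correct" caption for image i is simply entry i of the same batch.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy over image-to-text and text-to-image directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

Nothing in this objective rewards texture, counts, or layout once coarse category cues are enough to separate image i from the rest of the batch. That is the shortcut.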

-----
๐“๐ก๐ž ๐’๐จ๐ฅ๐ฎ๐ญ๐ข๐จ๐ง: The real reason to add a decoder is to enforce ๐“๐ก๐ž โ€œ๐†๐ซ๐š๐ง๐ฎ๐ฅ๐š๐ซ๐ข๐ญ๐ฒ ๐“๐š๐ฑโ€.
