Computer Vision Interview Questions #14 – The Attention vs MLP Responsibility Trap
Why attention handles communication, but MLPs do the real computation in modern vision transformers.
You’re in an AI Researcher interview at OpenAI and the interviewer asks:
“We know 𝘚𝘦𝘭𝘧-𝘈𝘵𝘵𝘦𝘯𝘵𝘪𝘰𝘯 handles the context between tokens. So, why do we burn ~60% of our parameter budget on the 𝘗𝘰𝘴𝘪𝘵𝘪𝘰𝘯-𝘸𝘪𝘴𝘦 𝘔𝘓𝘗 𝘭𝘢𝘺𝘦𝘳𝘴? What is the MLP actually doing?”
Most candidates say: “It adds non-linearity and more parameters so the model can learn complex functions.”
It’s technically true but architecturally lazy. It treats the MLP as generic “muscle” without understanding its specific role in the signal processing pipeline.
To pass the interview, you need to explain the separation of duties between 𝐂𝐨𝐦𝐦𝐮𝐧𝐢𝐜𝐚𝐭𝐢𝐨𝐧 and 𝐂𝐨𝐦𝐩𝐮𝐭𝐚𝐭𝐢𝐨𝐧.
The reality is that 𝘚𝘦𝘭𝘧-𝘈𝘵𝘵𝘦𝘯𝘵𝘪𝘰𝘯 is just a fancy weighted average. It moves information between tokens, but it doesn’t really process that information.
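To make that split concrete, here is a minimal NumPy sketch of one transformer block's two halves. It deliberately omits the Q/K/V projections, multiple heads, layer norm, and residuals; all sizes are illustrative. The point it demonstrates: the attention output is a convex combination (weighted average) of token vectors — information moves *between* tokens — while the MLP applies the *same* non-linear function to each token independently, never mixing tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (illustrative only): 4 tokens, model dim 8, MLP hidden dim 32.
T, D, H = 4, 8, 32
x = rng.normal(size=(T, D))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# --- Communication: self-attention (projections omitted for clarity).
# Each output row is a weighted average of the input rows: tokens exchange
# information, but each vector is only scaled and summed, not transformed.
scores = x @ x.T / np.sqrt(D)        # (T, T) pairwise similarity
weights = softmax(scores, axis=-1)   # each row sums to 1 -> weighted average
attn_out = weights @ x               # (T, D) mixed token representations

# --- Computation: position-wise MLP.
# The SAME two-layer network is applied to every token independently;
# no information crosses token boundaries in this half of the block.
W1, b1 = rng.normal(size=(D, H)), np.zeros(H)
W2, b2 = rng.normal(size=(H, D)), np.zeros(D)

def mlp(token):
    return np.maximum(token @ W1 + b1, 0) @ W2 + b2  # ReLU MLP

mlp_out = np.stack([mlp(t) for t in attn_out])

# Attention rows are proper averages: weights sum to 1.
assert np.allclose(weights.sum(axis=-1), 1.0)
# The MLP is position-wise: permuting tokens just permutes outputs.
perm = [2, 0, 3, 1]
assert np.allclose(np.stack([mlp(t) for t in attn_out[perm]]), mlp_out[perm])
```

The permutation check at the end is the key property: because the MLP sees one token at a time, it can only do per-token computation — which is exactly why the heavy non-linear processing (and most of the parameters) lives there.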
𝐇𝐞𝐫𝐞 𝐢𝐬 𝐭𝐡𝐞 𝐛𝐫𝐞𝐚𝐤𝐝𝐨𝐰𝐧: