LLM Agents Interview Questions #7 - The DOM Context Trap
Feeding the full accessibility tree into an LLM creates a state-space explosion: invisible artifacts dominate the model's attention, and latency scales with structural noise rather than task complexity.
You’re in a Senior AI Engineer interview at OpenAI and the interviewer asks:
“You’ve built a web navigation agent using the full accessibility tree (DOM) to maximize contextual awareness. But latency is spiking, and it’s completely failing on complex web apps. What is the hidden architectural bottleneck here, and why is pure-vision grounding the superior production choice?”
Most candidates say: “The DOM is too long for the LLM’s context window, so we just need to use a better parsing script or apply RAG to chunk the HTML.”
Wrong approach. That’s a bandage, not a cure.
The reality of production web agents: the DOM is a trap. It is not a faithful representation of the user experience; it is a messy engineering artifact, full of wrappers, hidden nodes, and layout scaffolding that no user ever sees.
Relying on the DOM for UI agents is like trying to drive a car by reading the engine’s raw diagnostic logs instead of just looking out the windshield.
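To make the "structural noise" point concrete, here is a minimal, stdlib-only sketch. The HTML snippet is hypothetical (invented for illustration), but the pattern is typical of real web apps: the serialized tree carries many nodes, while only a handful are visible and actionable. Everything else is tokens the LLM must pay attention over.

```python
# Illustrative sketch: count how many DOM nodes a serialized tree hands
# to the agent versus how few are actually visible and interactive.
# The SAMPLE markup is hypothetical; real pages are far noisier.
from html.parser import HTMLParser

SAMPLE = """
<div class="app">
  <div style="display:none"><span>tracking pixel wrapper</span></div>
  <div aria-hidden="true"><div><div>layout scaffolding</div></div></div>
  <nav><div><div><button>Search</button></div></div></nav>
  <main><div><div><div><input placeholder="Email"></div></div></div></main>
</div>
"""

VOID = {"input", "img", "br", "hr", "meta", "link"}      # no closing tag
INTERACTIVE = {"button", "input", "a", "select", "textarea"}

class NodeCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.total = 0          # every element the serialized tree exposes
        self.interactive = 0    # visible elements a user can act on
        self._hidden_depth = 0  # nesting inside display:none / aria-hidden

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        self.total += 1
        hidden = ("display:none" in a.get("style", "")
                  or a.get("aria-hidden") == "true")
        if self._hidden_depth or hidden:
            if tag not in VOID:      # void tags never get an end tag
                self._hidden_depth += 1
        elif tag in INTERACTIVE:
            self.interactive += 1

    def handle_endtag(self, tag):
        if self._hidden_depth:
            self._hidden_depth -= 1

counter = NodeCounter()
counter.feed(SAMPLE)
print(counter.total, counter.interactive)  # -> 15 2
```

Fifteen nodes in the tree, two of which matter to the task. A vision-grounded agent sees only the rendered button and input field; a DOM-grounded agent pays context and latency for all fifteen, and the ratio only worsens as the app grows.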
Here is what is actually breaking your agent: