AI Interview Prep

LLM Agents Interview Questions #7 - The DOM Context Trap

Feeding the full accessibility tree into an LLM creates a state-space explosion where invisible artifacts dominate attention and latency scales with structural noise, not task complexity.

Hao Hoang
Mar 01, 2026

You’re in a Senior AI Engineer interview at OpenAI and the interviewer asks:

“You’ve built a web navigation agent using the full accessibility tree (DOM) to maximize contextual awareness. But latency is spiking, and it’s completely failing on complex web apps. What is the hidden architectural bottleneck here, and why is pure-vision grounding the superior production choice?”

Most candidates say: “The DOM is too long for the LLM’s context window, so we just need to use a better parsing script or apply RAG to chunk the HTML.”

Wrong approach. That’s a bandage, not a cure.

The reality of production web agents: the DOM is a trap. It is not a faithful representation of the user experience; it is a messy engineering artifact, full of framework scaffolding the user never sees.

Relying on the DOM for UI agents is like trying to drive a car by reading the engine’s raw diagnostic logs instead of just looking out the windshield.
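The claim about invisible artifacts is easy to verify on a toy example. The sketch below (a hypothetical page fragment, stdlib `html.parser` only, not from the article) counts how many nodes in a raw tree are structurally invisible (`aria-hidden`, `display:none`, the `hidden` attribute, `<script>`/`<style>`) versus the single actionable element:

```python
from html.parser import HTMLParser

# Hypothetical page fragment: one visible button buried in framework scaffolding.
PAGE = """
<div id="root"><div class="wrapper" aria-hidden="true"><div><div>
<script>window.__STATE__={}</script>
<style>.x{display:none}</style>
<div style="display:none">stale modal</div>
<div class="overlay" hidden></div>
</div></div></div>
<button>Checkout</button></div>
"""

NOISE_TAGS = {"script", "style"}  # never rendered, always serialized

class NodeCounter(HTMLParser):
    """Counts total element nodes vs. nodes that are invisible to the user."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.noise = 0

    def handle_starttag(self, tag, attrs):
        self.total += 1
        attr_map = dict(attrs)
        invisible = (
            tag in NOISE_TAGS
            or "hidden" in attr_map
            or attr_map.get("aria-hidden") == "true"
            or "display:none" in (attr_map.get("style") or "").replace(" ", "")
        )
        if invisible:
            self.noise += 1

counter = NodeCounter()
counter.feed(PAGE)
print(f"{counter.noise}/{counter.total} element nodes are invisible noise")
```

In this fragment, over half the element nodes are pure structural noise that a DOM-fed agent still pays tokens and attention for; a vision-grounded agent sees only the rendered button. Real single-page apps are far worse, with thousands of wrapper nodes per visible control.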

Here is what is actually breaking your agent:


© 2026 Hao Hoang