Machine Learning System Design Interview #37 - The Uncertainty Loop Paradox
The hidden trap where calculating predictive entropy on a 70B model completely destroys cluster efficiency, and the lightweight proxy scoring trick that solves it.
You’re in a Senior AI Engineer interview at Meta. The interviewer sets a trap:
“We have a 10-million-sample unlabeled dataset and want to fine-tune a Llama-3 70B model while minimizing manual annotation costs. How do you design an Active Learning loop to selectively label the most uncertain points?”
95% of candidates walk right into it.
They immediately suggest: “We deploy an iterative active learning pipeline. We run the unlabeled data through the LLM, compute token-level predictive entropy to find the highest-uncertainty examples, send those to human annotators, append them to the training set, and kick off the next fine-tuning round.”
They just failed.
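For reference, the scoring step in that answer can be sketched as follows. This is a minimal NumPy illustration, not a production pipeline: it assumes you already have per-token logits of shape `(seq_len, vocab_size)` for each sample, and the function names (`token_entropy`, `sequence_uncertainty`, `select_most_uncertain`) are hypothetical. Averaging token entropies into one score per sample is one common choice among several.

```python
import numpy as np


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution at each position.

    logits: array-like of shape (seq_len, vocab_size).
    Returns an array of shape (seq_len,).
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    return -(probs * log_probs).sum(axis=-1)


def sequence_uncertainty(logits):
    """Mean token entropy, used here as a single score per sample (assumption)."""
    return float(token_entropy(logits).mean())


def select_most_uncertain(scores, k):
    """Indices of the k highest-scoring samples, most uncertain first."""
    order = np.argsort(scores)[::-1]
    return order[:k].tolist()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for model outputs: 5 samples, 8 tokens each, vocab of 16.
    pool_logits = [rng.normal(size=(8, 16)) for _ in range(5)]
    scores = [sequence_uncertainty(l) for l in pool_logits]
    print(select_most_uncertain(scores, k=2))
```

The arithmetic itself is trivial. The expensive part is producing `logits` in the first place, since that requires a forward pass of the full model over every candidate sample.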
𝐓𝐡𝐞 𝐑𝐞𝐚𝐥𝐢𝐭𝐲:
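Applying textbook Active Learning to frontier LLMs is a financial suicide mission. You are trading comparatively cheap human labeling costs for a crippling H100 compute bill, because every round of the loop re-scores the unlabeled pool with a full forward pass through the 70B model.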
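A back-of-envelope estimate makes the scale concrete. The sketch below uses the common approximation of about 2 FLOPs per parameter per token for a forward pass. Everything except the 70B parameter count and the 10M pool size is an assumption chosen for illustration: 512 tokens per sample, a round figure of 1e15 FLOP/s peak per GPU, and 40% utilization. Swap in your own numbers.

```python
def scoring_pass_flops(n_params, n_samples, tokens_per_sample):
    """Approximate FLOPs for one forward pass over the whole pool.

    Uses the rule of thumb of ~2 FLOPs per parameter per token.
    """
    return 2.0 * n_params * n_samples * tokens_per_sample


def gpu_hours(flops, peak_flops_per_s, utilization):
    """GPU-hours needed to execute `flops` at the given effective throughput."""
    return flops / (peak_flops_per_s * utilization) / 3600.0


N_PARAMS = 70e9              # from the scenario: 70B model
N_SAMPLES = 10_000_000       # from the scenario: 10M unlabeled samples
TOKENS_PER_SAMPLE = 512      # assumption: average sequence length
PEAK_FLOPS_PER_S = 1e15      # assumption: round number for one accelerator
UTILIZATION = 0.4            # assumption: fraction of peak actually achieved

flops_per_pass = scoring_pass_flops(N_PARAMS, N_SAMPLES, TOKENS_PER_SAMPLE)
hours_per_pass = gpu_hours(flops_per_pass, PEAK_FLOPS_PER_S, UTILIZATION)

if __name__ == "__main__":
    print(f"{flops_per_pass:.3e} FLOPs per scoring pass")
    print(f"{hours_per_pass:.1f} GPU-hours per scoring pass")
```

Under these assumptions a single scoring pass comes out to roughly 500 GPU-hours, and that is before any fine-tuning. Because the model changes after each fine-tune, the entropy scores go stale and the pass has to be repeated every round.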


