Advanced NLP Interview Questions #23 – The Curriculum Learning Trap
Why shuffling General, Code, and Math data together silently caps reasoning performance, and how staged pretraining unlocks true chain-of-thought reasoning.
You’re in a Staff Research Scientist interview at DeepSeek AI, and the interviewer asks:
“We have three massive datasets: 𝘎𝘦𝘯𝘦𝘳𝘢𝘭 𝘛𝘦𝘹𝘵, 𝘚𝘰𝘶𝘳𝘤𝘦 𝘊𝘰𝘥𝘦, and 𝘴𝘱𝘦𝘤𝘪𝘢𝘭𝘪𝘻𝘦𝘥 𝘔𝘢𝘵𝘩 𝘱𝘳𝘰𝘣𝘭𝘦𝘮𝘴. To build a State-of-the-Art Math reasoner, in what order do you feed this data during pre-training, and why?”
Most candidates say: “Just shuffle them all together into one big dataset to avoid catastrophic forgetting.”
This answer works for general-purpose chatbots, but it caps the model’s ceiling on complex reasoning tasks.
The reality is that data composition is a curriculum, not a soup.
The DeepSeekMath experiments showed that a simple “𝐌𝐢𝐱-𝐚𝐥𝐥-𝐚𝐭-𝐨𝐧𝐜𝐞” strategy is suboptimal. The winning formula is a specific multi-stage pipeline: 𝐆𝐞𝐧𝐞𝐫𝐚𝐥 𝐓𝐞𝐱𝐭 → 𝐂𝐨𝐝𝐞 → 𝐌𝐚𝐭𝐡.
Here is the senior-level logic you need to explain: general text first builds the broad linguistic and world-knowledge foundation every downstream skill depends on; code second teaches structured, step-by-step symbolic manipulation that transfers to math (DeepSeekMath’s ablations found that code pretraining improves mathematical reasoning, which is why DeepSeekMath 7B was initialized from a DeepSeek-Coder base rather than a general base model); and math last specializes the model, with a small replay of earlier data to prevent catastrophic forgetting.
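To make the staging concrete, here is a minimal sketch of a curriculum scheduler in Python. The corpus names, token budgets, and replay ratios below are illustrative assumptions, not DeepSeek’s published recipe; the point is that each stage draws most of its batches from the new corpus while replaying a small share of earlier data.

```python
import random
from collections import Counter

# Hypothetical three-stage curriculum. All corpus names, token budgets,
# and replay ratios are illustrative assumptions, not DeepSeek's recipe.
STAGES = [
    # (stage name, token budget, sampling weights over the corpora)
    ("general", 1_000, {"general_text": 1.0}),
    ("code",      600, {"general_text": 0.2, "source_code": 0.8}),
    ("math",      400, {"general_text": 0.1, "source_code": 0.2, "math": 0.7}),
]

def curriculum_batches(tokens_per_batch=10, seed=0):
    """Yield (stage, corpus) pairs: which corpus the next batch comes from.

    Later stages keep a small 'replay' share of earlier corpora to reduce
    catastrophic forgetting while concentrating the specialized signal.
    """
    rng = random.Random(seed)
    for stage, budget, weights in STAGES:
        corpora, probs = zip(*weights.items())
        for _ in range(budget // tokens_per_batch):
            corpus = rng.choices(corpora, weights=probs, k=1)[0]
            yield stage, corpus

if __name__ == "__main__":
    counts = Counter(curriculum_batches())
    for (stage, corpus), n in sorted(counts.items()):
        print(f"{stage:>7} stage: {n:3d} batches from {corpus}")
```

Running it prints the per-stage batch mix; in a real pipeline, the same schedule would decide which data shards the loader streams at each point in training.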