📚 Related Papers:

- Scaling Laws for Neural Language Models. Available at: https://arxiv.org/abs/2001.08361

- DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining. Available at: https://arxiv.org/abs/2305.10429

- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance. Available at: https://arxiv.org/abs/2403.16952

- BiMix: Bivariate Data Mixing Law for Language Model Pretraining. Available at: https://arxiv.org/abs/2405.14908
