📚 Related Papers:

- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. Available at: https://arxiv.org/abs/2405.04434

- DeepSeek-V3 Technical Report. Available at: https://arxiv.org/abs/2412.19437

- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. Available at: https://arxiv.org/abs/2305.13245
