📚 Related Papers:

- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. Available at: https://arxiv.org/abs/2101.03961

- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. Available at: https://arxiv.org/abs/2006.16668

- MegaBlocks: Efficient Sparse Training with Mixture-of-Experts. Available at: https://arxiv.org/abs/2211.15841
