- Complete the overview of distributed training on WikiText-103
- Reproduce the results on WikiText-103 (comparing the dense model and the MoE)
- Train the model on other datasets
- Pre-train a 1.5B model on the 2024 subset of FineWeb
An implementation of the paper *Mixture of A Million Experts* (Xu Owen He, 2024), by Phan Nhat Huy.
To launch single-node distributed training (replace `N` with the number of GPUs per node):

```bash
torchrun --nproc_per_node=N --nnodes=1 main.py
```
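As a rough sketch, an entry point launched this way typically sets up the process group as below; this is illustrative only and not the actual contents of `main.py` (the placeholder model and the single dummy step are assumptions):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each spawned process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model standing in for the transformer built in main.py.
    model = nn.Linear(256, 256).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # synchronize gradients across ranks
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

    # One dummy training step per rank, just to show the flow.
    x = torch.randn(8, 256, device=local_rank)
    loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```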
WikiText-103: a 2.2B-parameter model with 8 layers, 8 heads, model dimension 256, and a 512×512 product-key grid of experts (262,144 experts).
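For context, below is a minimal, self-contained sketch of a PEER-style layer with a 512×512 product-key grid. It is an illustrative reimplementation of the retrieval-plus-single-neuron-expert idea from the paper, not the code used in this repo; all hyperparameter names and defaults are assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class PEERSketch(nn.Module):
    """Illustrative PEER layer: product-key retrieval over single-neuron experts."""

    def __init__(self, dim=256, heads=8, num_experts=512 * 512, top_k=16, dim_key=128):
        super().__init__()
        self.heads, self.top_k = heads, top_k
        self.n_sub = int(math.isqrt(num_experts))        # 512 sub-keys per axis
        assert self.n_sub ** 2 == num_experts

        # One query per head, split into two halves for the product-key lookup.
        self.to_queries = nn.Linear(dim, heads * dim_key * 2, bias=False)
        self.sub_keys = nn.Parameter(torch.randn(2, heads, self.n_sub, dim_key) * 0.02)

        # Each expert is a single hidden neuron: one down- and one up-projection vector.
        self.w_down = nn.Embedding(num_experts, dim)
        self.w_up = nn.Embedding(num_experts, dim)

    def forward(self, x):                                # x: (batch, seq, dim)
        b, n, _ = x.shape
        q = self.to_queries(x).view(b, n, self.heads, 2, -1)

        # Scores against the two sub-key sets: (batch, seq, heads, n_sub) each.
        s1 = torch.einsum('bnhd,hkd->bnhk', q[..., 0, :], self.sub_keys[0])
        s2 = torch.einsum('bnhd,hkd->bnhk', q[..., 1, :], self.sub_keys[1])

        v1, i1 = s1.topk(self.top_k, dim=-1)             # top-k along each grid axis
        v2, i2 = s2.topk(self.top_k, dim=-1)

        # Cartesian product of the two top-k sets -> k*k candidate experts.
        scores = v1.unsqueeze(-1) + v2.unsqueeze(-2)
        ids = i1.unsqueeze(-1) * self.n_sub + i2.unsqueeze(-2)
        scores, ids = scores.flatten(-2), ids.flatten(-2)

        # Keep the overall top-k experts per head and gate them with a softmax.
        scores, pos = scores.topk(self.top_k, dim=-1)
        ids = ids.gather(-1, pos)
        gates = scores.softmax(dim=-1)

        # Single-neuron experts: out = sum_e gate_e * gelu(x . w_down_e) * w_up_e
        hidden = torch.einsum('bnd,bnhkd->bnhk', x, self.w_down(ids))
        return torch.einsum('bnhk,bnhkd->bnd', gates * F.gelu(hidden), self.w_up(ids))
```

With the defaults above, `PEERSketch()(torch.randn(2, 16, 256))` returns a tensor of shape `(2, 16, 256)`, so the layer slots in where a transformer's feed-forward block would normally sit.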
Validation perplexity:

| Method | WikiText-103 Perplexity |
|---|---|
| PEER | 7.19 |
| FFW | in progress |
Citation for the original paper:

```bibtex
@inproceedings{He2024MixtureOA,
  title  = {Mixture of A Million Experts},
  author = {Xu Owen He},
  year   = {2024},
  url    = {https://api.semanticscholar.org/CorpusID:271038610}
}
```
Thanks to lucidrains for the PEER layer implementation: https://github.com/lucidrains/PEER-pytorch