PEER - Mixture of A Million Experts

To do:

  • Complete the overview of distributed training on WikiText-103
  • Reproduce the results on WikiText-103 (comparing against a dense model and MoE)
  • Implement the model on other datasets
  • Pre-train a 1.5B model on the 2024 subset of FineWeb

Implementation of the paper Mixture of A Million Experts (Xu Owen He, 2024), by Phan Nhat Huy.
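For reference, below is a minimal single-head sketch of the PEER mechanism: a query is scored against two small key tables, the top-k product keys index into a pool of num_keys² single-neuron experts, and the selected experts are combined with softmax gates. All names and shapes here are illustrative assumptions rather than the code in this repository, and the full method in the paper additionally uses multiple query heads.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PEERSketch(nn.Module):
    """Illustrative product-key expert retrieval over num_keys**2 tiny experts."""

    def __init__(self, dim, num_keys=512, topk=16):
        super().__init__()
        self.num_keys = num_keys
        self.topk = topk
        num_experts = num_keys ** 2
        # Query projection; the query is split in half to score the two key tables.
        self.to_query = nn.Linear(dim, dim)
        self.keys = nn.Parameter(torch.randn(2, num_keys, dim // 2) * 0.02)
        # Each expert is a single hidden neuron: one down vector and one up vector.
        self.expert_down = nn.Embedding(num_experts, dim)
        self.expert_up = nn.Embedding(num_experts, dim)

    def forward(self, x):                                   # x: (batch, seq, dim)
        q1, q2 = self.to_query(x).chunk(2, dim=-1)
        scores1 = q1 @ self.keys[0].t()                     # (batch, seq, num_keys)
        scores2 = q2 @ self.keys[1].t()
        k = self.topk
        top1, idx1 = scores1.topk(k, dim=-1)
        top2, idx2 = scores2.topk(k, dim=-1)
        # Cartesian sum of the two top-k sets gives k*k candidate product keys.
        cand_scores = (top1.unsqueeze(-1) + top2.unsqueeze(-2)).flatten(-2)
        cand_ids = (idx1.unsqueeze(-1) * self.num_keys + idx2.unsqueeze(-2)).flatten(-2)
        # A final top-k over the candidates selects the experts actually used.
        scores, sel = cand_scores.topk(k, dim=-1)
        expert_ids = cand_ids.gather(-1, sel)               # (batch, seq, k)
        gates = scores.softmax(dim=-1)
        down = self.expert_down(expert_ids)                 # (batch, seq, k, dim)
        up = self.expert_up(expert_ids)
        hidden = F.gelu(torch.einsum('bsd,bskd->bsk', x, down))
        return torch.einsum('bsk,bskd->bsd', hidden * gates, up)

A layer built this way keeps per-token compute proportional to topk rather than to the total number of experts, which is what makes a 512×512 expert grid tractable.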

How to run

torchrun --nproc_per_node=N --nnodes=1 main.py
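For example, on a single node with 4 GPUs (N = 4):

torchrun --nproc_per_node=4 --nnodes=1 main.py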

Training Process

WikiText-103: a 2.2B model with 8 layers, 8 heads, model dimension 256, and 512×512 (262,144) experts.
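As a rough sketch, that run corresponds to a configuration like the following (key names are illustrative assumptions, not the actual arguments of main.py):

# Hyperparameters of the WikiText-103 run described above.
# Names are illustrative; see main.py for the real configuration.
wikitext103_config = dict(
    n_layers=8,
    n_heads=8,
    d_model=256,
    peer_num_keys=512,   # 512 x 512 product keys -> 262,144 experts per PEER layer
)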


Results Overview

Validation Perplexity

Method    WikiText-103 Perplexity
PEER      7.19
FFW       in progress

Citations

@inproceedings{He2024MixtureOA,
    title   = {Mixture of A Million Experts},
    author  = {Xu Owen He},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:271038610}
}

Acknowledgements

I thank lucidrains for the PEER layer implementation: https://github.com/lucidrains/PEER-pytorch
