LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits

Duy Nguyen*, Archiki Prasad*, Elias Stengel-Eskin, and Mohit Bansal (*equal contribution).

Installation

This project is built on Python 3.10.11. All dependencies can be installed via:

pip install -r requirements.txt

Project Directory Structure

The project directory is organized as follows:

scripts/
├── dataset/
│   ├── strategyqa/
│   ├── gsm8k/
│   └── mmlu/
├── model/
│   ├── __init__.py
│   └── response_generator.py
├── utils/
│   ├── __init__.py
│   ├── config_loader.py
│   ├── dataset_manager.py
│   ├── linucb.py
│   ├── llm_trainer.py
│   ├── preference_pair_generator.py
│   └── reward_model.py
├── config.yaml
├── train_and_infer.py
├── run_training.sh
└── run_training_all.sh
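
The file name utils/linucb.py suggests that reward-model selection uses a LinUCB-style contextual bandit. For orientation, the following is a minimal sketch of the generic disjoint LinUCB algorithm in pure Python. It is not the repository's implementation: the class name, the `alpha` parameter, the toy context vector, and the toy reward are all assumptions made for illustration, with each arm standing in for one candidate reward model.

```python
import math


class LinUCB:
    """Generic disjoint LinUCB: one ridge-regression model per arm (hypothetical sketch)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # exploration strength (assumed hyperparameter)
        self.dim = dim
        # A_inv[a] holds the inverse of (I + sum of x x^T) for arm a; starts as identity.
        self.A_inv = [
            [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
            for _ in range(n_arms)
        ]
        # b[a] accumulates reward-weighted context vectors for arm a.
        self.b = [[0.0] * dim for _ in range(n_arms)]

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    @staticmethod
    def _dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def select(self, x):
        """Return the arm with the highest upper confidence bound for context x."""
        best_arm, best_score = 0, -math.inf
        for a in range(len(self.b)):
            theta = self._matvec(self.A_inv[a], self.b[a])  # ridge estimate
            Ax = self._matvec(self.A_inv[a], x)
            bonus = self.alpha * math.sqrt(max(self._dot(x, Ax), 0.0))
            score = self._dot(theta, x) + bonus
            if score > best_score:
                best_arm, best_score = a, score
        return best_arm

    def update(self, arm, x, reward):
        """Rank-one update of the chosen arm via the Sherman-Morrison formula."""
        A_inv = self.A_inv[arm]
        Ax = self._matvec(A_inv, x)
        denom = 1.0 + self._dot(x, Ax)
        for i in range(self.dim):
            for j in range(self.dim):
                A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(self.dim):
            self.b[arm][i] += reward * x[i]


# Toy usage: three arms (stand-ins for reward models); only arm 2 yields reward.
bandit = LinUCB(n_arms=3, dim=2, alpha=0.5)
context = [1.0, 0.0]
for _ in range(50):
    arm = bandit.select(context)
    bandit.update(arm, context, 1.0 if arm == 2 else 0.0)
```

In this toy setting the bandit tries each arm once and then settles on the arm that pays off. In practice the context would be derived from the training batch and the reward from the downstream training signal; how the repository defines either is not specified here.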

Running LASeR

Run LASeR on reasoning tasks

Run LASeR on one dataset, for example StrategyQA:

cd scripts
bash run_training.sh strategyqa

Run LASeR on all datasets:

cd scripts
bash run_training_all.sh

You can change the training setup in scripts/config.yaml.
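
To run only a subset of datasets, the per-dataset command shown earlier can be wrapped in a small Python helper. This is a sketch under stated assumptions: the helper names `build_commands` and `run_datasets` are hypothetical, and it assumes run_training.sh accepts gsm8k and mmlu (the other folder names under scripts/dataset/) in the same way it accepts strategyqa.

```python
import subprocess


def build_commands(datasets):
    """Build one `bash run_training.sh <dataset>` command per dataset name."""
    return [["bash", "run_training.sh", name] for name in datasets]


def run_datasets(datasets, scripts_dir="scripts"):
    """Run each command from inside scripts_dir, stopping at the first failure."""
    for cmd in build_commands(datasets):
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(cmd, cwd=scripts_dir, check=True)


# Example (not executed here): run_datasets(["strategyqa", "gsm8k"])
```

Running the datasets sequentially and stopping on the first error keeps a failed run from being silently followed by the next one.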

Run LASeR on instruction-following tasks

Coming soon

Run LASeR on long-context understanding tasks

Coming soon

Citation

@article{nguyen2024laser,
  title={LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits},
  author={Nguyen, Duy and Prasad, Archiki and Stengel-Eskin, Elias and Bansal, Mohit},
  journal={arXiv preprint arXiv:2410.01735},
  year={2024}
}
