Duy Nguyen*, Archiki Prasad*, Elias Stengel-Eskin, and Mohit Bansal (*equal contribution).
This project is built on Python 3.10.11. All dependencies can be installed via:
pip install -r requirements.txt
The project directory is as follows:
scripts/
├── dataset/
│   ├── strategyqa/
│   ├── gsm8k/
│   └── mmlu/
├── model/
│   ├── __init__.py
│   └── response_generator.py
├── utils/
│   ├── __init__.py
│   ├── config_loader.py
│   ├── dataset_manager.py
│   ├── linucb.py
│   ├── llm_trainer.py
│   ├── preference_pair_generator.py
│   └── reward_model.py
├── config.yaml
├── train_and_infer.py
├── run_training.sh
└── run_training_all.sh
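The tree includes utils/linucb.py, and the paper title describes selecting reward models with a multi-armed bandit. For orientation, here is a minimal sketch of the textbook disjoint LinUCB rule. It is not the repository's implementation: the class `LinUCBArm`, the helpers `select_arm` and `run_demo`, the one-dimensional context, and the toy rewards are all assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class LinUCBArm:
    """One arm of a disjoint LinUCB bandit (hypothetical helper, not from the repo)."""

    def __init__(self, dim):
        # A starts as the identity (ridge term), so its inverse is the identity too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _matvec(self, x):
        return [dot(row, x) for row in self.A_inv]

    def ucb(self, x, alpha):
        theta = self._matvec(self.b)  # ridge estimate theta = A^{-1} b
        bonus = alpha * math.sqrt(max(dot(x, self._matvec(x)), 0.0))
        return dot(theta, x) + bonus

    def update(self, x, reward):
        # Sherman-Morrison update of A^{-1} after A <- A + x x^T (A is symmetric).
        v = self._matvec(x)
        denom = 1.0 + dot(x, v)
        n = len(x)
        self.A_inv = [
            [self.A_inv[i][j] - v[i] * v[j] / denom for j in range(n)] for i in range(n)
        ]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_arm(arms, x, alpha=1.0):
    """Return the index of the arm with the highest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return scores.index(max(scores))


def run_demo(rounds=50):
    """Toy run: three hypothetical reward models, only the second one pays off."""
    arms = [LinUCBArm(dim=1) for _ in range(3)]
    toy_reward = [0.0, 1.0, 0.0]  # assumed payoffs, for illustration only
    counts = [0, 0, 0]
    context = [1.0]  # assumed constant context feature
    for _ in range(rounds):
        k = select_arm(arms, context)
        arms[k].update(context, toy_reward[k])
        counts[k] += 1
    return counts
```

In this toy setting, calling `run_demo()` concentrates pulls on the arm whose reward is nonzero. In an actual run, each arm would correspond to a reward model, and the reward signal would come from training; those details are configured by the repository, not by this sketch.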
Run LASeR on one dataset, for example StrategyQA:
cd scripts
bash run_training.sh strategyqa
Run LASeR on all datasets:
cd scripts
bash run_training_all.sh
You can change the training setup in scripts/config.yaml.
Coming soon
Coming soon
@article{nguyen2024laser,
title={LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits},
author={Nguyen, Duy and Prasad, Archiki and Stengel-Eskin, Elias and Bansal, Mohit},
journal={arXiv preprint arXiv:2410.01735},
year={2024}
}