GenomicLLM

Code for the paper "Exploring Genomic Large Language Models: Bridging the Gap between Natural Language and Gene Sequences".

Requirements

matplotlib
numpy
pandas
biopython==1.79
rouge==1.0.1
tokenizers==0.11.6
torch==2.0.1+cu117
torchaudio==2.0.2+cu117
torchvision==0.15.2+cu117
transformers==4.18.0
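
Because several packages are pinned to exact versions, it can help to confirm the active environment before training. The following is a minimal sketch, not part of the repository: the helper name `check_requirements` is hypothetical, and the pins are copied from the list above. It relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Pins copied from the Requirements list; unpinned packages map to None.
PINNED = {
    "matplotlib": None,
    "numpy": None,
    "pandas": None,
    "biopython": "1.79",
    "rouge": "1.0.1",
    "tokenizers": "0.11.6",
    "torch": "2.0.1+cu117",
    "torchaudio": "2.0.2+cu117",
    "torchvision": "0.15.2+cu117",
    "transformers": "4.18.0",
}


def check_requirements(pinned=PINNED, get_version=metadata.version):
    """Return a list of problems; an empty list means every pin matches."""
    problems = []
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        # Unpinned packages only need to be present.
        if wanted is not None and found != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["all pinned requirements satisfied"]:
        print(line)
```

The comparison is an exact string match, so a CPU-only or differently built torch wheel is reported as a mismatch even if it would work in practice.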

Materials

Download the data sets and the trained models from Zenodo: https://zenodo.org/records/10695802

Training

Set all parameters in configurator.py, then start training:

CUDA_VISIBLE_DEVICES=0 python train.py

Test

Test on the entire test set:

CUDA_VISIBLE_DEVICES=0 python test.py --custom=False --BeckyGRCh38Data_num_samples=-1 --GUEData_num_samples=-1 --BeckyData_num_samples=-1 --HyenaData_num_samples=-1

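Test on part of the test set (the list-valued argument is wrapped in double quotes so that the shell passes it to test.py as a single argument):

CUDA_VISIBLE_DEVICES=0 python test.py --custom=False --BeckyGRCh38Data_num_samples=-1 "--BeckyGRCh38_data_name=['enhancer', 'splice site']"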
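
Overrides that contain spaces or quotes, such as the list of data names, are easy to mangle when typed directly into a shell. As a rough sketch, the command line can be assembled in Python and quoted with the standard-library `shlex` module. The helper `build_test_command` is hypothetical and not part of the repository; the flag names are the ones shown in the commands above, and how test.py interprets them is not something this sketch depends on.

```python
import shlex


def build_test_command(overrides, gpu="0"):
    """Build a shell-safe command line for test.py from a dict of overrides.

    Strings are passed through unchanged; other values (bools, ints, lists)
    are rendered with repr(), so a list keeps its brackets and inner quotes.
    """
    argv = ["python", "test.py"]
    for key, value in overrides.items():
        text = value if isinstance(value, str) else repr(value)
        argv.append(f"--{key}={text}")
    # shlex.join quotes every argument that the shell would otherwise split.
    return f"CUDA_VISIBLE_DEVICES={shlex.quote(gpu)} " + shlex.join(argv)


if __name__ == "__main__":
    print(build_test_command({
        "custom": False,
        "BeckyGRCh38Data_num_samples": -1,
        "BeckyGRCh38_data_name": ["enhancer", "splice site"],
    }))
```

The printed command can be pasted into a POSIX shell; the list argument arrives at test.py as one argument with its brackets and quotes intact.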

Test on a custom data set:

CUDA_VISIBLE_DEVICES=0 python test.py --custom=True --file_name='./data/Genomic/custom_data/human_enhancers_cohn_test.txt'

