GitHub - LLYX/DETECT-probability-profile-generation-pipeline: Pipeline designed to generate probability profiles for use by DETECT in Scinet.

Basic Usage

Place this folder on Scinet, add the raw dat file of all proteins to process and a fasta file of all proteins that meet your criteria, generate a BLAST database from that list of proteins, and run the bash script "0_reset_for_round_n -n", where -n is the number of individual jobs to split the work into (currently set to optimally use some multiple of 8). This starts the pipeline, which should produce two files per viable EC: one for positive and one for negative densities.
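To make the job-splitting step concrete, here is a minimal sketch of how a list of EC numbers could be divided across n jobs. The round-robin strategy and the function name split_into_jobs are assumptions for illustration; the pipeline's own scripts may divide the work differently.

```python
def split_into_jobs(ec_numbers, n_jobs):
    """Distribute EC numbers round-robin across n_jobs lists (hypothetical helper)."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    jobs = [[] for _ in range(n_jobs)]
    for index, ec in enumerate(ec_numbers):
        # Item 0 goes to job 0, item 1 to job 1, and so on, wrapping around.
        jobs[index % n_jobs].append(ec)
    return jobs


# Example EC numbers chosen only for illustration.
ecs = ["1.1.1.1", "1.1.1.2", "2.7.1.1", "2.7.11.1", "3.4.21.4"]
jobs = split_into_jobs(ecs, 2)
```

Round-robin assignment keeps the job sizes within one item of each other, which is one simple way to keep a multiple-of-8 set of jobs evenly loaded.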

Preprocessing

0_prepare_sequence_data: Filters the dat file into a fasta file, removing proteins that do not belong to a class viable for analysis in DETECT.
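The filtering idea can be sketched as follows. This is not the code in 0_prepare_sequence_data; it assumes the dat file uses a Swiss-Prot-style flat-file layout (AC, DE, and SQ line codes, entries ending in "//") and it treats "has at least one complete four-level EC number" as the viability criterion, which is a stand-in for whatever criteria the real script applies. The names parse_dat and to_fasta are hypothetical.

```python
import re

# Matches complete four-level EC numbers only (partial ones such as 1.1.1.- are skipped).
EC_PATTERN = re.compile(r"EC=(\d+\.\d+\.\d+\.\d+)")


def parse_dat(text):
    """Yield (accession, ec_numbers, sequence) for each '//'-terminated entry."""
    accession, ecs, seq_lines, in_seq = None, [], [], False
    for line in text.splitlines():
        if line.startswith("//"):
            if accession is not None:
                yield accession, ecs, "".join(seq_lines)
            accession, ecs, seq_lines, in_seq = None, [], [], False
        elif in_seq:
            # Sequence lines follow the SQ header; drop the spacing between blocks.
            seq_lines.append(line.replace(" ", ""))
        elif line.startswith("AC") and accession is None:
            accession = line[2:].strip().split(";")[0]
        elif line.startswith("DE"):
            ecs.extend(EC_PATTERN.findall(line))
        elif line.startswith("SQ"):
            in_seq = True


def to_fasta(text):
    """Keep only entries with at least one EC number (assumed criterion)."""
    records = []
    for accession, ecs, sequence in parse_dat(text):
        if ecs:
            records.append(">{} {}\n{}".format(accession, ";".join(ecs), sequence))
    return "\n".join(records)


# Two made-up entries: the first has an EC number, the second does not.
sample = """ID   TEST1_HUMAN   Reviewed;   10 AA.
AC   P00001;
DE   RecName: Full=Example dehydrogenase;
DE            EC=1.1.1.1;
SQ   SEQUENCE   10 AA;
     MKTAYIAKQR
//
ID   TEST2_HUMAN   Reviewed;   8 AA.
AC   P00002;
DE   RecName: Full=Uncharacterized protein;
SQ   SEQUENCE   8 AA;
     MSTNPKPQ
//
"""
fasta = to_fasta(sample)
```

Only the first entry survives the filter, because the second carries no EC annotation.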

0_make_blast_db: Generates the BLAST database from the list of proteins in the fasta file produced by 0_prepare_sequence_data.
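As a rough illustration of this step, the sketch below builds a command line for makeblastdb from the BLAST+ suite. Whether 0_make_blast_db.sh uses makeblastdb, and with which options, is an assumption; the file and database names are placeholders.

```python
def makeblastdb_command(fasta_path, db_name):
    """Build an argument list for a protein BLAST database (assumed BLAST+ tooling)."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", db_name]


# Placeholder names; substitute the fasta file produced during preprocessing.
command = makeblastdb_command("proteins.fasta", "proteins_db")
```

The list can be passed to subprocess.run(command, check=True) on a machine where BLAST+ is installed.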

Postprocessing

0_create_mappings_and_prior_probabilities_file: Creates two files that DETECT requires in order to function: one containing the mappings of sequence IDs to EC, and one containing the prior probabilities used for DETECT's Bayesian estimation.
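A minimal sketch of what such a postprocessing step might compute is shown below. It assumes the prior for an EC is the fraction of all sequences annotated with that EC; the actual definition used by 0_create_mappings_and_prior_probabilities_file.py may differ, and the function name and sample IDs are hypothetical.

```python
from collections import Counter


def build_mappings_and_priors(records):
    """Return (sequence ID -> set of ECs, EC -> prior) from (seq_id, ec) pairs."""
    mapping = {}
    for seq_id, ec in records:
        mapping.setdefault(seq_id, set()).add(ec)
    # Count how many distinct sequences carry each EC.
    counts = Counter(ec for ecs in mapping.values() for ec in ecs)
    total = len(mapping)
    # Assumed definition: prior = sequences with this EC / all sequences.
    priors = {ec: count / total for ec, count in counts.items()}
    return mapping, priors


# Made-up annotations for four sequences.
records = [
    ("P1", "1.1.1.1"),
    ("P2", "1.1.1.1"),
    ("P3", "2.7.1.1"),
    ("P4", "1.1.1.1"),
]
mapping, priors = build_mappings_and_priors(records)
```

With these four sequences, three carry 1.1.1.1 and one carries 2.7.1.1, so the priors come out as 0.75 and 0.25 under the assumed definition.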

Warning

Some filenames may have to be changed, such as wherever the dat or fasta files are referenced. These filenames come from external sources and are not generated automatically in a consistent way. The path to the EMBOSS package will also have to be changed to one that is available to you. Furthermore, please ensure that you have installed all the necessary packages (those that are imported in the bash script headers). Finally, this pipeline is designed to run on the Scinet cluster; usage on other clusters will require further modifications.

For more information, bug reports, or otherwise, please contact: leon.xu@mail.utoronto.ca

Folders and files

Local
0_create_and_queue_all_master_pipelines.sh
0_create_mappings_and_prior_probabilities_file.py
0_create_run_master_pipelines.py
0_create_run_pipeline_forall_ec.py
0_get_eligible_ecs.py
0_make_blast_db.sh
0_prepare_sequence_data.py
0_reset_for_round_n.sh
0_run_pipeline_for_one_ec.sh
1_create_fasta_with_target_ec_from_dat.py
2_blast_single_ec.sh
3_get_blast_results_unique_pairs.py
4_needleall_one_ec.py
5_format_needleall_one_ec.py
6_generate_probability_profile_for_one_ec.py
LICENSE
README.md
