DocTOR

This Repository represents the effort to obtain a predictor for single proteins associated to Adverse Reactions (ADRs).

Galletti C, Aguirre-Plans J, Oliva B and Fernandez-Fuentes N (2022) Prediction of Adverse Drug Reaction Linked to Protein Targets Using Network-Based Information and Machine Learning. Front. Bioinform. 2:906644. doi: 10.3389/fbinf.2022.906644

DocTOR (Direct fOreCast Target On Reaction), is a utility written in python3.9 (using the conda workframe) that allows the user to upload a list of Uniprot IDs and Adverse reactions (from the available models) in order to study the relationship between the two.

On output the program will assign a positive or negative class to the protein, assessing its possible involvement in the selected ADRs onset.

DocTOR exploits the data coming from T-ARDIS [https://doi.org/10.1093/database/baab068] to train different Machine Learning approaches (SVM, RF, NN) using network topological measurements as features.

The prediction coming from the single trained models are combined in a meta-predictor exploiting three different voting systems.

Jury Voting System: One of the basic voting system, the output class will be simply the one with the highest amount of predictions.
(e.g. Given a protein X, if two ML methods predict the class as 1 - or “ADR linked” and just one as 0, the ensemble method output class will be 1)
Consensus Voting System: This approach will sum the single ML class probability multiplied by 1 or -1 based if the prediction is respectively an ADR-linked or not. If the final sum of these values is positive the ensemble method output will be 1 or “ADR-linked”, if negative, 0 or “Not-ADR-linked”.
(e.g. Given a protein X, if the SVM model predicts as class 1 with a probability of 0.86, RF predicts class 0 with a probability of 0.63 and the NN predicts as class 1 with a probability of 0.53, the overall class prediction will be equal to (0.86 * +1) + (0.63 * -1) + (0.53 * +1) = 0.76 ➡ class 1).
Red Flag Voting System: As opposed to the Jury Vote system, the Red Flag approach will select as output for the ensemble method the one with least predictions.
(e.g. Given a protein X, if two ML methods predict the class as 1 - or “ADR linked” and just one as 0, the ensemble method output class will be 0)

The results of the meta-predictor together with the ones from the single ML method will be available in the output log file (named "predictions_community" or "predictions_curated" based on the database type).

Usage

DocTOR has been developed in a conda environment, to avoid problems and for the best usage is suggested to use the yml file provided

Create the environment from the environment.yml file:
conda env create -f environment.yml
Activate the new environment:
conda activate DocTOR_env
Download the trained models from
Extract the various tar.gz files (The entire uncompressed folder is ~20GB)
tar -zxf *.tar.gz
Modify the adr_list and protein_list files with the Preferred Terms or SOC and Uniprot Ids of choice.
The first accepts the adverse reaction reported in the Available_models file, the second accepts Uniprot Ids separated by newline.

NB. It is suggested to run separately Preferred term and SOC search since the analysis uses different functions
Run the python script:
In case of Preferred Terms selection:
```
python3 predictor_all_in_one.py protein_list adr_list PT   
```
In case of System Organ Class selection:
```
python3 predictor_all_in_one.py protein_list adr_list SOC  
```
If the ADR or Uniprot ID are not present in the database the program will raise a warning
The code will give as output two files based on the database type predictions_community and predictions_controlled both of them contains the prediction result in a tab delimited format.
The file is divided in 15 columns:

BIANA ID : Protein code in the BIANA Network
UNIPROT_ID : Uniprot ID of the protein in analysis
adr : Adverse reaction in analysis
top bench method : meta-predictor method that obtained the best results in terms of Accuracy Precision Recall and Matthew Correlation Coefficient during the independent test analysis for this Adverse Event. While also presenting the other methods, it is suggested to take this result as a possible reference
jury_class : Attributed class based in the jury method (single ML method class result count)
vote_count : Number of votes for the class reported
red_flag : Class reported by the Red Flag method
consensus :Class reported by the Consensus method
Predicted class SVM : Class predicted by the Support Vector Machine
Predicted probability SVM: Probability associated to the SVM class
Predicted class RND: Class predicted by the Random Forest method
Predicted probability RND: Probability associated to the RF class
Predicted class NN: Class predicted by the Neural Network method
Predicted probability NN: Probability associated to the NN class
Sum_Prob : Corrected probabilities for the Consensus Method

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
Available_models		Available_models
DocTOR_environment.yml		DocTOR_environment.yml
README.md		README.md
adr_list		adr_list
best_models_community		best_models_community
best_models_controlled		best_models_controlled
community_outdf.tar.gz		community_outdf.tar.gz
controlled_outdf.tar.gz		controlled_outdf.tar.gz
node_info_BIANA.txt		node_info_BIANA.txt
predictor_all_in_one.py		predictor_all_in_one.py
protein_list		protein_list

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DocTOR

Usage

About

Releases

Packages

Languages

cristian931/DocTOR

Folders and files

Latest commit

History

Repository files navigation

DocTOR

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages