Reaction inferred from pseudogene regions when using gapseq on genome fasta file. #199

Open
ArnaudBelcour opened this issue Nov 22, 2023 · 0 comments


Hello,

Technical part

While reconstructing metabolic networks from public genomes in the NCBI database, I found that gapseq (version 1.2, subcommand doall) uses regions of the genome that are annotated as pseudogenes during reaction inference. I identified this by giving the genome sequence as input to gapseq and then comparing each region predicted to be associated with a reaction against the genes located at the same position in the organism's GenBank file (by searching for overlaps). When the corresponding gene had a pseudo qualifier, I considered the reaction to be associated with a pseudogene.
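To make the comparison concrete, here is a minimal sketch of the overlap check in Python. It is not the script I actually used: the `Region` type, the function names, and the example coordinates are all hypothetical, and it assumes that the gapseq hit coordinates and the pseudogene locations have already been extracted into simple records with `start <= end` on the same contig naming scheme.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval (hypothetical record, 1-based inclusive coordinates)."""
    contig: str
    start: int
    end: int
    label: str  # reaction ID for hits, locus tag for pseudogenes


def overlaps(a: Region, b: Region) -> bool:
    """Return True if two regions share at least one position on the same contig."""
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def flag_pseudogene_hits(hits, pseudogenes):
    """Map each reaction label to the pseudogene labels its hit region overlaps."""
    flagged = {}
    for hit in hits:
        matches = [p.label for p in pseudogenes if overlaps(hit, p)]
        if matches:
            flagged.setdefault(hit.label, []).extend(matches)
    return flagged


# Made-up example data, for illustration only.
hits = [
    Region("contig_1", 100, 400, "rxn_A"),
    Region("contig_1", 900, 1200, "rxn_B"),
    Region("contig_2", 100, 400, "rxn_C"),
]
pseudogenes = [
    Region("contig_1", 350, 800, "locus_0001"),
]

flagged = flag_pseudogene_hits(hits, pseudogenes)
print(flagged)
```

In this toy example only `rxn_A` is flagged, since its hit region overlaps the pseudogene on the same contig. Hits reported on the reverse strand may have start greater than end, so the coordinates would need to be normalised before calling `overlaps`.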

There is a lot of variation: some species have no matches in pseudogene regions, while others have hundreds of reactions associated with these regions.
Finding them seems logical, since pseudogene regions still contain sequences similar to those of functional genes, and these can match during the tblastn search. In my previous team, we encountered a similar issue when developing the AuCoMe method.

Do you think it would be possible to identify and label these reactions as associated with pseudogenes? Or at least to emit a warning when a genome sequence file is used as input?

Thoughts on pseudogenes and metabolism

For me, this raises the question of whether these regions should be taken into account at all. Yes, they have been identified (often automatically) as pseudogene regions, but these predictions should be taken with caution. Especially for two points:

So I think it could be interesting to label these reactions, as they can indicate (1) a lost or inactive function, (2) a modified function that can still be performed, or (3) a function that could potentially become active in the future. But maybe they should not be present in the model that will be used to make predictions (for example with Flux Balance Analysis), given the uncertainty about them?

Best regards,
Arnaud Belcour.
