Reactions inferred from pseudogene regions when running gapseq on a genome FASTA file #199

ArnaudBelcour · 2023-11-22T11:32:15Z

Hello,

Technical part

While reconstructing metabolic networks from public genomes in the NCBI database, I found that gapseq (version 1.2, subcommand doall) uses genome regions tagged as pseudogenes during reaction inference. I identified this by giving the genome sequence as input to gapseq and comparing (here, by searching for overlaps) the regions predicted to be associated with a reaction with the genes annotated at the same locations in the organism's GenBank file. When the corresponding gene carried a pseudo qualifier, I considered the reaction to be associated with a pseudogene.
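The overlap check described above can be sketched as follows. This is a minimal, stdlib-only sketch that assumes the predicted hit coordinates and the pseudogene features have already been extracted into plain tuples. For example, pseudogene features could be collected from the GenBank file with Biopython by keeping features that have a `pseudo` key in `feature.qualifiers`. The `Hit` and `Pseudogene` tuples, their field names, and the example coordinates are hypothetical and do not reflect gapseq's actual output format.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """A region predicted to support a reaction (hypothetical format)."""
    reaction: str
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


class Pseudogene(NamedTuple):
    """A gene feature carrying the /pseudo qualifier (hypothetical format)."""
    locus_tag: str
    contig: str
    start: int
    end: int


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Return True if two closed intervals share at least one position."""
    # Normalise so that start <= end (minus-strand hits may be given reversed).
    a_start, a_end = sorted((a_start, a_end))
    b_start, b_end = sorted((b_start, b_end))
    return a_start <= b_end and b_start <= a_end


def hits_in_pseudogenes(hits, pseudogenes):
    """Map each hit that overlaps a pseudogene to the overlapping locus tags."""
    result = {}
    for hit in hits:
        tags = [
            p.locus_tag
            for p in pseudogenes
            if p.contig == hit.contig
            and overlaps(hit.start, hit.end, p.start, p.end)
        ]
        if tags:
            result[hit] = tags
    return result


if __name__ == "__main__":
    hits = [
        Hit("rxnA", "contig_1", 1200, 1900),
        Hit("rxnB", "contig_1", 5000, 5600),
    ]
    pseudogenes = [Pseudogene("LOCUS_0042", "contig_1", 1500, 2400)]
    for hit, tags in hits_in_pseudogenes(hits, pseudogenes).items():
        print(hit.reaction, tags)  # prints: rxnA ['LOCUS_0042']
```

A linear scan is enough for a sketch. For whole genomes with many hits, sorting the features per contig or using an interval tree would avoid comparing every hit with every pseudogene.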

This varies a lot between genomes: some species have no matches in pseudogene regions, while others have hundreds of reactions associated with these regions.
Finding them is not surprising, since pseudogene regions still contain sequences similar to those of functional genes, which can produce matches in tblastn searches. In my previous team, we encountered a similar issue when developing the AuCoMe method.

Do you think it would be possible to identify and label these reactions as associated with pseudogenes? Or at least to display a warning when a genome sequence file is used as input?
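A minimal sketch of what such labelling could look like follows. It assumes each supporting hit of a reaction has already been flagged as lying inside or outside a pseudogene, for example by an overlap check against the GenBank annotation. The function names, the three labels, and the warning text are hypothetical and are not part of gapseq.

```python
from collections import defaultdict


def label_reactions(hit_support):
    """Label each reaction by whether its supporting hits fall in pseudogenes.

    `hit_support` is an iterable of (reaction_id, in_pseudogene) pairs, one
    per supporting hit (hypothetical input format).
    """
    flags = defaultdict(list)
    for reaction, in_pseudogene in hit_support:
        flags[reaction].append(in_pseudogene)

    labels = {}
    for reaction, values in flags.items():
        if all(values):
            labels[reaction] = "pseudogene_only"  # every hit is in a pseudogene
        elif any(values):
            labels[reaction] = "mixed"  # at least one hit outside pseudogenes
        else:
            labels[reaction] = "no_pseudogene"
    return labels


def warn_pseudogene_reactions(labels):
    """Build one warning line per reaction supported only by pseudogene hits."""
    return [
        f"WARNING: {reaction} is supported only by hits in pseudogene regions"
        for reaction, label in sorted(labels.items())
        if label == "pseudogene_only"
    ]


if __name__ == "__main__":
    support = [("rxnA", True), ("rxnB", True), ("rxnB", False), ("rxnC", False)]
    labels = label_reactions(support)
    print(labels)
    for line in warn_pseudogene_reactions(labels):
        print(line)
```

The `mixed` label separates reactions that also have a hit outside pseudogenes from those supported solely by pseudogene regions. Only the latter would need to be reported separately or left out of a model used for predictions.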

Thoughts on pseudogenes and metabolism

For me, this raises the question of whether these regions should be taken into account at all. They have indeed been identified (often automatically) as pseudogene regions, but these predictions should be taken with caution, especially for two reasons:

the notion of pseudo-pseudogenes, where predicted pseudogenes show some activity, for example (1) regulatory activity (Pink et al. 2011), (2) translation activity (Prieto-Godino et al. 2016) or (3) protein expression (Feng et al. 2022);
the notion of proto-genes, where new genes can arise from non-genic genomic sequences and could be misinterpreted as pseudogenes (Carvunis et al. 2012).

So I think it would be useful to label these reactions, as they could reflect (1) a loss (or inactivation) of function, (2) a modified function that can still be performed, or (3) a potential future function. But perhaps they should not be included in models used for predictions (such as Flux Balance Analysis) because of the uncertainty surrounding them?

Best regards,
Arnaud Belcour.
