Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gapfilling fails in step 1 #232

Open
cdiener opened this issue Aug 28, 2024 · 10 comments
Open

Gapfilling fails in step 1 #232

cdiener opened this issue Aug 28, 2024 · 10 comments
Assignees

Comments

@cdiener
Copy link

cdiener commented Aug 28, 2024

Related to #228, opening a new bug since I can't reopen the old one and this is a new example.

We are still seeing failures in gapfilling for Archaea with version 1.3.1. I attached an example from MGnify here.

Environment

From here.

gapseq version: 1.3.1
Sequence DB md5sum: 17e92c9 (2023-12-12, Bacteria)
Sequence DB md5sum: 8bb9575 (2023-12-12, Archaea)

Commands

We use the gut medium bundled with gapseq. The script we run:

#!/bin/bash -ue
cp /home/isilon/users/o_diener/micom_dbs/recipes/uhgg/data/gut.csv medium.csv
gapseq fill -m MGYG000000522-draft.RDS -n medium.csv -c MGYG000000522-rxnWeights.RDS -b 100 -g MGYG000000522-rxnXgenes.RDS > MGYG000000522.log
gzip MGYG000000522.xml

Which will result in:

INFO [2024-08-27 15:30:20] Set BLAS to 1 thread.                                                                                         
Loading required package: glpkAPI                                                                                                        
using GLPK version 5.0                                                                                                                   
Warning message:                                                                                                                         
glpkAPI is used but cplexAPI is recommended because it is much faster                                                                    
Warning in gapfill4(mod.orig = mod.orig, mod.full = mod, rxn.weights = copy(rxn.weights),  :                                             
  Final model MGYG000000522 cannot produce enough target even when all candidate reactions are added! obj=0 lp_stat=5                    
Wrote file ./MGYG000000522.xml

MRP

See the input files here: uhgg_example.zip

Repository owner deleted a comment Aug 28, 2024
@Waschina Waschina self-assigned this Aug 28, 2024
Repository owner deleted a comment Aug 28, 2024
Waschina added a commit that referenced this issue Aug 28, 2024
- Option for minimum required growth rate was already part of the "gapseq fill" module.
- However, growth rate was not always achieved, since it was not added
as constraint in gapfill step 1.
- refers to #232
@Waschina
Copy link
Collaborator

Thanks again for posting the issue.

I tried to find the underlying problem but am not entirely there yet. What I found so far:

  • the issue is unrelated to Gapfilling occasionally fails silently with GLPK #228 in the sense that here, the issue occurs also with cplex as the solver.
  • In the MRP, an infeasible flux distribution is predicted in the first gapfilling step, where all reactions from the biochemistry database are considered. However, both GLPK and CPLEX returned the solution as feasible. When reducing the metabolic network to those reactions predicted to carry a non-zero flux, the FBA predicts that no biomass formation is possible.
  • Unrelated to that, I noticed that we have an option "--min.obj.val" in the gapseq fill module to set a minimum growth rate that gap-filling should achieve; yet, it was not enforced, and models with a growth rate less than the specified value are commonly returned. I fixed this with commit fb7fbcb. The example with MGYG000000522 now works, but the issue from the previous point remains.

@cdiener
Copy link
Author

cdiener commented Aug 29, 2024

Do you mean the initial MIP is feasible but then the model with non-zero indicators is infeasible?
We had similar issues with the gapfilling in cobrapy and what helped is to lower the integrality tolerance because the default will sometimes allow flux through zero indicator reactions otherwise. That happened with GLPK and CPLEX to us as well. Gurobi and HIGHS seem to be ab it less sensitive there.

@Waschina
Copy link
Collaborator

Do you mean the initial MIP is feasible but then the model with non-zero indicators is infeasible?

Yes, just that in gapseq the initial problem is formulated as LP not as MIP (Eq. 1 in the gapseq publication).

We had similar issues with the gapfilling in cobrapy and what helped is to lower the integrality tolerance because the default will sometimes allow flux through zero indicator reactions otherwise. That happened with GLPK and CPLEX to us as well. Gurobi and HIGHS seem to be ab it less sensitive there.

That is good to know; thank you! I will try to lower tolerance in glpk and cplex.

@cdiener
Copy link
Author

cdiener commented Aug 29, 2024

Oh sorry, never mind then. Integrality tolerance has no effects on LPs, obviously.

@Waschina
Copy link
Collaborator

But it was a good hint – there is a simplex parameter in GLPK that controls tolerance for bound violations. I guess cplex has this, too. A first quick test with GLPK indicates that this value might need to be reduced for the initial LP. I'll do some further tests.

Waschina added a commit that referenced this issue Aug 30, 2024
- Code was slightly modified to make sure that the initial gap-fill pFBA
solution is actually a feasible solution.
- In case infeasible solutions are returned, the LP is solved again with
lowered bound violation tolerance (refers to #232).

Some further changes:
- During gapfilling, the reported numbers of added reactions now does
not count added exchange reactions
- more conistent checking of a returned FBA solution is actually
feasible
@Waschina
Copy link
Collaborator

Waschina commented Aug 30, 2024

The gapfilling algorithm was updated on the cobrar branch. The key was indeed to reduce the bound violation tolerance in cases where an optimal solution returned by the initial pFBA was not feasible and then re-run the optimization.
The CPLEX documentation also recommends this procedure:

You can also lower this tolerance after finding an optimal solution if there is any doubt that the solution is truly optimal.

"cobrar" needs to be updated to the current development version to work with the latest version of gapseq from the cobrar branch,

I will run a larger set of reconstructions with the new updates to see if something unexpected happens.

@cdiener
Copy link
Author

cdiener commented Sep 2, 2024

Okay that makes sense. When we had similar problems in the CORDA port (somewhat similar idea with a linearized cost optimization) another gotcha was the threshold for the absolute flux value (`|v| > eps -> include). But it looks like you are already checking that by removing the reaction and ensuring the objective is maintained.

@cdiener
Copy link
Author

cdiener commented Sep 4, 2024

Just as an FYI. For me, the provided example will still fail gapfilling silently (final growth rate of 0) evne when updating to the latest master branch.

@Waschina
Copy link
Collaborator

Waschina commented Sep 4, 2024

Yes, the master branch still relies on the sybil packages. Since the fix involved quite some changes in the gapfilling algorithm, I just made the changes to the "cobrar" branch of gapseq, which will be merged into the master branch hopefully soon.

@cdiener
Copy link
Author

cdiener commented Sep 6, 2024

Sorry I meant in relation to:

Unrelated to that, I noticed that we have an option "--min.obj.val" in the gapseq fill module to set a minimum growth rate that gap-filling should achieve; yet, it was not enforced, and models with a growth rate less than the specified value are commonly returned. I fixed this with commit fb7fbcb. The example with MGYG000000522 now works, but the issue from the previous point remains.

Even with commit fb7f MGYG000000522 did not work for me.

I will rebuild the image with the cobrar branch and test that one a bit more. Thanks for all the work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants