Segfault when iterating CDX file from USPTO downloads #1126

Closed
eloyfelix opened this issue May 4, 2023 · 9 comments · Fixed by #1130

@eloyfelix
Contributor

Steps to Reproduce
Indigo 1.10.0 installed via pip install epam.indigo
Python 3.10.10
Ubuntu Linux

from indigo import Indigo

indigo = Indigo()

for item in indigo.iterateCDXFile("US06174985-20010116-C00003.CDX"):
    print(item.molfile())

  -INDIGO-05042321392D

 15 14  0  0  0  0  0  0  0  0999 V2000
    0.5133    0.3458    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5133    0.9048    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.0495    1.0778    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.3776    0.6262    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.0495    0.1728    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.9421    0.6262    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2243    0.1375    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2243    1.1148    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.0495    1.6422    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.6140    1.6422    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.4851    1.6422    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.5929    1.6581    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8751    1.1694    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8751    2.1467    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  1  0  0  0  0
  4  5  2  0  0  0  0
  1  5  1  0  0  0  0
  4  6  1  0  0  0  0
  6  7  1  0  0  0  0
  6  8  1  0  0  0  0
  3  9  1  0  0  0  0
  9 10  1  0  0  0  0
  9 11  2  0  0  0  0
 10 12  1  0  0  0  0
 12 13  1  0  0  0  0
 12 14  1  0  0  0  0
M  END


  -INDIGO-05042321392D

 17 18  0  0  0  0  0  0  0  0999 V2000
    3.8647    0.3458    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.8647    0.9048    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.4009    1.0778    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    4.7290    0.6262    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.4009    0.1728    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    5.2935    0.6262    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.5757    0.1375    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.5757    1.1148    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.4009    1.6422    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.8276    1.6422    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.9654    1.6422    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    5.9884    1.6422    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.2706    2.1361    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.8333    2.1361    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.1137    1.6422    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.8333    1.1694    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.2706    1.1694    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  1  0  0  0  0
  4  5  2  0  0  0  0
  1  5  1  0  0  0  0
  4  6  1  0  0  0  0
  6  7  1  0  0  0  0
  6  8  1  0  0  0  0
  3  9  1  0  0  0  0
  9 10  2  0  0  0  0
  9 11  1  0  0  0  0
 11 12  1  0  0  0  0
 12 13  1  0  0  0  0
 13 14  2  0  0  0  0
 14 15  1  0  0  0  0
 15 16  2  0  0  0  0
 16 17  1  0  0  0  0
 12 17  2  0  0  0  0
M  END


  -INDIGO-05042321392D

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END


  -INDIGO-05042321392D

 23 22  0  0  0  0  0  0  0  0999 V2000
    1.4235    2.7535    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4235    3.3179    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9597    3.4907    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    2.2878    3.0339    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9597    2.5805    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    2.8523    3.0339    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9597    4.0552    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5242    4.0552    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.3953    4.0552    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.1345    2.5452    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.1345    3.5225    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0886    4.0552    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.8276    4.0552    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.1063    4.5526    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.6691    4.5526    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.9495    4.0552    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.6691    3.5701    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.1063    3.5701    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  1  0  0  0  0
  4  5  2  0  0  0  0
  1  5  1  0  0  0  0
  4  6  1  0  0  0  0
  3  7  1  0  0  0  0
  7  8  1  0  0  0  0
  7  9  2  0  0  0  0
  6 10  1  0  0  0  0
  6 11  1  0  0  0  0
  8 12  1  0  0  0  0
 12 13  1  0  0  0  0
 13 14  2  0  0  0  0
 14 15  1  0  0  0  0
 15 16  2  0  0  0  0
 16 17  1  0  0  0  0
 17 18  2  0  0  0  0
 13 18  1  0  0  0  0
 19 20  2  0  0  0  0
 19 21  1  0  0  0  0
 22 23  1  0  0  0  0
M  CHG  1  19   1
M  END

Segmentation fault (core dumped)

Expected behavior
Normal iteration without segfault

Actual behavior
Segfault

Attachments
US06174985-20010116-C00003.zip

Indigo version
1.10.0

Additional context
The CDX file comes packed in the following USPTO download file: https://bulkdata.uspto.gov/data/patent/grant/redbook/2001/20010116.ZIP. The same crash seems to happen with other CDX files.
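
For anyone trying to find the affected file inside the download, here is a minimal sketch that lists the CDX members of a ZIP archive using only the standard library. The helper name `list_cdx_members` is made up for illustration, and the internal layout of the USPTO archive is an assumption: the report does not say whether the CDX files sit at the top level or inside nested archives, so this only inspects one level.

```python
import zipfile


def list_cdx_members(zip_path):
    """Return names of archive members ending in .cdx (case-insensitive).

    Hypothetical helper: only looks at the given archive, not at any
    ZIP files that may be nested inside it.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith(".cdx")
        ]
```

Calling `list_cdx_members("20010116.ZIP")` on a local copy of the download should return the member names to pass to `zipfile.ZipFile.extract`, provided the archive is not nested.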

@eloyfelix eloyfelix added the Bug label May 4, 2023
@even1024 even1024 self-assigned this May 4, 2023
@even1024 even1024 added this to the Indigo-1.12.0-rc.1 milestone May 4, 2023
@even1024
Collaborator

Please check it with Indigo 1.11.0. I wasn't able to reproduce the issue, but I investigated with valgrind, and it reports memory access issues while loading the CDX from your example. These issues don't appear on 1.11.0.

@eloyfelix
Contributor Author

The latest Indigo pip package version and GitHub release are both 1.10.0: https://pypi.org/project/epam.indigo/ https://github.com/epam/Indigo/releases

Should I build it from the 1.11.0 branch, then?
Would it be possible to have the pip package updated?

Many thanks!
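
Since the thread turns on which Indigo version is actually installed, a small sketch for checking it from Python may help. It uses the standard `importlib.metadata` module; the helper name `installed_version` is hypothetical, and `epam.indigo` is the distribution name taken from the pip command in the report.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "epam.indigo" is the distribution name used in the pip command above.
print(installed_version("epam.indigo"))
```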

@even1024
Collaborator

It seems that 1.11.0-rc1 was published here: https://github.com/epam/Indigo/actions/runs/4731689107/jobs/8397946421

@eloyfelix
Contributor Author

That's great, I didn't see it. I'll check that now, thanks!

@even1024
Collaborator

Please let me know if everything is ok.

@eloyfelix
Contributor Author

eloyfelix commented May 10, 2023

It still crashes for me. Sometimes it manages to parse the file and convert it to a molfile, but that happens maybe 1 time in 10. If I convert to SMILES, it seems to always crash now, but this might be completely random behaviour.

from indigo import Indigo

indigo = Indigo()

for item in indigo.iterateCDXFile("US06174985-20010116-C00003.CDX"):
    print(item.smiles())
C1N=C(C(C)C)N(C(=O)OC(C)C)C=1.C
C1N=C(C(C)C)N(C(OC2=CC=CC=C2)=O)C=1

C1N=C(C(C)C)N(C(=O)OCC2C=CC=CC=2)C=1.[NH+]([OH2+255])=O.OC
Segmentation fault (core dumped)
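
Because the crash is intermittent, it can help to measure how often it happens rather than rely on single runs. The following is a rough sketch, not part of Indigo: it runs a command several times in child processes and counts the runs that were killed by a signal. The function name `count_crashes` is made up for illustration, and the negative return code convention applies to POSIX systems such as the Ubuntu setup in this report.

```python
import subprocess
import sys


def count_crashes(argv, runs=10):
    """Run argv `runs` times and return how many runs were killed by a signal."""
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX, a return code of -N means the child died from signal N
        # (for example -11 for SIGSEGV or -6 for SIGABRT).
        if result.returncode < 0:
            crashes += 1
    return crashes
```

For example, `count_crashes([sys.executable, "repro.py"], runs=10)` would report the crash count over ten runs, where `repro.py` is a hypothetical file containing the snippet above.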

@eloyfelix
Contributor Author

I can now run the code without problems using Python 3.11 (I was using 3.10).

Attached is a Dockerfile with Python 3.10 that reproduces the issue. The container crashes, and it stops crashing if the Python version in the FROM statement is changed.
I'm happy to use 3.11, so please feel free to close the bug report!

indigo_bug.zip

@eloyfelix
Contributor Author

Sorry, it seems I'm still finding issues in Python 3.11.
New CDX file attached:

US06171768-20010109-C00056.zip

from indigo import Indigo

indigo = Indigo()

for i in range(1, 1000):
    for item in indigo.iterateCDXFile("US06171768-20010109-C00056.CDX"):
        print(item.smiles())
C1C=CC=C(N2N(CC3=CC=CC=C3)C(=O)NC2=O)C=1.CC(N)=O.CC(C=O)(C)C.C1=C(Cl)C=CC=C1.C(O)=O.CC(OCCCCCCCCCCCC)=O.CCCC
C12C=CC=CC=1C(OC1=CC=C(N=NC3C=CC4C=CC=CC=4C=3O)C=C1)=CC=C2O.C(NCCCCO)=O.C1=CC=CC=C1.[Na]OOOS.NC(C)=O.S(O[Na])(=O)=O
C1C=CC=C(N=NC2C=NN(C3=C(Cl)C=C(Cl)C=C3Cl)C2=O)C=1.CO.NC=O.C1=CC=CC=C1.NC(CO)=O.C1=CC=CC=C1
C1C=CC=C(N=NC2C(NC3C=CC=CC=3Cl)=NN(C3=C(Cl)C(Cl)=C(Cl)C(Cl)=C3Cl)C2=O)C=1.CCO.NC(CCCO)=O.C1C=CC=CC=1.CCO
C1=CC2C(O)=CC=C(OC3=C(CSC4N(C5C=CC=CC=5)N=NN=4)C(C)=NN3C3C=CC=CC=3)C=2C=C1.C(N)=O.C1=CC=CC=C1
C1=CC2C(O)=CC=C(OC3=CC=C(CSC4OC=CC=4)C=C3)C=2C=C1.C(N)=O.C1=CC=CC=C1.[NH+]([OH2+255])=O.CC
C1C=CC=C(N2N(CC3=CC=CC=C3)C(=O)NC2=O)C=1.CC(N)=O.CC(C=O)(C)C.C1=C(Cl)C=CC=C1.C(O)=O.CC(OCCCCCCCCCCCC)=O.CCCC
C12C=CC=CC=1C(OC1=CC=C(N=NC3C=CC4C=CC=CC=4C=3O)C=C1)=CC=C2O.C(NCCCCO)=O.C1=CC=CC=C1.[Na]OOOS.NC(C)=O.S(O[Na])(=O)=O
C1C=CC=C(N=NC2C=NN(C3=C(Cl)C=C(Cl)C=C3Cl)C2=O)C=1.CO.NC=O.C1=CC=CC=C1.NC(CO)=O.C1=CC=CC=C1
C1C=CC=C(N=NC2C(NC3C=CC=CC=3Cl)=NN(C3=C(Cl)C(Cl)=C(Cl)C(Cl)=C3Cl)C2=O)C=1.CCO.NC(CCCO)=O.C1C=CC=CC=1.CCO
C1=CC2C(O)=CC=C(OC3=C(CSC4N(C5C=CC=CC=5)N=NN=4)C(C)=NN3C3C=CC=CC=3)C=2C=C1.C(N)=O.C1=CC=CC=C1
C1=CC2C(O)=CC=C(OC3=CC=C(CSC4OC=CC=4)C=C3)C=2C=C1.C(N)=O.C1=CC=CC=C1.[NH+]([OH2+255])=O.CC
C1C=CC=C(N2N(CC3=CC=CC=C3)C(=O)NC2=O)C=1.CC(N)=O.CC(C=O)(C)C.C1=C(Cl)C=CC=C1.C(O)=O.CC(OCCCCCCCCCCCC)=O.CCCC
C12C=CC=CC=1C(OC1=CC=C(N=NC3C=CC4C=CC=CC=4C=3O)C=C1)=CC=C2O.C(NCCCCO)=O.C1=CC=CC=C1.[Na]OOOS.NC(C)=O.S(O[Na])(=O)=O
C1C=CC=C(N=NC2C=NN(C3=C(Cl)C=C(Cl)C=C3Cl)C2=O)C=1.CO.NC=O.C1=CC=CC=C1.NC(CO)=O.C1=CC=CC=C1
C1C=CC=C(N=NC2C(NC3C=CC=CC=3Cl)=NN(C3=C(Cl)C(Cl)=C(Cl)C(Cl)=C3Cl)C2=O)C=1.CCO.NC(CCCO)=O.C1C=CC=CC=1.CCO
C1=CC2C(O)=CC=C(OC3=C(CSC4N(C5C=CC=CC=5)N=NN=4)C(C)=NN3C3C=CC=CC=3)C=2C=C1.C(N)=O.C1=CC=CC=C1
C1=CC2C(O)=CC=C(OC3=CC=C(CSC4OC=CC=4)C=C3)C=2C=C1.C(N)=O.C1=CC=CC=C1.[NH+]([OH2+255])=O.CC
free(): invalid pointer
Aborted (core dumped)
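
To see which Python line was executing when the process died, one option is the standard `faulthandler` module, which dumps the Python traceback on fatal signals including SIGSEGV and SIGABRT (glibc raises the latter after `free(): invalid pointer`). The sketch below runs a script in a child interpreter with `-X faulthandler` and captures stderr. The helper name `run_with_faulthandler` is hypothetical, and the dump shows Python frames only, not the native frames inside the Indigo library.

```python
import subprocess
import sys


def run_with_faulthandler(script_args):
    """Run a child Python interpreter with faulthandler enabled.

    script_args is what follows the interpreter on the command line,
    e.g. ["repro.py"]. Returns (returncode, stderr_text).
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", *script_args],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    return result.returncode, result.stderr
```

With a hypothetical `repro.py` holding the loop above, `run_with_faulthandler(["repro.py"])` would return the exit status together with any "Fatal Python error" traceback that faulthandler printed.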

@even1024
Collaborator

I am working on the issue, and the fix will be available in Indigo 1.12.0-rc.1, which is scheduled for release in a week or two.
