Import of modified IDT monomers #1899

Closed
olganaz opened this issue Apr 3, 2024 · 1 comment · Fixed by #1935 or #1943

olganaz commented Apr 3, 2024

Background
In IDT notation, modified sequences are represented as plain strings that combine standard and modified monomers.
A standard monomer, [s]<Base>[*], is a nucleotide with the same configuration as those supported in Ketcher.
A modified monomer, /<pos><Identifier>/[*], can be recognized as one of the following:

  • A known nucleotide with defined submonomers (RNA preset) from the Ketcher library
  • A known nucleotide with a defined structure but undefined submonomers (unsplit nucleotide), or a CHEM
  • A CHEM with unknown structure (unresolved monomer)

This task covers only the import of modified IDT monomers.
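The notation above can be sketched as a small tokenizer. This is a minimal illustration, not Indigo's or Ketcher's actual parser: `Token` and `tokenize_idt` are hypothetical names, and the accepted sugar prefixes (`r`, `m`, `+`) and bases (`A`, `C`, `G`, `T`, `U`) are assumptions, since the issue does not enumerate them.

```python
import re
from dataclasses import dataclass

# Assumption: sugar prefixes r, m, + and bases A, C, G, T, U.
# The issue only gives the general shapes [s]<Base>[*] and /<pos><Identifier>/[*].
TOKEN_RE = re.compile(
    r"/(?P<mod>[^/]+)/(?P<mod_star>\*?)"                      # modified monomer
    r"|(?P<sugar>[rm+]?)(?P<base>[ACGTU])(?P<std_star>\*?)"   # standard monomer
)


@dataclass
class Token:
    kind: str   # "modified" or "standard"
    text: str   # code between the slashes, or sugar prefix + base
    thio: bool  # True when the monomer is followed by '*'


def tokenize_idt(sequence: str) -> list[Token]:
    """Split an IDT string into standard and modified monomer tokens."""
    tokens = []
    pos = 0
    while pos < len(sequence):
        m = TOKEN_RE.match(sequence, pos)
        if m is None:
            raise ValueError(f"Unexpected character at offset {pos}: {sequence[pos]!r}")
        if m.group("mod") is not None:
            tokens.append(Token("modified", m.group("mod"), m.group("mod_star") == "*"))
        else:
            text = m.group("sugar") + m.group("base")
            tokens.append(Token("standard", text, m.group("std_star") == "*"))
        pos = m.end()
    return tokens
```

For instance, `tokenize_idt("/5Phos/ACG/3Phos/")` yields a modified token `5Phos`, three standard tokens `A`, `C`, `G`, and a modified token `3Phos`.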

Requirements

  1. The system should interpret /<pos><Identifier>/[*] as the IDT alias of a monomer from the Ketcher library and import the corresponding monomer.

  2. If the library has no monomer with the corresponding alias, the system should import the IDT monomer as a monomer with an IDT alias only (no structure).

  3. The system should check the position of the monomer in the chain against pos in the IDT alias:

    • 5 - at the 5' end (the first monomer in the chain)
    • i - inside the chain
    • 3 - at the 3' end (the last monomer in the chain)

    If the position indicator in the IDT code contradicts the actual position of the monomer in the chain, this should be treated as a format error, and the import should fail with an appropriate error message:
    IDT alias <IDT id> cannot be used at five prime
    (The message was originally "Position of monomer <IDT id> in sequence contradicts its code"; the change was approved by @olganaz.)

  4. When * is applied to a modified IDT monomer, the system should also check whether an RNA preset with IDT alias /<pos><Identifier>/ exists in the library:
    - if such an RNA preset exists, /<pos><Identifier>/* should be imported as that RNA preset with the phosphate (P) replaced by phosphorothioate (sP)
    [Image: Import IDT_modified phosphate]

  5. Bonds between monomers should be established from the R2 attachment point of the first monomer to the R1 attachment point of the second monomer.
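The requirements above can be sketched roughly as follows. This is a hedged illustration rather than the Indigo implementation: `Monomer`, `LIBRARY`, `check_position`, `lookup`, and `import_chain` are hypothetical names, the library entry is illustrative only, and keying the library by the full alias (position prefix included) is an assumption. The issue gives the error wording only for the five prime case; the "internal" and "three prime" variants here are assumed by analogy, as is the use of the code exactly as written for <IDT id>.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Position prefix in the IDT code -> where the monomer may appear.
POSITIONS = {"5": "five prime", "i": "internal", "3": "three prime"}


@dataclass(frozen=True)
class Monomer:
    idt_alias: str
    sugar: Optional[str] = None      # None when the structure is unknown
    base: Optional[str] = None
    phosphate: Optional[str] = None  # "P" or "sP" for RNA presets
    resolved: bool = True


# Hypothetical stand-in for the monomer library, keyed by IDT alias.
# The values are illustrative and not taken from the Ketcher library.
LIBRARY = {"52MOErA": Monomer("52MOErA", sugar="MOE", base="A", phosphate="P")}


def check_position(code: str, index: int, length: int) -> None:
    """Requirement 3: fail when the prefix contradicts the real position."""
    declared = POSITIONS.get(code[:1])
    if declared is None:
        raise ValueError(f"IDT alias {code} has no position prefix")
    # Simplification: a single-monomer chain is treated as the 5' end.
    if index == 0:
        actual = "five prime"
    elif index == length - 1:
        actual = "three prime"
    else:
        actual = "internal"
    if declared != actual:
        raise ValueError(f"IDT alias {code} cannot be used at {actual}")


def lookup(alias: str) -> Monomer:
    """Requirements 1-2: library monomer, else an alias-only (unresolved) monomer."""
    return LIBRARY.get(alias, Monomer(alias, resolved=False))


def import_chain(tokens):
    """tokens: list of (idt_code, has_star) pairs for modified monomers only."""
    monomers = []
    for index, (code, has_star) in enumerate(tokens):
        check_position(code, index, len(tokens))
        monomer = lookup(code)
        # Requirement 4: '*' on an RNA preset swaps P for sP.
        # Other cases are not defined in the issue and are left untouched here.
        if has_star and monomer.phosphate == "P":
            monomer = replace(monomer, phosphate="sP")
        monomers.append(monomer)
    # Requirement 5: bond R2 of each monomer to R1 of the next one.
    bonds = [(i, "R2", i + 1, "R1") for i in range(len(monomers) - 1)]
    return monomers, bonds
```

With this sketch, a chain that starts with a 3' code, such as `[("3Phos", False), ("5Phos", False)]`, fails with "IDT alias 3Phos cannot be used at five prime".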

Examples

  • /52MOErA/*/i2MOErC/*/32MOErT/
  • /5Phos/ACG/3Phos/
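
As a rough illustration of how the first example decomposes, the snippet below splits each modified code into its position prefix, identifier, and phosphorothioate flag. `split_modified` is a hypothetical helper, and it handles only strings made up entirely of modified monomers.

```python
import re


def split_modified(sequence: str) -> list[tuple[str, str, bool]]:
    """Return (pos, identifier, has_star) for each /<pos><Identifier>/[*] item."""
    parts = re.findall(r"/([^/]+)/(\*?)", sequence)
    return [(code[0], code[1:], star == "*") for code, star in parts]


for pos, identifier, has_star in split_modified("/52MOErA/*/i2MOErC/*/32MOErT/"):
    print(pos, identifier, has_star)
```

The three items carry the prefixes 5, i, and 3 in that order, so the position check passes, and only the first two are followed by *.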
@AliaksandrDziarkach AliaksandrDziarkach self-assigned this Apr 24, 2024
AliaksandrDziarkach added a commit that referenced this issue Apr 29, 2024
Add monomer library support.
Add IDT modification support.
Add UT
@AliaksandrDziarkach AliaksandrDziarkach linked a pull request May 2, 2024 that will close this issue
7 tasks
AlexanderSavelyev pushed a commit that referenced this issue May 2, 2024
Co-authored-by: Aliakasndr Dziarkach <Aliakasndr.Dziarkach@gmail.com>
AliaksandrDziarkach added a commit that referenced this issue May 6, 2024
@AliaksandrDziarkach AliaksandrDziarkach linked a pull request May 6, 2024 that will close this issue
7 tasks
@AlexeyGirin

Verified.

  • Indigo Toolkit Version 1.21.0-rc.1.0-g9194599b2-wasm32-wasm-clang-19.0.0
  • Ketcher Version 2.22.0-rc.2 Build at 2024-06-03; 18:43:00
  • Chrome Version 125.0.6422.142 (Official Build) (64-bit)
  • Windows 10

Projects
Status: Done
3 participants