Cumbersome features #1

Open
Proginski opened this issue Dec 6, 2021 · 1 comment
Comments

@Proginski
Collaborator

Proginski commented Dec 6, 2021

Some GFF files contain features that cover most of a sequence.
For example: GCF_000247795.1
In the corresponding GFF file (attached), a feature of type "match" spans the entire first chromosome, exactly like its "region" feature:
NC_032650.1 RefSeq region 1 161108492 . + . ID=NC_032650.1:1..161108492;Dbxref=taxon:9915;Name=1;breed=Nelore;chromosome=1;country=Brazil;gb-synonym=Bos taurus indicus;gbkey=Src;genome=chromosome;isolate=QUIL7308;mol_type=genomic DNA;note=animal owned by Agropecuaria Quilombo Inc.;sex=male;tissue-type=peripheral blood mononuclear cells
Line 37235:
NC_032650.1 RefSeq match 1 161108492 . + . ID=aln0;Target=NC_032650.1 1 161108492 +;gap_count=0;num_mismatch=0;pct_coverage=100;pct_identity_gap=100

As a consequence, orfget cannot define any purely intergenic ORF, because every ORF on this chromosome overlaps at least the "match" feature:

NC_032650.1

ORF type                 Quantity   Average length (aa)
c_CDS                    7649       100.45
nc_ovp_opp-CDS           19987      58.68
nc_ovp_opp-cDNA_match    201        39.65
nc_ovp_opp-match         1983772    46.8
nc_ovp_same-CDS          11740      52.03
nc_ovp_same-cDNA_match   713        39.64
nc_ovp_same-lnc_RNA      15831      42.05
nc_ovp_same-mRNA         439133     44.33
nc_ovp_same-match        2449854    46.35
nc_ovp_same-pseudogene   10750      48.33
nc_ovp_same-tRNA         16         68.0
nc_ovp_same-transcript   281        65.47

Would it be possible, as a preliminary step in orftrack, to exclude features that cover more than, say, 90% of their region, to avoid this behavior?
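To make the proposal concrete, here is a minimal sketch of such a coverage filter. It is not ORFtrack code: the `Feature` tuple, `parse_gff_lines`, and `drop_spanning_features` are hypothetical names, and it assumes that each sequence length can be read from its "region" feature and that the input is standard tab-separated GFF.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical minimal record (seqid, type, start, end); not ORFtrack's own class.
Feature = Tuple[str, str, int, int]


def parse_gff_lines(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield (seqid, type, start, end) from tab-separated GFF lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # not a feature line
        yield cols[0], cols[2], int(cols[3]), int(cols[4])


def drop_spanning_features(
    features: Iterable[Feature], max_coverage: float = 0.9
) -> List[Feature]:
    """Drop non-region features covering more than max_coverage of their sequence."""
    features = list(features)

    # Assumption: the length of each sequence is given by its 'region' feature.
    seq_len: Dict[str, int] = {}
    for seqid, ftype, start, end in features:
        if ftype == "region":
            seq_len[seqid] = max(seq_len.get(seqid, 0), end - start + 1)

    kept = []
    for feat in features:
        seqid, ftype, start, end = feat
        length = seq_len.get(seqid)
        # Features on sequences of unknown length are kept untouched.
        if ftype != "region" and length and (end - start + 1) / length > max_coverage:
            continue
        kept.append(feat)
    return kept
```

The "region" feature itself is kept here because the parser already handles that type separately; only other features that span almost the whole sequence are dropped. The 90% threshold is exposed as `max_coverage` so it can be tuned.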

In the meantime, since all 6 genomes in which I have found this problem so far contain a 'match' feature, I suggest simply adding 'match' to the list on line 597 of gff_parser.py:
if element_type not in ['chromosome', 'region','match']:
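
As an illustration of how this check behaves, here is a small standalone sketch. The names `EXCLUDED_TYPES` and `is_excluded` are hypothetical and do not come from gff_parser.py. The point is that membership in the list is an exact comparison, so adding 'match' does not also exclude 'cDNA_match' features.

```python
# Hypothetical names for illustration; not the actual gff_parser.py code.
EXCLUDED_TYPES = frozenset({"chromosome", "region", "match"})


def is_excluded(element_type: str, excluded=EXCLUDED_TYPES) -> bool:
    """Return True if a GFF feature type should be ignored by the parser."""
    # Exact membership test: 'cDNA_match' is not affected by excluding 'match'.
    return element_type in excluded


for ftype in ("region", "match", "cDNA_match", "CDS"):
    print(ftype, is_excluded(ftype))
```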

@nchenche
Collaborator

nchenche commented May 4, 2022

Hi Paul,

This is an old issue that has since been resolved, but yes, you were right.

Thanks!

Fadwa7 pushed a commit that referenced this issue Mar 28, 2024