Link to problematic files #7

Open
nchenche opened this issue Sep 22, 2022 · 2 comments
Labels
bug (Something isn't working), invalid (This doesn't seem right)

Comments

@nchenche
Collaborator

Issue created to collect input files that orftrack handles incorrectly (preferably link to the files rather than attaching them, to avoid storing huge files here).

@nchenche added the bug and invalid labels on Sep 22, 2022
@nchenche changed the title from "Problematic files" to "Link to problematic files" on Sep 22, 2022
@nchenche
Collaborator Author

nchenche commented Oct 1, 2022

Hi Paul,

I received your email and the problematic files:

  1. PNL4-3XCS.fa & PNL4-3XCS.gff3
  2. Spar.fna & Spar.gff
  3. Oryza_sativa.fna & Oryza_sativa.gff

Case 1:

The error was due to inconsistent chromosome names between the fasta file (chr name: PNL4-3XCS) and the gff file (chr name: chrVIH). If the chromosome names don't match, the process stops and a message is written to the output:

# PARSING GFF FILE
------------------

Checking chromosome IDs consistency between GFF and fasta file...

      Chromosome ids              in GFF            in fasta
------------------------------------------------------------
           PNL4-3XCS                   -                   X
              chrVIH                   X                   -

Chromosomes are not consistent between GFF and fasta files.
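
For anyone who wants to check their files before running orftrack, here is a minimal sketch of the same kind of comparison. It is not orftrack's own code: the function names `fasta_ids`, `gff_ids` and `compare_ids` are hypothetical, and it assumes the chromosome ID is the first word after `>` in the fasta and the first tab-separated column in the gff.

```python
def fasta_ids(lines):
    """Return the set of sequence IDs (first word after '>') found in fasta lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def gff_ids(lines):
    """Return the set of seqid values (column 1) found in gff lines."""
    ids = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        ids.add(line.split("\t")[0])
    return ids


def compare_ids(fasta_lines, gff_lines):
    """Return (only_in_fasta, only_in_gff) as sorted lists."""
    in_fasta = fasta_ids(fasta_lines)
    in_gff = gff_ids(gff_lines)
    return sorted(in_fasta - in_gff), sorted(in_gff - in_fasta)


# Toy input reproducing the mismatch from Case 1 (feature line is made up).
fasta = [">PNL4-3XCS some description", "ACGT"]
gff = ["##gff-version 3", "chrVIH\tsrc\tgene\t1\t4\t.\t+\t.\tID=g1"]

only_fasta, only_gff = compare_ids(fasta, gff)
print("only in fasta:", only_fasta)
print("only in gff:", only_gff)
```

If either list is non-empty, the IDs need to be renamed in one of the two files before running the tools.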

Case 2:

No problem was identified with either orftrack or orfget.

Case 3:

Orfget stops with the following error:

  File "/home/nchenche/projects/ORFmine/orftrack/orftrack/scripts/ORFget.py", line 248, in write_multifastas
    aa_seq = dico_info[gene]['AA_seq']
KeyError: 'AA_seq'
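
To narrow down which entries trigger this `KeyError`, a small diagnostic like the one below may help. It is only a sketch: it assumes `dico_info` is a dict of dicts keyed by gene/ORF ID, which is what the traceback suggests, and the helper name `entries_missing_key` and the `NT_seq` key in the toy data are hypothetical.

```python
def entries_missing_key(dico_info, key="AA_seq"):
    """Return the IDs whose info dict does not contain `key`."""
    return [gene for gene, info in dico_info.items() if key not in info]


# Toy data (made up): one complete entry, one lacking the amino-acid sequence.
dico_info = {
    "orf_1": {"AA_seq": "MKT*"},
    "orf_2": {"NT_seq": "ATGAAA"},
}

missing = entries_missing_key(dico_info)
print(missing)
```

Listing the offending IDs makes it easier to go back to the gff and see what those features have in common.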

Orftrack runs to completion without raising any error. However, the output looks strange at first sight, since only one type of ORF is found for each chromosome:

chr ORF type qty avg length (aa)
1 nc_intergenic 1340250 46.29
10 nc_intergenic 717431 46.16
11 nc_intergenic 900540 45.53
12 nc_intergenic 856175 45.52
2 nc_intergenic 1117788 45.81
3 nc_intergenic 1132103 46.09
4 nc_intergenic 1096840 46.68
5 nc_intergenic 927312 46.36
6 nc_intergenic 968303 46.07
7 nc_intergenic 920731 46
8 nc_intergenic 881891 45.86
9 nc_intergenic 713027 46.05
AC155918 nc_ovp_opp-scaffold 1033 47.74
AC156495 nc_ovp_opp-scaffold 2694 49.73
AC160949 nc_ovp_opp-scaffold 3942 46.05
AC174930 nc_ovp_opp-scaffold 482 43.78
AP008246 nc_ovp_opp-scaffold 6367 45.13
AP008247 nc_ovp_opp-scaffold 4775 47.98
Mt nc_intergenic 16576 42.2
Pt nc_intergenic 4310 42.87
Syng_TIGR_002 nc_ovp_opp-scaffold 473 43.33
Syng_TIGR_004 nc_ovp_opp-scaffold 631 39.41
Syng_TIGR_005 nc_ovp_opp-scaffold 634 53.61
Syng_TIGR_007 nc_ovp_opp-scaffold 181 72.8
Syng_TIGR_008 nc_ovp_opp-scaffold 504 42.51
Syng_TIGR_009 nc_ovp_opp-scaffold 297 56.61
Syng_TIGR_010 nc_ovp_opp-scaffold 482 48.22
Syng_TIGR_011 nc_ovp_opp-scaffold 337 44.83
Syng_TIGR_012 nc_ovp_opp-scaffold 502 50.45
Syng_TIGR_013 nc_ovp_opp-scaffold 312 52.78
Syng_TIGR_014 nc_ovp_opp-scaffold 675 42.52
Syng_TIGR_015 nc_ovp_opp-scaffold 310 48.15
Syng_TIGR_016 nc_ovp_opp-scaffold 428 39.91
Syng_TIGR_019 nc_ovp_opp-scaffold 333 42.69
Syng_TIGR_020 nc_ovp_opp-scaffold 323 50.13
Syng_TIGR_021 nc_ovp_opp-scaffold 537 42.74
Syng_TIGR_022 nc_ovp_opp-scaffold 294 52.86
Syng_TIGR_023 nc_ovp_opp-scaffold 789 45.39
Syng_TIGR_024 nc_ovp_opp-scaffold 298 53.4
Syng_TIGR_026 nc_ovp_opp-scaffold 640 43.09
Syng_TIGR_027 nc_ovp_opp-scaffold 354 43.81
Syng_TIGR_028 nc_ovp_opp-scaffold 978 43.5
Syng_TIGR_029 nc_ovp_opp-scaffold 375 51.91
Syng_TIGR_030 nc_ovp_opp-scaffold 330 48.55
Syng_TIGR_031 nc_ovp_opp-scaffold 218 76.23
Syng_TIGR_032 nc_ovp_opp-scaffold 297 46.67
Syng_TIGR_033 nc_ovp_opp-scaffold 334 52.03
Syng_TIGR_034 nc_ovp_opp-scaffold 253 70.8
Syng_TIGR_035 nc_ovp_opp-scaffold 360 43.19
Syng_TIGR_036 nc_ovp_opp-scaffold 316 48.67
Syng_TIGR_037 nc_ovp_opp-scaffold 422 42.04
Syng_TIGR_038 nc_ovp_opp-scaffold 258 49.55
Syng_TIGR_039 nc_ovp_opp-scaffold 207 45.76
Syng_TIGR_041 nc_ovp_opp-scaffold 299 48.11
Syng_TIGR_042 nc_ovp_opp-scaffold 131 73.92
Syng_TIGR_043 nc_ovp_opp-scaffold 93 83.63
Syng_TIGR_044 nc_ovp_opp-scaffold 192 41.58
Syng_TIGR_045 nc_ovp_opp-scaffold 670 51.9
Syng_TIGR_046 nc_ovp_opp-scaffold 366 44.58
Syng_TIGR_047 nc_ovp_opp-scaffold 651 45.27
Syng_TIGR_048 nc_ovp_opp-scaffold 199 53.76
Syng_TIGR_049 nc_ovp_opp-scaffold 166 60.4
Syng_TIGR_050 nc_ovp_opp-scaffold 280 47.08
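
A quick way to spot this symptom in a summary table is to count the distinct ORF types per chromosome. The sketch below assumes the whitespace-separated layout shown above (header line first, chromosome in the first column, ORF type in the second); the function names are hypothetical and the three sample rows are copied from the table.

```python
from collections import defaultdict


def orf_types_per_chr(summary_lines):
    """Map each chromosome to the set of ORF types reported for it."""
    types = defaultdict(set)
    for line in summary_lines[1:]:  # skip the header line
        parts = line.split()
        if len(parts) < 4:
            continue  # ignore blank or malformed lines
        types[parts[0]].add(parts[1])
    return dict(types)


def single_type_chromosomes(summary_lines):
    """Return the chromosomes for which exactly one ORF type was reported."""
    per_chr = orf_types_per_chr(summary_lines)
    return sorted(c for c, t in per_chr.items() if len(t) == 1)


summary = [
    "chr ORF type qty avg length (aa)",
    "1 nc_intergenic 1340250 46.29",
    "AC155918 nc_ovp_opp-scaffold 1033 47.74",
    "Mt nc_intergenic 16576 42.2",
]

suspicious = single_type_chromosomes(summary)
print(suspicious)
```

On an annotated genome, a chromosome that shows only one ORF type is a hint that the gff features for that chromosome were not read.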

@nchenche
Collaborator Author

nchenche commented Oct 1, 2022

The Case 3 issue in orftrack has been fixed in the most recent push (see commit "Fix bug when '#' is found in gff between different features in same chr").
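
For illustration, here is a minimal sketch of a gff reader that tolerates `#` lines appearing between features of the same chromosome. It is not the code from the commit, only an example of the general idea; the function name `iter_gff_features` and the toy gff lines are hypothetical.

```python
def iter_gff_features(lines):
    """Yield the 9 tab-separated fields of each feature line.

    Lines starting with '#' are skipped wherever they occur,
    not only in the header of the file.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        yield fields


# Toy gff with a '###' directive between two features of chromosome 1.
gff = [
    "##gff-version 3",
    "1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "###",
    "1\tsrc\tgene\t2000\t2600\t.\t-\t.\tID=gene2",
]

features = list(iter_gff_features(gff))
print([(f[0], f[8]) for f in features])
```

Both features are returned for chromosome 1 even though a `#` line separates them.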
