500K Gene Models with Many Short Sequences: Valid AGAT Output or Command Error? #495

Vijithkumar2020 · 2024-09-26T08:55:44Z

This concerns a de novo plant genome that was assembled recently. I used AGAT's feature extraction tool to get the gene models predicted by AUGUSTUS. The repeat-masked genome is 2.6 Gb, and the FASTA file produced by AGAT's feature extraction is ~600 MB and contains 500K gene models. The command I used is shown below. Is this the right command to use? I ask because my output file contains far too many short sequences.

agat_sp_extract_sequences.pl \
--gff /output_file.gff \
--fasta /media/masked.fasta \
--output /out.fasta \
-t gene --split


Juke34 · 2024-09-26T12:09:14Z

Have you checked the help? https://nbisweden.github.io/AGAT/tools/agat_sp_extract_sequences/#briefly-in-pictures
I think --split is useless here.
If you want to extract everything from the start of the gene to its end (so the sequence contains UTRs, exons and introns), -t gene is correct.
To check what is in your file before running agat_sp_extract_sequences.pl, and to confirm that the GFF really contains 500K genes as input, run agat_sq_stat_basic.pl before your analysis.
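
As a rough, tool-independent cross-check of the two points above, the sketch below counts the features of a given type in the GFF and lists the length of every record in the extracted FASTA, using plain awk. It is not part of AGAT: the function names `count_features` and `fasta_lengths` are hypothetical helpers introduced here for illustration, and the sketch assumes a tab-separated GFF with the feature type in column 3.

```shell
# Hypothetical helper: count features of a given type (GFF column 3).
# $1 = GFF path, $2 = feature type (e.g. gene)
count_features() {
    awk -F'\t' -v type="$2" '!/^#/ && $3 == type { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: print "<record id><TAB><sequence length>" for every FASTA record.
# The id is the first whitespace-delimited word of the header, without ">".
# $1 = FASTA path
fasta_lengths() {
    awk '/^>/ { if (id != "") print id "\t" len; id = substr($1, 2); len = 0; next }
         { len += length($0) }
         END { if (id != "") print id "\t" len }' "$1"
}
```

With the paths from the question, `count_features /output_file.gff gene` would show whether the GFF really holds about 500K gene features, and `fasta_lengths /out.fasta | sort -k2,2n | head` would list the shortest extracted sequences. If the gene count matches the number of FASTA records, the short sequences most likely come from the AUGUSTUS predictions themselves, not from the extraction step.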
