Hi,
I am a PhD student with very little experience in bioinformatics (or very little experience at all; I started two months ago).
I’m having some problems getting SnpEff to work with GFF coordinates obtained from TransDecoder.
A group I am collaborating with gave me a genome assembly and a GTF file with transcript information derived from RNA-seq. I ran TransDecoder following the instructions, with the --single_best_orf option, and obtained the CDS file and a GFF3 file.
I used the GFF3 to build a database for SnpEff, because I have to evaluate the effect of some SNPs on the genome. However, when I ran SnpEff eff, I received a large number of warnings:
INFO_REALIGN_3_PRIME 1
WARNING_TRANSCRIPT_NO_START_CODON 202855
WARNING_TRANSCRIPT_NO_START_CODON&INFO_REALIGN_3_PRIME 2
WARNING_TRANSCRIPT_NO_STOP_CODON 17281
Protein coding transcripts : 2426
# Length errors : 0 ( 0,00% )
# STOP codons in CDS errors : 0 ( 0,00% )
# START codon errors : 686 ( 28,28% )
# STOP codon warnings : 183 ( 7,54% )
# UTR sequences : 2409 ( 99,30% )
# Total Errors : 686 ( 28,28% )
Given the low number of transcripts, this number of warnings seems extremely high. Is this normal?
Also, I checked the CDSs produced by TransDecoder and, even though not all of them start with ATG, all of them have a start codon near the beginning of the sequence, so I really cannot explain this number of warnings.
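In case it helps anyone reproduce the check, something like it can be scripted roughly as follows. This is only a minimal sketch, not an official TransDecoder or SnpEff tool: the function names `read_fasta` and `codon_summary` are made up for illustration, the example records are invented, and it assumes the standard start codon ATG and the standard stop codons TAA, TAG and TGA.

```python
from typing import Dict, Iterable

# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: uppercase sequence}."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                records[name] = "".join(chunks).upper()
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def codon_summary(records: Dict[str, str]) -> Dict[str, int]:
    """Count CDS records lacking an ATG start, a terminal stop codon,
    or a length that is a multiple of three."""
    summary = {"total": 0, "no_start": 0, "no_stop": 0, "bad_length": 0}
    for seq in records.values():
        summary["total"] += 1
        if not seq.startswith("ATG"):
            summary["no_start"] += 1
        if seq[-3:] not in STOP_CODONS:
            summary["no_stop"] += 1
        if len(seq) % 3 != 0:
            summary["bad_length"] += 1
    return summary


if __name__ == "__main__":
    # Invented example records; with real data one would pass an open
    # file handle for the CDS FASTA instead of this list.
    example = [
        ">tx1.p1", "ATGGCTGCTTAA",   # complete ORF
        ">tx2.p1", "GCTGCTGCTTAG",   # 5' partial: no ATG at position 1
        ">tx3.p1", "ATGGCTGCTGCT",   # 3' partial: no terminal stop codon
    ]
    print(codon_summary(read_fasta(example)))
```

Records counted under `no_start` or `no_stop` would correspond to ORFs that are incomplete at the 5' or 3' end, which is the kind of transcript I would expect the start and stop codon warnings to refer to.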
Do you have any suggestions?
May the life of he/she who comes to my aid be filled with cakes and pizzas.
Best Regards
Edoardo