08-06-2012, 07:27 AM   #8
senkewiczs
Junior Member
 
Location: Poland

Join Date: Jul 2012
Posts: 5

Hi areyes,

I'm also getting an error message when running dexseq_prepare_annotation.py. I'm using the Drosophila GTF file from http://useast.ensembl.org/info/data/ftp/index.html.

$ python dexseq_prepare_annotation.py Drosophila_melanogaster.BDGP5.67.gtf Drosophila_melanogaster.BDGP5.67.gff

The error message I receive is:
Traceback (most recent call last):
  File "dexseq_prepare_annotation.py", line 89, in <module>
    assert l[i].iv.end <= l[i+1].iv.start, str(l[i+1]) + " starts too early"
AssertionError: <GenomicFeature: exonic_part 'FBgn0261841+FBgn0261840+FBgn0261837+FBgn0261843+FBgn0261845+FBgn0261844+FBgn0261838+FBgn0261839+FBgn0002781+FBgn0261842' at 3R: 17178958 -> 17178091 (strand '-')> starts too early
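
To make the failing condition concrete, here is a minimal sketch of what the assertion on line 89 appears to check: in a list of exonic parts sorted by start, no part may begin before the previous one ends. It uses plain (start, end) tuples instead of HTSeq objects and assumes half-open intervals; the function name find_overlaps and the coordinates are hypothetical and not taken from the script or the GTF file.

```python
def find_overlaps(parts):
    """Return (previous, current) pairs where current starts before previous ends.

    Each part is a (start, end) tuple treated as a half-open interval, and
    the list is assumed to be sorted by start, mirroring the condition
    l[i].iv.end <= l[i+1].iv.start in the traceback.
    """
    bad = []
    for prev, cur in zip(parts, parts[1:]):
        if prev[1] > cur[0]:
            bad.append((prev, cur))
    return bad


# Hypothetical coordinates for illustration only.
parts = [(100, 200), (200, 350), (300, 400)]
print(find_overlaps(parts))
```

Adjacent parts that merely touch, such as (100, 200) followed by (200, 350), pass the check; only a real overlap is reported.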

We've used dexseq_prepare_annotation.py on GTF files from other species and it has always worked without problems. The failure seems strange, since the pasilla package in R uses Drosophila as its example dataset.

Here is the head of the Drosophila GTF file:

3R protein_coding exon 380 509 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "1"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
3R protein_coding exon 578 1913 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "2"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
3R protein_coding CDS 1115 1913 . + 0 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "2"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB"; protein_id "FBpp0078601";
3R protein_coding start_codon 1115 1117 . + 0 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "2"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
3R protein_coding exon 7784 8649 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "3"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
3R protein_coding CDS 7784 8649 . + 2 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "3"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB"; protein_id "FBpp0078601";
3R protein_coding exon 9439 10200 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "4"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
3R protein_coding CDS 9439 9768 . + 0 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "4"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB"; protein_id "FBpp0078601";
3R protein_coding stop_codon 9769 9771 . + 0 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "4"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
3R protein_coding exon 380 1913 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078962"; exon_number "1"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RA";
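
One way to narrow this down is to pull the exon records for the genes named in the AssertionError and inspect their coordinates and strands. The following is a rough sketch using only the Python standard library; it assumes the file on disk is tab-separated with nine columns (the usual GTF layout, although the excerpt above shows spaces), and the helper name exons_for_genes is hypothetical.

```python
import re

# Matches the gene_id attribute in the ninth GTF column.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def exons_for_genes(lines, gene_ids):
    """Collect (chrom, start, end, strand, gene_id) for exon records of the given genes."""
    wanted = set(gene_ids)
    hits = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip header or comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only well-formed exon records
        match = GENE_ID_RE.search(fields[8])
        if match and match.group(1) in wanted:
            hits.append((fields[0], int(fields[3]), int(fields[4]),
                         fields[6], match.group(1)))
    # Sort by chromosome, then start, so neighbouring exons are easy to compare.
    return sorted(hits, key=lambda h: (h[0], h[1]))


# First record of the excerpt above, rebuilt with tabs for illustration.
sample = "\t".join([
    "3R", "protein_coding", "exon", "380", "509", ".", "+", ".",
    'gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "1";',
])
print(exons_for_genes([sample], ["FBgn0037213"]))
```

For the real file, an open file handle can be passed as lines, together with the gene IDs listed in the error message (FBgn0261841, FBgn0261840, and so on).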


Any thoughts? Thanks in advance.