SEQanswers

Old 04-20-2012, 06:49 AM   #1
greener
Member
 
Location: Seattle, WA

Join Date: Sep 2010
Posts: 17
Default errors while using python dexseq_prepare_annotation.py

Hi there, I am trying to use the dexseq_prepare_annotation.py tool in DEXSeq and have tried several different GTF files, but they all produce the same error (below). I'm not sure what could be causing this. Python version, maybe? Any suggestions are appreciated. Thanks -Rich

python dexseq_prepare_annotation.py test.gtf

Traceback (most recent call last):
  File "dexseq_prepare_annotation.py", line 25, in ?
    for f in HTSeq.GFF_Reader( gtf_file ):
  File "/usr/lib64/python2.4/site-packages/HTSeq-0.5.3p3-py2.4-linux-x86_64.egg/HTSeq/__init__.py", line 204, in __iter__
    for line in FileOrSequence.__iter__( self ):
  File "/usr/lib64/python2.4/site-packages/HTSeq-0.5.3p3-py2.4-linux-x86_64.egg/HTSeq/__init__.py", line 42, in __iter__
    if self.fos.lower().endswith( ( ".gz" , ".gzip" ) ):
TypeError: expected a character buffer object
Old 04-23-2012, 12:17 AM   #2
areyes
Senior Member
 
Location: Heidelberg

Join Date: Aug 2010
Posts: 165
Default

Hi greener,

I am not sure, but I think the problem is that you did not specify an output file:

Code:
Usage: python dexseq_prepare_annotation.py <in.gtf> <out.gff>
Alejandro
Old 04-23-2012, 08:35 AM   #3
greener
Member
 
Location: Seattle, WA

Join Date: Sep 2010
Posts: 17
Default

Thanks areyes. Yes, I did try it with an output file, and I still get the same error.
Old 04-23-2012, 12:57 PM   #4
areyes
Senior Member
 
Location: Heidelberg

Join Date: Aug 2010
Posts: 165
Default

Hmm, strange. Could you show the first lines of your GTF file?
Old 04-24-2012, 12:30 PM   #5
greener
Member
 
Location: Seattle, WA

Join Date: Sep 2010
Posts: 17
Default

Here is one:

[greener@kojak bowtie]$ head -n 10 /vol01/genome/mouse/ucsc/annotation/mm9_ucsc_refGene.txt.gtf2.2
chr1 ucsc exon 134212701 134213049 . + 0 gene_id "NM_028778:134212701"; transcript_id "NM_028778:134212701";
chr1 ucsc exon 134221529 134221650 . + 0 gene_id "NM_028778:134212701"; transcript_id "NM_028778:134212701";
chr1 ucsc exon 134224273 134224425 . + 1 gene_id "NM_028778:134212701"; transcript_id "NM_028778:134212701";
chr1 ucsc exon 134224707 134224773 . + 0 gene_id "NM_028778:134212701"; transcript_id "NM_028778:134212701";
chr1 ucsc exon 134226534 134226654 . + 0 gene_id "NM_028778:134212701"; transcript_id "NM_028778:134212701";
chr1 ucsc exon 134227135 134227268 . + 0 gene_id "NM_028778:134212701"; transcript_id "NM_028778:134212701";
chr1 ucsc exon 134227897 134230065 . + 1 gene_id "NM_028778:134212701"; transcript_id "NM_028778:134212701";
chr1 ucsc exon 134212701 134213049 . + 0 gene_id "NM_001195025:134212701"; transcript_id "NM_001195025:134212701";
chr1 ucsc exon 134221529 134221650 . + 0 gene_id "NM_001195025:134212701"; transcript_id "NM_001195025:134212701";
chr1 ucsc exon 134222782 134222806 . + 1 gene_id "NM_001195025:134212701"; transcript_id "NM_001195025:134212701";
Old 06-27-2012, 01:59 AM   #6
fadista
Member
 
Location: Malmö

Join Date: Sep 2008
Posts: 37
Default

I got the same error message here. Did you find a solution?

Thanks.
Old 06-27-2012, 02:13 AM   #7
areyes
Senior Member
 
Location: Heidelberg

Join Date: Aug 2010
Posts: 165
Default

I just noticed we never answered this message; apologies for that.

Could you try updating Python (to at least 2.5) and HTSeq to the most recent version?
I was unable to reproduce the error, but I think this might solve it. Let me know if not.

Alejandro
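
A note on why the Python version is a plausible cause: the traceback at the top of the thread fails on a call that passes a tuple of suffixes to str.endswith, and the paths in it show Python 2.4. Tuple arguments to endswith are only accepted from Python 2.5 onward. The sketch below mirrors that check; the helper names is_gzip_name and python_is_recent_enough are made up for illustration and are not part of HTSeq.

```python
import sys


def is_gzip_name(filename):
    """Return True if the file name ends in a gzip extension."""
    # Passing a tuple to str.endswith needs Python 2.5 or newer; on 2.4 the
    # same call raises "TypeError: expected a character buffer object".
    return filename.lower().endswith((".gz", ".gzip"))


def python_is_recent_enough(minimum=(2, 5)):
    """Return True if the running interpreter is at least `minimum`."""
    return tuple(sys.version_info[:2]) >= tuple(minimum)
```

Calling python_is_recent_enough() with the same interpreter that runs dexseq_prepare_annotation.py shows whether that interpreter can handle the tuple form at all.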
Old 08-06-2012, 07:27 AM   #8
senkewiczs
Junior Member
 
Location: Poland

Join Date: Jul 2012
Posts: 5
Default

Hi areyes,

I'm also getting an error message when trying to use the dexseq_prepare_annotation.py. I'm trying to use it on the gtf file for drosophila from http://useast.ensembl.org/info/data/ftp/index.html.

$ python dexseq_prepare_annotation.py Drosophila_melanogaster.BDGP5.67.gtf Drosophila_melanogaster.BDGP5.67.gff

The error message I receive is:
Traceback (most recent call last):
  File "dexseq_prepare_annotation.py", line 89, in <module>
    assert l[i].iv.end <= l[i+1].iv.start, str(l[i+1]) + " starts too early"
AssertionError: <GenomicFeature: exonic_part 'FBgn0261841+FBgn0261840+FBgn0261837+FBgn0261843+FBgn0261845+FBgn0261844+FBgn0261838+FBgn0261839+FBgn0002781+FBgn0261842' at 3R: 17178958 -> 17178091 (strand '-')> starts too early
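
To make the failing check concrete: the assertion requires each exonic part to end no later than the next one starts. Below is a minimal sketch of that same comparison on plain (start, end) tuples. The function name and the tuple representation are assumptions for illustration; the real script works on HTSeq feature objects.

```python
def first_early_start(parts):
    """Return the index of the first part that starts before the previous
    part ends, or None if every part ends at or before the next start.

    `parts` is a list of (start, end) tuples in the order to be checked.
    """
    for i in range(len(parts) - 1):
        # Same comparison as the assertion: end of part i vs start of part i+1.
        if parts[i][1] > parts[i + 1][0]:
            return i + 1
    return None
```

Adjacent parts that merely touch, such as (100, 200) followed by (200, 300), pass the check; a part starting at 150 after one ending at 200 does not.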

We've used dexseq_prepare_annotation.py on GTF files from other species and it has always worked well. This seems strange, since the pasilla package in R uses Drosophila as its example dataset.

Here is the head of the drosophila gtf file:

3R protein_coding exon 380 509 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "1"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
3R protein_coding exon 578 1913 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "2"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
3R protein_coding CDS 1115 1913 . + 0 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "2"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB"; protein_id "FBpp0078601";
3R protein_coding start_codon 1115 1117 . + 0 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "2"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
3R protein_coding exon 7784 8649 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "3"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
3R protein_coding CDS 7784 8649 . + 2 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "3"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB"; protein_id "FBpp0078601";
3R protein_coding exon 9439 10200 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "4"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
3R protein_coding CDS 9439 9768 . + 0 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "4"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB"; protein_id "FBpp0078601";
3R protein_coding stop_codon 9769 9771 . + 0 gene_id "FBgn0037213"; transcript_id "FBtr0078961"; exon_number "4"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RB";
3R protein_coding exon 380 1913 . + . gene_id "FBgn0037213"; transcript_id "FBtr0078962"; exon_number "1"; gene_name "CG12581"; gene_biotype "protein_coding"; transcript_name "CG12581-RA";


Any thoughts? Thanks in advance
Old 08-07-2012, 05:57 AM   #9
areyes
Senior Member
 
Location: Heidelberg

Join Date: Aug 2010
Posts: 165
Default

I took a look at that annotation file. The annotation of the gene mod(mdg4) seems to have some problems, so I just removed it:

Code:
grep -v "mod(mdg4)" Drosophila_melanogaster.BDGP5.67.gtf > Drosophila_melanogaster.BDGP5.67.filtered.gtf
And the DEXSeq error goes away!
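
The grep call drops every line that contains the text mod(mdg4) anywhere. If a narrower match is wanted, a rough Python equivalent can test only the gene_name attribute. This is a sketch that assumes the GTF carries Ensembl-style attributes of the form gene_name "..."; as in the file head posted earlier. drop_gene is a hypothetical helper name, and the second sample line is made up.

```python
import re


def drop_gene(lines, gene_name):
    """Return the GTF lines whose gene_name attribute is not `gene_name`."""
    # re.escape keeps the parentheses in names like mod(mdg4) literal.
    pattern = re.compile(r'gene_name "%s";' % re.escape(gene_name))
    return [line for line in lines if not pattern.search(line)]


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    sample = [
        '3R\tprotein_coding\texon\t380\t509\t.\t+\t.\tgene_id "FBgn0037213"; gene_name "CG12581";',
        '3R\tprotein_coding\texon\t10\t20\t.\t-\t.\tgene_id "EXAMPLE"; gene_name "mod(mdg4)";',
    ]
    for line in drop_gene(sample, "mod(mdg4)"):
        print(line)
```

Matching on the attribute rather than the whole line avoids removing unrelated records that merely mention the name somewhere else.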
Old 08-07-2012, 07:02 AM   #10
senkewiczs
Junior Member
 
Location: Poland

Join Date: Jul 2012
Posts: 5
Default

Thanks areyes! Helpful as always!
Old 02-22-2013, 11:38 AM   #11
Ajayi Oyeyemi
Member
 
Location: Abeokuta

Join Date: Jul 2012
Posts: 30
Default

Hi everyone,
I've been trying to use the dexseq_prepare_annotation.py script, but I keep getting an error. Please see the output below.

imumorin@ansci253135199:~/myrnaseqexp$ python dexseq_prepare_annotation.py Drosophila_melanogaster.BDGP5.70.gtf Drosophila_melanogaster.BDGP5.70.gff
  File "dexseq_prepare_annotation.py", line 4
    <!DOCTYPE html>
    ^
SyntaxError: invalid syntax

I got the script from this page https://github.com/olgabot/rna-seq-d..._annotation.py and I used the wget command. Can someone please help me?
Old 02-22-2013, 11:50 AM   #12
Ajayi Oyeyemi
Member
 
Location: Abeokuta

Join Date: Jul 2012
Posts: 30
Default

I suspect it was installed incorrectly, since I searched the package but couldn't find the script. Here is the relevant part of my ls -al output for the script:

-rw-rw-r-- 1 imumorin imumorin 55209 2013-02-22 15:24 dexseq_prepare_annotation.py

Any clue?
Old 02-23-2013, 12:27 AM   #13
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Well, a Python file is not supposed to contain a Doctype tag. You downloaded the HTML source of the page displaying the code of the Python script, not the script itself.

Why did you download it separately from the DEXSeq package at all?

Use the R command system.file( package="DEXSeq" ) to see which directory R has installed DEXSeq in. There you will find a sub-directory python-scripts containing the correct file.
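
A quick way to catch this kind of mix-up before running a downloaded script is to look for HTML markers near the top of the file. The sketch below is a heuristic only, and looks_like_html is a hypothetical helper, not something shipped with DEXSeq.

```python
def looks_like_html(text, max_lines=20):
    """Heuristically report whether `text` starts like an HTML page."""
    for line in text.splitlines()[:max_lines]:
        stripped = line.strip().lower()
        # A doctype or <html> tag near the top means a saved web page,
        # not Python source.
        if stripped.startswith("<!doctype html") or stripped.startswith("<html"):
            return True
    return False
```

Reading the downloaded file and passing its contents to looks_like_html would flag the case above, where the doctype sits on line 4.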
Old 02-25-2013, 01:28 PM   #14
Ajayi Oyeyemi
Member
 
Location: Abeokuta

Join Date: Jul 2012
Posts: 30
Default

Quote:
Originally Posted by Simon Anders
Well, a Python file is not supposed to contain a Doctype tag. You downloaded the HTML source of the page displaying the code of the Python script, not the script itself.

Why did you download it separately from the DEXSeq package at all?

Use the R command system.file( package="DEXSeq" ) to see which directory R has installed DEXSeq in. There you will find a sub-directory python-scripts containing the correct file.
Thanks Simon. I followed your cue and it worked for version 2.15. I'm using Ubuntu Linux and tried to upgrade to version 2.15.2, but I couldn't figure out how. Does anyone know how I can upgrade R from version 2.13 to 2.15.2 on Ubuntu Linux?
Any help will be greatly appreciated.

Thanks Simon once again...
Old 02-26-2013, 12:01 AM   #15
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

The R version included in Canonical's official Ubuntu package repository is always a bit old. If you add the package repository from CRAN to your package sources, you can always get the newest version.

See here for details: http://cran.r-project.org/bin/linux/ubuntu/README

All times are GMT -8.