SEQanswers

09-05-2017, 03:23 AM   #1
Enriquez
Junior Member
 
Location: Rennes (France)

Join Date: Sep 2017
Posts: 2
Issue with Htseq-count on BAM files from Tophat2 using Galaxy

Hello,

I'm currently having trouble using Galaxy. I want to compare differentially expressed genes between two treatment groups, and I'm trying to obtain the count matrix for the differential expression analysis using htseq-count. (For information, my data are Illumina HiSeq 2500, paired-end, 125 bp.)

I have already mapped my reads to my reference genome with Tophat2 (70% of reads mapped), but when I tried to run htseq-count on the BAM files from Tophat2, it returned this error message:

Fatal error: Unknown error occured Error occured when processing GFF file (line 40 of file /opt/galaxy-dist/database/files/002/052/dataset_2052791.dat): Feature DS10_00012179-RA:exon:1059 does not contain a 'gene_id' attribute [Exception type: ValueError, raised in count.py:53]

I thought that it could be an issue with my GFF3 file, so I tried to convert it into a GTF file using the GFF to GTF converter. But I got the following error message:

Traceback (most recent call last):
  File "/opt/shed_tools/toolshed.g2.bx.psu.edu/repos/vipints/fml_gff3togtf/6e589f267c14/fml_gff3togtf/gff_to_gtf.py", line 17, in <module>
    import GFFParser
  File "/opt/shed_tools/toolshed.g2.bx.psu.edu/repos/vipints/fml_gff3togtf/6e589f267c14/fml_gff3togtf/GFFParser.py", line 20, in <module>
    import scipy.io as sio
ImportError: No module named scipy.io

I read that it could be because my BAM files were not sorted by the gene id. So I tried to sort my BAM files using the sort tool from the SAMtools suite, and got an error message again:

Tool execution generated the following error message:
Error running samtools sort.
mv: cannot stat `foo.bam': No such file or directory

The tool produced the following additional output:
[bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files
Usage: samtools sort [options...] [in.bam]
Options:
  -l INT     Set compression level, from 0 (uncompressed) to 9 (best)
  -m INT     Set maximum memory per thread; suffix K/M/G recognized [768M]
  -n         Sort by read name
  -o FILE    Write final output to FILE rather than standard output
  -T PREFIX  Write temporary files to PREFIX.nnnn.bam
  -@, --threads INT
             Set number of sorting and compression threads [1]
  --input-fmt-option OPT[=VAL]
             Specify a single input file format option in the form of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
             Specify output format (SAM, BAM, CRAM)
  --output-fmt-option OPT[=VAL]
             Specify a single output file format option in the form of OPTION or OPTION=VALUE
  --reference FILE
             Reference sequence FASTA FILE [null]

I do not understand why I am getting so many error messages. Has anyone faced a similar issue, or does anyone know where these problems come from?

Thank you in advance
09-17-2017, 07:21 PM   #2
neavemj
Member
 
Location: MA, USA

Join Date: Feb 2014
Posts: 22

Hi Enriquez,

You should not need to sort the BAM file to get past this error. The message comes from reading the GFF file, not the BAM file, so I don't think sorting is the problem.

Do you know if your GFF file contains the 'gene_id' attribute? You can open the file in a text editor and check whether it is listed in the last column of the exon lines. If it is not, you can tell htseq-count which attribute to use as the gene ID with '--idattr'. This option should also be available in Galaxy.
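If the file is too large to eyeball, a few lines of Python can list which attribute names actually appear on the exon lines. This is only a rough sketch: the sample line is made up (modeled on the feature name in your error message), and the function name is just something I picked for illustration.

```python
from collections import Counter


def attribute_keys_by_type(gff_lines, feature_type="exon"):
    """Count the attribute names found in column 9 for one feature type."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GFF/GTF record has exactly 9 tab-separated columns.
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        for item in fields[8].split(";"):
            item = item.strip()
            if not item:
                continue
            # GFF3 writes key=value, GTF writes key "value".
            if "=" in item:
                key = item.split("=", 1)[0]
            else:
                key = item.split(None, 1)[0]
            counts[key.strip()] += 1
    return counts


# Hypothetical GFF3 content; replace with lines read from your own file.
sample = [
    "##gff-version 3",
    "scaffold1\tmaker\texon\t100\t200\t.\t+\t.\t"
    "ID=DS10_00012179-RA:exon:1059;Parent=DS10_00012179-RA",
]
print(attribute_keys_by_type(sample))
```

To run it on a real annotation, pass an open file handle instead of the sample list. If 'gene_id' is missing from the result but something like 'Parent' is there, that name is a candidate value for '--idattr', with the caveat that in a GFF3 file the Parent of an exon is usually the transcript rather than the gene.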

I think converting your file from GFF3 to GTF is also a pretty good idea. I think I've done this in the past and it worked. The error you are getting suggests that the Python library 'scipy' is not installed in the Python environment your Galaxy server uses. Perhaps you can ask the system administrators to install it for you?
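If you or the administrators have shell access to the server, a quick way to confirm this is to try the same import the converter does. A minimal sketch, assuming it is run with the same Python interpreter that the Galaxy tool uses (the helper name is my own):

```python
import importlib


def can_import(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# 'scipy.io' is the module named in the ImportError from the converter.
print("scipy.io importable: %s" % can_import("scipy.io"))
```

If this reports False, the converter will keep failing until scipy is installed into that environment.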

Best,

Matt.
Tags
galaxy, htseq count, rnaseq, tophat 2
