SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
samtools sorting issue or HTSeq-count problem? bbl Bioinformatics 4 10-28-2014 08:57 AM
What is wrong with merely 28% tophat2 mapped reads are counted by HTSeq-count bbl Bioinformatics 3 06-19-2014 11:00 AM
Tophat2 Bowtie2 Htseq-count for bacteria chickenmcfu Bioinformatics 2 10-16-2013 06:31 AM
HTSeq-count issue zuco RNA Sequencing 2 08-16-2013 12:30 AM
Issue with htseq-count cpleis Bioinformatics 8 10-15-2012 10:31 AM

Reply
 
Thread Tools
Old 09-05-2017, 04:23 AM   #1
Enriquez
Junior Member
 
Location: Rennes (France)

Join Date: Sep 2017
Posts: 3
Default Issue with Htseq-count on BAM files from Tophat2 using Galaxy

Hello,

I'm currently facing troubles using galaxy. I want to compare differentially expressed genes between two treatment groups. I already map my reads on my reference genome (70% remaping) and now I'm trying to obtain the differential expression matrix using Htseq count. (For information, my data are Illumina Hiseq 2500, pair end, 125pb).

I already map my reads on my reference genome thanks to Tophat2 (70%remaping), but when I tried to run Htseq on the Bam files from Htseq send me this error message:

Fatal error: Unknown error occured Error occured when processing GFF file (line 40 of file /opt/galaxy-dist/database/files/002/052/dataset_2052791.dat): Feature DS10_00012179-RA:exon:1059 does not contain a 'gene_id' attribute [Exception type: ValueError, raised in count.py:53]

I though that maybe it could an issue due to my gff3 file, and I tried to convert it into a gtf file using the GFF to GTF converter. But I obtain the following error message:

Traceback (most recent call last): File "/opt/shed_tools/toolshed.g2.bx.psu.edu/repos/vipints/fml_gff3togtf/6e589f267c14/fml_gff3togtf/gff_to_gtf.py", line 17, in <module> import GFFParser File "/opt/shed_tools/toolshed.g2.bx.psu.edu/repos/vipints/fml_gff3togtf/6e589f267c14/fml_gff3togtf/GFFParser.py", line 20, in <module> import scipy.io as sio ImportError: No module named scipy.io

I read that it could be because my Bam files were not sorted by the gene id. So, I tried to sort my Bam files using the tool sort from the SAMtool suite, and obtain an error message again:

Tool execution generated the following error message: Error running samtools sort. mv: cannot stat `foo.bam': No such file or directory The tool produced the following additional output: [bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files Usage: samtools sort [options...] [in.bam] Options: -l INT Set compression level, from 0 (uncompressed) to 9 (best) -m INT Set maximum memory per thread; suffix K/M/G recognized [768M] -n Sort by read name -o FILE Write final output to FILE rather than standard output -T PREFIX Write temporary files to PREFIX.nnnn.bam [email protected], --threads INT Set number of sorting and compression threads [1] --input-fmt-option OPT[=VAL] Specify a single input file format option in the form of OPTION or OPTION=VALUE -O, --output-fmt FORMAT[,OPT[=VAL]]... Specify output format (SAM, BAM, CRAM) --output-fmt-option OPT[=VAL] Specify a single output file format option in the form of OPTION or OPTION=VALUE --reference FILE Reference sequence FASTA FILE [null]

I do not understand why I received as much error messages. Does anyone face up a similar issue? Or knows where this problems come from?

Thank you in advance
Enriquez is offline   Reply With Quote
Old 09-17-2017, 08:21 PM   #2
neavemj
Member
 
Location: MA, USA

Join Date: Feb 2014
Posts: 40
Default

Hi Enriquez,

You should not need to sort the bam file, so I don't think that's the problem.

Do you know if your gff file contains the 'gene_id' attribute? You can open the file in a text editor and check that this is listed. Otherwise, you can change the gene id variable using '--idattr'. This option should also be available in galaxy.

I think converting your file from gff3 to gtf is also a pretty good idea. I think I've done this in the past and it worked. The error you are getting suggests that the python library 'scipy' is not installed in your galaxy configuration. Perhaps you can get the system administrators to install it for you?

Best,

Matt.
neavemj is offline   Reply With Quote
Old 10-04-2017, 04:36 AM   #3
Enriquez
Junior Member
 
Location: Rennes (France)

Join Date: Sep 2017
Posts: 3
Default

Hello ,
Sorry for this long time withtout answer.
So i set up Htseq count with the term "transcript_id" instead of gene_id and I convert my gff3 into gtf using gffread. Htseq has finished his job without any error report, but the output matrix contains only 0 :
Geneid TopHat on data 69 data 16 and data 15: accepted_hits
DS10_00000001 0
DS10_00000002 0
DS10_00000003 0
DS10_00000004 0
DS10_00000005 0
DS10_00000006 0
DS10_00000007 0
DS10_00000008 0
DS10_00000009 0

Did you face up a similar problem?
Thanks a lot for your help,
Thomas
Enriquez is offline   Reply With Quote
Old 10-04-2017, 02:49 PM   #4
neavemj
Member
 
Location: MA, USA

Join Date: Feb 2014
Posts: 40
Default

Hi Thomas, nice work figuring out those other problems!

I'm not sure why HTSeq is outputting only 0s. You might just have to look very carefully at all of the input files. Make sure that gtf is in the right format, and nothing weird has happened in the conversion. Also make sure that the names in your reference genome (which you used to make the bam files) match the names in your new gff.

Also, by default in HTseq, multi-mapping reads will not be counted. Perhaps by using 'transcript_id' this produces what appears to be multi-mapping reads across the same gene?

These are my guesses anyway!

Good luck,

Matt.
neavemj is offline   Reply With Quote
Reply

Tags
galaxy, htseq count, rnaseq, tophat 2

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 01:30 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO