SEQanswers

10-23-2014, 05:04 PM   #1
Studentlost
Member
 
Location: Sacramento

Join Date: Oct 2014
Posts: 28
Getting much higher coverage with bowtie2 than tophat2

Hello,


I ran my paired-end reads through tophat2 using:
tophat2 -p12 -o <tophat_dir> --no-coverage-search <reference genome> R1.fq R2.fq
and got only a 1.2% alignment rate.

I ran the same data through bowtie2:
bowtie2 -x <index> -1 <R1.fq> -2 <R2.fq> -S <output.sam>
and got a 42.75% overall alignment rate.
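One way to put the two numbers on the same footing is to compute the mapped fraction yourself from SAM records instead of comparing bowtie2's printed summary with TopHat's. The sketch below (standard library only; the function name is illustrative) counts each read once via its primary record using the SAM FLAG bits, assuming secondary alignments carry flag 0x100. Caveat: TopHat's accepted_hits.bam holds only aligned reads, so for TopHat the denominator should come from the input read count (e.g. the reads_in values in prep_reads.info), not from the BAM itself.

```python
# Minimal sketch: fraction of reads with an aligned primary record, from SAM text
# (e.g. lines piped from `samtools view -h`). Names are illustrative.

def alignment_rate(sam_lines):
    """Return (mapped, total, rate), counting each read (mate) once."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        flag = int(line.split("\t", 2)[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        total += 1
        if not flag & 0x4:  # 0x4 = this segment is unmapped
            mapped += 1
    return mapped, total, (mapped / total if total else 0.0)

# Toy input: pair r1 aligned (flags 99/147), pair r2 unaligned (flags 77/141)
example = [
    "@HD\tVN:1.0",
    "r1\t99\tchr1\t100\t42\t251M\t=\t300\t451\t*\t*",
    "r1\t147\tchr1\t300\t42\t251M\t=\t100\t-451\t*\t*",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(alignment_rate(example))  # (2, 4, 0.5)
```

For paired-end data this counts mates individually, so it should land close to, though not necessarily exactly on, bowtie2's "overall alignment rate".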

Why such a big discrepancy? I tried --coverage-search as well and got the same results.

I checked the TopHat run.log, and this is the command it passes to bowtie2:
bowtie2 -k 20 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 --mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0 -p 12 --sam-no-hd -x
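A quick way to see what one of these options implies: in bowtie2's end-to-end mode an alignment is reported only if its score is at least the --score-min threshold, which is a function of read length x. bowtie2's own default is L,-0.6,-0.6 (f(x) = -0.6 + -0.6x), whereas the TopHat log shows C,-14,0, a constant -14 whatever the length. With --mp 6,2 a high-quality mismatch costs 6, so the sketch below (hypothetical helper names) estimates how many such mismatches each threshold tolerates, ignoring gaps and lower-quality mismatches, which cost less.

```python
# Sketch: evaluate bowtie2 --score-min specs and the number of max-penalty
# mismatches they allow. Only the C (constant) and L (linear) forms are handled.

def min_score(spec, read_len):
    """Evaluate a --score-min spec such as 'C,-14,0' or 'L,-0.6,-0.6'."""
    kind, a, b = spec.split(",")
    a, b = float(a), float(b)
    if kind == "C":
        return a                  # constant: same threshold for any read length
    if kind == "L":
        return a + b * read_len   # linear: threshold scales with read length
    raise ValueError(f"unsupported function type: {kind}")

def max_mismatches(spec, read_len, mx=6):
    """High-quality mismatches (penalty mx each) that still pass, with no gaps."""
    return int(-min_score(spec, read_len) // mx)

for length in (50, 251):
    for spec in ("L,-0.6,-0.6", "C,-14,0"):
        print(length, spec, f"{min_score(spec, length):.1f}", max_mismatches(spec, length))
```

Under these assumptions a 251 bp read has room for about 25 high-quality mismatches with bowtie2's default but only 2 under TopHat's setting, the same allowance as a 50 bp read. That is one plausible reason longer reads fare worse, not a confirmed diagnosis; TopHat also applies its own mismatch filters (e.g. --read-mismatches, default 2).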

Any idea what's going on?

P.S. I've also been getting these silent errors in my tophat.log:
bam2fastx: /usr/lib64/libz.so.1: no version information available
and
fix_map_ordering: /usr/lib64/libz.so.1: no version information available
I do have libz.so.1.2.3

The process still runs fine and the bam file output can be used for differential analysis... I just have terrible coverage. Any ideas?

10-23-2014, 05:28 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,076

The libz.so.1 message is only a warning, so it is unrelated to what you are observing (see #7 in http://seqanswers.com/forums/showthread.php?t=39873).

10-24-2014, 05:05 PM   #3
Studentlost
Member
 
Location: Sacramento

Join Date: Oct 2014
Posts: 28

I'm sorry, but which #7 are you referring to? I didn't see anything relevant in that thread.

I'm just curious why TopHat is missing roughly 40% of the alignments that bowtie2 finds.

10-24-2014, 05:35 PM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,076

Since you had noted the silent errors in your post, I was only pointing out that they are warnings and are not related to the difference you are seeing.

Have you tried running bowtie2 with the same parameters as TopHat? Since TopHat runs bowtie2 with different parameters, it is not surprising that the result is different.
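To try this directly, one option is to rerun bowtie2 by hand with the options TopHat logged. This sketch only assembles the command line; the index and file names are placeholders, `--sam-no-hd` is dropped so the SAM keeps its header, and the truncated trailing `-x` from the log is replaced by explicit index and mate arguments. It is not an exact reproduction of TopHat's internals, since TopHat runs several bowtie2 passes (for example on read segments).

```python
import shlex

# Options copied from the bowtie2 call in TopHat's run.log, minus --sam-no-hd.
TOPHAT_BT2_OPTS = ("-k 20 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 "
                   "--mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0 -p 12")

def bowtie2_cmd(index, r1, r2, out_sam, opts=TOPHAT_BT2_OPTS):
    """Build (but do not run) a paired-end bowtie2 command with the given options."""
    return ["bowtie2", *shlex.split(opts),
            "-x", index, "-1", r1, "-2", r2, "-S", out_sam]

# Hypothetical paths; substitute your own index and FASTQ files.
cmd = bowtie2_cmd("index", "R1.fq", "R2.fq", "tophat_params.sam")
print(shlex.join(cmd))
```

To run it, pass the list to `subprocess.run(cmd, check=True)`. Comparing that alignment rate with the 42.75% from the default run would show how much of the gap comes from the options alone.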

You can try BBMap as an alternative to tophat.

10-24-2014, 05:41 PM   #5
Studentlost
Member
 
Location: Sacramento

Join Date: Oct 2014
Posts: 28

The interesting thing is that I looked at bowtie2's defaults, and TopHat was running it with pretty much the same settings.

10-24-2014, 05:44 PM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,076

But the options in the bowtie2 command you ran directly are not the same as the ones TopHat used.

Ah, you were saying that the parameters TopHat used are the bowtie2 defaults. My apologies.

Last edited by GenoMax; 10-24-2014 at 05:58 PM.

10-24-2014, 06:04 PM   #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,076

Have you done QC with these reads? Have they been trimmed in parallel (R1 and R2)?
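The reason trimming R1 and R2 in parallel matters: if a read is dropped from only one file, the mates fall out of register and every later pair is mismatched. A rough check, sketched below with hypothetical function names, compares read IDs of the two FASTQ files record by record. It assumes plain 4-line FASTQ records and strips a trailing /1 or /2 from each ID.

```python
import itertools

def read_ids(fastq_lines):
    """Yield read IDs from 4-line FASTQ records, minus any /1 or /2 suffix."""
    it = iter(fastq_lines)
    for header, _seq, _plus, _qual in zip(it, it, it, it):
        rid = header.rstrip("\n").split()[0].lstrip("@")
        if rid.endswith(("/1", "/2")):
            rid = rid[:-2]
        yield rid

def pairs_in_sync(r1_lines, r2_lines):
    """(True, n_pairs) if both files list the same IDs in the same order,
    else (False, index_of_first_mismatch)."""
    count = 0
    for a, b in itertools.zip_longest(read_ids(r1_lines), read_ids(r2_lines)):
        if a != b:  # different IDs, or one file ran out early (None)
            return False, count
        count += 1
    return True, count

r1 = ["@a/1\n", "ACGT\n", "+\n", "IIII\n", "@b/1\n", "ACGT\n", "+\n", "IIII\n"]
r2 = ["@a/2\n", "TTTT\n", "+\n", "IIII\n"]  # mate of b was dropped
print(pairs_in_sync(r1, r2))  # (False, 1)
```

On real files: `with open("R1.fq") as f1, open("R2.fq") as f2: print(pairs_in_sync(f1, f2))`.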

10-25-2014, 12:02 AM   #8
Studentlost
Member
 
Location: Sacramento

Join Date: Oct 2014
Posts: 28

Yes, I did QC both before and after adapter removal and quality trimming, using scythe and sickle respectively.

This is really puzzling me. There's no reason for TopHat to give me different results than bowtie2 would...

10-27-2014, 11:24 PM   #9
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415

In my experience, TopHat can be quite sensitive to a few parameters, especially insert size. That wouldn't account for the discrepancy here, however.
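In TopHat the insert size is set with -r/--mate-inner-dist (default 50 bp) and --mate-std-dev (default 20 bp). One rough way to choose values is to estimate the inner distance (|TLEN| minus both read lengths) from bowtie2's paired alignments, sketched below with a hypothetical function name. It assumes all reads share one length, as the 251 bp reads here do; TLEN is measured on the genome, so pairs spanning introns inflate it. With 251 bp mates, any fragment shorter than 502 bp gives a negative inner distance (overlapping mates).

```python
import statistics

def inner_distance_stats(sam_lines, read_len):
    """Mean and SD of |TLEN| - 2*read_len over properly paired first mates."""
    dists = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        # keep 0x2 (proper pair) + 0x40 (first mate, so each pair counts once);
        # drop 0x100/0x800 (secondary/supplementary) and zero TLEN
        if flag & 0x2 and flag & 0x40 and not flag & 0x900 and tlen != 0:
            dists.append(abs(tlen) - 2 * read_len)
    if len(dists) < 2:
        raise ValueError("need at least two properly paired reads")
    return statistics.mean(dists), statistics.stdev(dists)

example_pairs = [
    "@HD\tVN:1.0",
    "p1\t99\tchr1\t100\t42\t251M\t=\t300\t451\t*\t*",
    "p1\t147\tchr1\t300\t42\t251M\t=\t100\t-451\t*\t*",
    "p2\t83\tchr1\t500\t42\t251M\t=\t200\t-551\t*\t*",
    "p2\t163\tchr1\t200\t42\t251M\t=\t500\t551\t*\t*",
]
print(inner_distance_stats(example_pairs, 251))  # mean -1, SD about 70.7
```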

I would try another aligner - and can highly recommend STAR for speed and accuracy.

10-28-2014, 10:11 AM   #10
Studentlost
Member
 
Location: Sacramento

Join Date: Oct 2014
Posts: 28

There's no reason TopHat should be failing like this, though. Any idea which parameters I can change in TopHat to fix the issue?

10-28-2014, 02:21 PM   #11
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

To cut down on the possible complexity of what is going wrong, try aligning only one of the two mate files with Tophat (i.e. as a single-end alignment) and see if Tophat manages to align more data.

Also, how long are your reads, and what are you aligning to?
__________________
/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */

10-28-2014, 04:25 PM   #12
Studentlost
Member
 
Location: Sacramento

Join Date: Oct 2014
Posts: 28

My reads average around 1,000,000 base pairs. I aligned them to the Mmul_1 rhesus build from Ensembl. I also tried the rheMac3 build for comparison. I tried with and without a transcriptome index, and with and without a reference GTF file. Nothing made a difference. This is blowing my mind.

I can align the paired ends with bowtie2 and get ~40-60% per sample, but with TopHat I get between 0.5% and 4% per sample.

I was told by another lab working on this that they were able to get the alignment rate I got with bowtie2 using GSNAP. It doesn't make any sense to me why TopHat is the only tool doing this. In my mind, that makes it unreliable for other alignments, and that bothers me a lot.

10-28-2014, 04:36 PM   #13
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,076

You are not using original reads? You have reads/contigs that average a megabase each?

TopHat is designed for reads that are a kb or shorter.

10-28-2014, 04:48 PM   #14
Studentlost
Member
 
Location: Sacramento

Join Date: Oct 2014
Posts: 28

Wait, sorry, I meant that each mate file has around 1 million reads in total. Each read is 251 bp long.

10-28-2014, 04:49 PM   #15
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

Yeah, did I read that right? 1,000,000 base paired-end reads? No wonder bowtie2 returns a different result. Are you able to run the alignments with the original RNA-seq reads, whatever they were (e.g. PE 100)? Then you'll see TopHat actually function.

10-28-2014, 05:14 PM   #16
Studentlost
Member
 
Location: Sacramento

Join Date: Oct 2014
Posts: 28

I'm not sure I follow.

Here is my prep_reads.info file for one of my samples. This one has 396,555 read pairs:
left_min_read_len=251
left_max_read_len=251
left_reads_in =396555
left_reads_out=396555
right_min_read_len=251
right_max_read_len=251
right_reads_in =396555
right_reads_out=396555

Now here is a prep_reads.info file for a sample from another study that actually produces near perfect alignment:

left_min_read_len=20
left_max_read_len=50
left_reads_in =33601862
left_reads_out=33599795
right_min_read_len=20
right_max_read_len=50
right_reads_in =33601862
right_reads_out=33599859


The only difference seems to be the read length?
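The numbers in prep_reads.info are read counts, not base pairs: the *_reads_in / *_reads_out keys count reads, and *_read_len gives their length in bases. A small parser (hypothetical names; it assumes the key=value layout shown in this post) makes the conversion explicit for the first sample.

```python
# Sketch: turn TopHat's prep_reads.info into read and base counts.

def parse_prep_reads_info(text):
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            # keys such as "left_reads_in " carry a stray space before '='
            info[key.strip()] = int(value.strip())
    return info

def summarize(info):
    summary = {}
    for side in ("left", "right"):
        reads = info[f"{side}_reads_out"]
        summary[side] = {
            "reads": reads,
            # upper bound on bases: every read counted at the maximum length
            "max_bases": reads * info[f"{side}_max_read_len"],
        }
    return summary

SAMPLE = """\
left_min_read_len=251
left_max_read_len=251
left_reads_in =396555
left_reads_out=396555
right_min_read_len=251
right_max_read_len=251
right_reads_in =396555
right_reads_out=396555
"""
print(summarize(parse_prep_reads_info(SAMPLE)))
# each side: 396555 reads, max_bases 99535305
```

So this sample is about 0.4 million read pairs of 251 bp, roughly 99.5 Mbp per mate file, against 20-50 bp reads in the other study.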

10-28-2014, 05:26 PM   #17
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72

Can I ask for some context here? Reads of what? Derived from mRNA, total RNA, DNA? Is it a 250bp PE MiSeq run? What is your genome and is it eukaryotic? Does it have known annotated genes and did you use the annotation track in your tophat run?

10-28-2014, 05:33 PM   #18
Studentlost
Member
 
Location: Sacramento

Join Date: Oct 2014
Posts: 28

The reads are of mRNA sequenced from single cells of primates. I'm not sure how it was run, my knowledge starts off at the point of raw reads given to me. The genome is the Mmul_1 build by Ensembl, of the rhesus monkey. It has annotated genes and I used a reference GTF in tophat. I also assembled a transcriptome index and tried that. None of this made a difference.

10-28-2014, 05:42 PM   #19
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72

When you used the reference GTF, did you also set the option to only map to annotated genes (-T/--transcriptome-only)?
I'm a bit confused as to why you are expecting Tophat and Bowtie to behave identically. Tophat is splice-aware, bowtie is not. I don't understand why you would ever use Bowtie (or any other non-splice-aware mapper) to map RNA-derived sequence to a genome.
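One concrete place where the splice-aware difference shows up is the CIGAR string: a splice-aware aligner such as TopHat reports an intron as an N (skipped region) operation, which bowtie2 does not produce. The sketch below (hypothetical names; assumes SAM text input and that secondary/supplementary records carry flags 0x100/0x800) measures what fraction of mapped primary alignments are spliced.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def is_spliced(cigar):
    """True if the CIGAR contains an N (skipped reference region, e.g. an intron)."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))

def spliced_fraction(sam_lines):
    """Fraction of mapped primary alignments whose CIGAR contains N."""
    mapped = spliced = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or flag & 0x900:
            continue  # unmapped, secondary or supplementary
        mapped += 1
        spliced += is_spliced(cigar)
    return spliced / mapped if mapped else 0.0

example_sam = [
    "@HD\tVN:1.0",
    "a\t0\tchr1\t100\t50\t100M500N151M\t*\t0\t0\t*\t*",  # spliced alignment
    "b\t16\tchr1\t900\t50\t251M\t*\t0\t0\t*\t*",          # contiguous alignment
    "c\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",                    # unmapped
]
print(spliced_fraction(example_sam))  # 0.5
```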

10-28-2014, 05:44 PM   #20
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72

Also:
Quote:
I'm not sure how it was run, my knowledge starts off at the point of raw reads given to me
While I understand that this happens occasionally and it's sometimes not under your control, this is a terrible situation, and you should get as much information as you can on the context of the run (Platform? Run type? Library prep kit used? Size selection? Method of size selection?).

All times are GMT -8.