SEQanswers

02-14-2013, 06:05 AM   #1
bob-loblaw
Member
 
Location: /home/bob

Join Date: Jun 2012
Posts: 59
Optimizing tophat mapping for mixed RNA-Seq data

Hi all,

I'm currently using Tophat and bowtie2 to map 100bp PE RNA-Seq reads from a mixed human/bacterial sample. We're more interested in the bacterial side of things, but there's plenty that we can learn from the human reads too. We originally used bowtie2 to map human reads to hg19, and then a second bowtie2 run to map bacterial reads. However, we then switched to tophat for the human step, since it is splice-aware, and redid the processing, and as expected a much larger number of human reads mapped. But when we repeated the bowtie2 run for bacterial reads, significantly fewer reads mapped.

We've also repeated tophat with a few different settings to try to find what's optimal. The no-discordant option in tophat changes the results quite a lot, both for the number of human reads mapped and for the number of bacterial reads mapped. I haven't looked into the biological outcomes of this yet, but the differences in the number of reads have me concerned, as does how much the bacterial reads that come out of the files preprocessed with tophat on the default settings differ from those from the no-discordant run.
I've looked into the differences between the bacterial reads mapped by bowtie2 after a tophat run with default settings and after a tophat run with the no-discordant option, and they only share about 0.0007% of the bacterial reads, which is very odd.
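
For anyone wanting to reproduce that kind of comparison, here is a minimal sketch in Python. It assumes the read IDs have already been pulled out of each bowtie2 output (for example, the first column of the SAM records). The function names are hypothetical, and the percentage is taken over the union of the two sets, which is not necessarily how the figure above was computed. Mate suffixes are stripped because a /1 versus /2 mismatch alone can make two runs look almost disjoint.

```python
def strip_mate_suffix(read_id):
    """Drop a trailing /1 or /2 so both mates of a pair compare equal."""
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


def read_overlap(ids_a, ids_b):
    """Return (shared_count, percent_of_union) for two collections of read IDs."""
    set_a = {strip_mate_suffix(r) for r in ids_a}
    set_b = {strip_mate_suffix(r) for r in ids_b}
    shared = set_a & set_b
    union = set_a | set_b
    # Guard against two empty inputs to avoid dividing by zero.
    percent = 100.0 * len(shared) / len(union) if union else 0.0
    return len(shared), percent


# Toy IDs standing in for the default and no-discordant runs.
default_run = ["r1/1", "r2/1", "r3/2"]
no_discordant_run = ["r2/2", "r4/1"]
shared, percent = read_overlap(default_run, no_discordant_run)
```

If the overlap stays tiny even after normalising the IDs, the two tophat runs really are handing very different unmapped reads to bowtie2.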

Basically, I'm wondering if anyone could shed light on why the different tophat parameters have such a huge impact on the number of reads that bowtie2 later identifies as bacterial?

Also, any general advice would be appreciated.
Thanks
02-14-2013, 07:42 AM   #2
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

Are you mapping your reads first to one and then the other, or to both at the same time? Ideally it shouldn't make a difference. The way you described it, where you mapped with tophat to human and then got fewer reads with bowtie2 mapping to bacterial genomes, makes me wonder whether you are mapping some of the bacterial reads to the human genome. It's similar to the problem of mapping reads to only part of a genome rather than the whole genome. Tophat, bowtie2, or any tool will try to map the read no matter what. A read may genuinely come from one genome, but if that genome is absent, the aligner will settle for the best it can get from the reference you give it. Maybe combine your two references, map to both simultaneously, and see what results.
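
Once reads have been mapped against a combined reference, the split between the two organisms can be tallied straight from the SAM output. A rough sketch in Python follows. It assumes human contigs carry the "chr" prefix (as in UCSC hg19) and that no bacterial contig name starts with "chr"; the function name and the `human_prefix` parameter are hypothetical, not part of any tool.

```python
from collections import Counter


def count_by_organism(sam_lines, human_prefix="chr"):
    """Tally primary alignments as human, bacterial, or unmapped."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM record
            continue
        flag = int(fields[1])
        rname = fields[2]
        if flag & 0x100 or flag & 0x800:
            # Skip secondary and supplementary alignments so that
            # each read is counted once.
            continue
        if flag & 0x4 or rname == "*":
            counts["unmapped"] += 1
        elif rname.startswith(human_prefix):
            counts["human"] += 1
        else:
            # Assumption: everything that is not human is bacterial.
            counts["bacterial"] += 1
    return counts


example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tNC_000913.3\t5\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t7\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
counts = count_by_organism(example)
```

Comparing these tallies with the counts from the sequential runs shows how many reads change sides when both genomes compete for each read.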
02-14-2013, 08:28 AM   #3
bob-loblaw
Member
 
Location: /home/bob

Join Date: Jun 2012
Posts: 59

Quote:
Originally Posted by chadn737
Are you mapping your reads first to one and then the other, or to both at the same time? Ideally it shouldn't make a difference. The way you described it, where you mapped with tophat to human and then got fewer reads with bowtie2 mapping to bacterial genomes, makes me wonder whether you are mapping some of the bacterial reads to the human genome. It's similar to the problem of mapping reads to only part of a genome rather than the whole genome. Tophat, bowtie2, or any tool will try to map the read no matter what. A read may genuinely come from one genome, but if that genome is absent, the aligner will settle for the best it can get from the reference you give it. Maybe combine your two references, map to both simultaneously, and see what results.
First to one, then to the other. I had thought about this before, but when building the bacterial database we hit the maximum size of a reference database or index that bowtie2 can build (well, that's what I've been told; it was built just before I started this project). This is definitely something to look into though, thanks!

If we were going to map both human and bacterial reads simultaneously, we'd have to use tophat in order to efficiently map the human reads (human reads make up a large share of the reads in our samples). Do you (or anyone else who sees this post) know how using tophat to map bacterial reads would work out, given that tophat was designed to look for spliced reads?

Last edited by bob-loblaw; 02-14-2013 at 08:38 AM.
02-14-2013, 01:11 PM   #4
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

The size limit on the index is a problem. You could go ahead and combine them and see if what they told you was true. If it is, you will only get an error message.
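
A quick way to sanity-check the size before attempting the build is to add up the sequence lengths of the FASTA files that would go into the combined reference. The sketch below is in Python and assumes the ceiling is roughly 4 billion bases (2**32), which is the usual figure given for 32-bit bowtie2 indexes. Treat that number, and the helper names, as assumptions rather than something confirmed for your bowtie2 version.

```python
# Assumption: a 32-bit index tops out at about 4 billion bases.
SMALL_INDEX_LIMIT = 2**32


def total_bases(fasta_lines):
    """Sum sequence characters in FASTA lines, ignoring '>' header lines."""
    return sum(len(line.strip()) for line in fasta_lines
               if not line.startswith(">"))


def fits_small_index(fasta_sources, limit=SMALL_INDEX_LIMIT):
    """Return (total_bases, fits) for several FASTA line iterables."""
    total = sum(total_bases(lines) for lines in fasta_sources)
    return total, total < limit


# Toy records standing in for the human and bacterial references.
human = [">chr1\n", "ACGTACGT\n", "ACGT\n"]
bacteria = [">contig_1\n", "GGCC\n"]
total, fits = fits_small_index([human, bacteria])
```

With real data, pass open file handles in place of the toy lists, for example `fits_small_index([open("hg19.fa"), open("bacteria.fa")])`, where both file names are placeholders.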

As for using Tophat on bacterial reads: Tophat will try to align reads to the genome first, before looking for splicing. Ideally, all the bacterial reads will align to the bacterial genome in this first round and not be spliced. I won't say that won't happen, because inevitably some will have some sort of mismatch and show up spliced.
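
To see how often that happens in practice, one could scan the alignments for bacterial hits whose CIGAR string contains an N operation, which is how SAM records a skipped region such as an intron. A minimal Python sketch, again assuming that human contigs are the ones prefixed with "chr" and everything else is bacterial; the function names are hypothetical.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar):
    """True if the CIGAR string contains an N (skipped region) operation."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def spliced_bacterial_reads(sam_lines, human_prefix="chr"):
    """Return (read, contig, cigar) for spliced alignments on non-human contigs."""
    hits = []
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:  # too short to hold a CIGAR field
            continue
        name, rname, cigar = fields[0], fields[2], fields[5]
        if rname == "*" or rname.startswith(human_prefix):
            continue
        if is_spliced(cigar):
            hits.append((name, rname, cigar))
    return hits


example = [
    "r1\t0\tchr1\t100\t50\t40M500N60M\t*\t0\t0\t*\t*",
    "r2\t0\tNC_000913.3\t5\t42\t100M\t*\t0\t0\t*\t*",
    "r3\t0\tNC_000913.3\t9\t3\t30M2000N70M\t*\t0\t0\t*\t*",
]
suspicious = spliced_bacterial_reads(example)
```

A large number of hits here would suggest the spliced bacterial alignments are worth a closer look before being trusted.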

Have you tried aligning reads to the bacterial genome and then to the human? Or has it only been human then bacterial?
02-15-2013, 01:14 AM   #5
bob-loblaw
Member
 
Location: /home/bob

Join Date: Jun 2012
Posts: 59

Quote:
Originally Posted by chadn737
The size limit on the index is a problem. You could go ahead and combine them and see if what they told you was true. If it is, you will only get an error message.

As for using Tophat on bacterial reads: Tophat will try to align reads to the genome first, before looking for splicing. Ideally, all the bacterial reads will align to the bacterial genome in this first round and not be spliced. I won't say that won't happen, because inevitably some will have some sort of mismatch and show up spliced.

Have you tried aligning reads to the bacterial genome and then to the human? Or has it only been human then bacterial?
I haven't tried aligning reads to the bacterial genome and then to human, but originally we were using bowtie2 to map human reads (which only mapped a few thousand reads per file, compared to the tens of millions that tophat mapped for the same file). Then when we ran bowtie2 to map bacterial reads, we got about 5 or 10 times as many bacterial reads mapped as we did after using tophat to align human reads. (So few human reads were aligned by bowtie2 that it gives me an indication of what running bowtie2 for bacterial reads before tophat for human would result in.) Basically, I think no matter which alignment we do first we'll have the same problem: if bacterial goes first we'll get a lot of false positives, and vice versa if human goes first. Thanks for all your help here! I'll definitely be trying a tophat run with a database of both human and bacterial as soon as I can!

Finally, if I could ask you one more question: what about the no-discordant option that I mentioned in the OP? Do you think I should use that parameter when running tophat, or should I just go with the default settings?

Last edited by bob-loblaw; 02-15-2013 at 02:21 AM.
02-15-2013, 02:22 AM   #6
bob-loblaw
Member
 
Location: /home/bob

Join Date: Jun 2012
Posts: 59

Another problem that just popped into my head: if tophat tries to align all reads first without looking for splicing, won't I just have the same problem as before, that a lot of human reads will be falsely identified as bacterial? Or do you know if Tophat will first try to align everything without splicing, then with splicing, and only return the best hit?
Tags
bowtie2, rna-seq, rna-seq mapped reads, tophat
