SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
Mapping strand-specific data with TopHat Comosus RNA Sequencing 6 08-13-2012 02:25 AM
Strand-specific library appears not strand-specific oligo Illumina/Solexa 7 12-08-2011 10:54 AM
strand specific RNAseq Pepe Sample Prep / Library Generation 5 10-27-2011 08:58 AM
SHRiMP and Cufflinks for strand-specific SOLiD bacteria RNA Seq questions pjorth Bioinformatics 0 05-17-2011 01:45 PM
Cufflinks library-type for SOLiD strand specific single data gengen Bioinformatics 1 10-19-2010 09:41 AM

Reply
 
Thread Tools
Old 04-18-2011, 03:07 AM   #1
Ender985
Member
 
Location: Spain

Join Date: Mar 2009
Posts: 12
Default TopHat: Aligning SOLiD strand specific RNAseq

Hey all,

I'm interested in realigning some single-end strand specific RNAseq data I got from SOLiD, using TopHat.

However, I've been completely unable to map the reads to only one strand at a time. In the current manual (version 1.2.0) I can not find any flag for forcing a no-reverse complement search. All I found is the confusing --library-type fr-firststrand/fr-secondstrand/fr-unstranded flag, and all of them align reads to both strands anyway.

I know that TopHat is built on top of Bowtie, so I tried aligning the reads by directly using Bowtie, and then, by using the --norc and --nofw flags I am actually able to map to one strand at a time.

Has anyone found a way to make it work properly with TopHat?
Thanks!
Ender985 is offline   Reply With Quote
Old 04-21-2011, 09:26 AM   #2
Ender985
Member
 
Location: Spain

Join Date: Mar 2009
Posts: 12
Default

No one?

I have these imput files:
forward.csfasta, forward.qual
reverse.csfasta, reverse.qual

I proceed to align only the forward files, using the typical TopHat flags,
Code:
tophat --color --quals -p 7 -g 1 --coverage-search
plus any of the three options that seem to be strand-related,
Code:
--library-type fr-unstranded
--library-type fr-firststrand
--library-type fr-secondstrand
and I look at the reads that map on a gene on the reverse strand. With any of the three options, there are still some reads mapping to that gene.

I was expecting zero hits, since I'm aligning forward reads to a reverse gene. However none of the methods gives me that. The other option that occured to me would be that if the protocol is not 100% reliable, the 'forward' labeled reads would contain some real 'reverse' reads. If this were the case, then maybe TopHat was still mapping them somehow.

Looking a little bit deeper into that, I searched the sam file for the strand information of the aligned reads that fell in (a region of) that gene, to discover that

- For the fr-unstranded flag: there are 3580 hits, only 1 with the XS:A:- (correct) strand and 15 with the XS:A:+ (wrong) strand (the rest simply do not have strand information)
- For the fr-firststrand: there are 3579 hits, 3564 with the XS:A:- (correct) strand and 15 with the XS:A:+ (wrong) strand
- For the fr-secondstrand: there are 3580 hits, 2 with the XS:A:- (correct) strand and 3564 with the XS:A:+ (wrong) strand

A first glimpse, it would seem that the fr-firststrand flag does exactly that, since the majority of the aligned reads report negative (correct) strand. However I got a little bit suspicious to see that exactly the same number of reads were reported to map to the positive (wrong) strand by using the fr-secondstrand flag. A look at the names of the reads revealed that indeed, it is the very same reads that are being mapped to the region with the two methods, and that only the reported strand is changed, like if TopHat was only adding the strand information based on which flag did I use, but not producing it himself by seeing if he could map the read to the expected strand, or if he had to make the reverse compliment in order to map it.

I'm still at a loss here, since as I said in the OP, bowtie is able to correctly map those very same reads to one strand or the other by the use of the --norc and --nofw flags.

Any thoughts?

Last edited by Ender985; 04-21-2011 at 09:35 AM.
Ender985 is offline   Reply With Quote
Old 06-17-2011, 08:33 AM   #3
Ender985
Member
 
Location: Spain

Join Date: Mar 2009
Posts: 12
Default

Well, just for future reference, I managed to do it. But it is not as straightforward as one would whish.

One of the biggest problems I had is that I was under the assumption that one file contained only the 'originally forward reads' (as in, reads coming from forward-stranded genes) and another with the 'originally reversed reads'. But this turned out to be completely false. What I really had was two files (since it was a PE experiment) with a number of reads coming from both strands, that were guaranteed NOT to be complimentary to the original RNA orientation, as this is what the strand-specific protocol does for you.

Once that was cleared out, I still had to battle against the poorly explained library flags of TopHat. After several tries, it turns out that for the SOLiD data, the flag that finally gave me correct results was --fr-secondstranded , but I think there is no easy way to know which is the flag you need to use short of trying both of them and seeing which one produces correctly stranded results.

The third problem was that TopHat silently does the strand specific alignment, but then dumps both forward and reverse reads in the same .bam file. It turns out you have to manually extract the reads that mapped to each strand. I did using samtools and grep:

samtools view -h accepted_hits.sorted.bam | grep -e XS:A:+ -e '^@' | samtools view -bS -o forward/accepted_hits.bam -
samtools view -h accepted_hits.sorted.bam | grep -e XS:A:- -e '^@' | samtools view -bS -o reverse/accepted_hits.bam -


Ony then, if you produce pileups from the two resulting .bam files, you will have a pileup with the forward RNA information, and another one with the reverse one.

I hope this post will save some people the few weeks of frustration I had to undergo to finally solve this.
Ender985 is offline   Reply With Quote
Old 07-09-2012, 11:37 PM   #4
dadada4ever
Member
 
Location: Beijing

Join Date: Mar 2010
Posts: 18
Default

Quote:
Originally Posted by Ender985 View Post
Well, just for future reference, I managed to do it. But it is not as straightforward as one would whish.

One of the biggest problems I had is that I was under the assumption that one file contained only the 'originally forward reads' (as in, reads coming from forward-stranded genes) and another with the 'originally reversed reads'. But this turned out to be completely false. What I really had was two files (since it was a PE experiment) with a number of reads coming from both strands, that were guaranteed NOT to be complimentary to the original RNA orientation, as this is what the strand-specific protocol does for you.

Once that was cleared out, I still had to battle against the poorly explained library flags of TopHat. After several tries, it turns out that for the SOLiD data, the flag that finally gave me correct results was --fr-secondstranded , but I think there is no easy way to know which is the flag you need to use short of trying both of them and seeing which one produces correctly stranded results.

The third problem was that TopHat silently does the strand specific alignment, but then dumps both forward and reverse reads in the same .bam file. It turns out you have to manually extract the reads that mapped to each strand. I did using samtools and grep:

samtools view -h accepted_hits.sorted.bam | grep -e XS:A:+ -e '^@' | samtools view -bS -o forward/accepted_hits.bam -
samtools view -h accepted_hits.sorted.bam | grep -e XS:A:- -e '^@' | samtools view -bS -o reverse/accepted_hits.bam -


Ony then, if you produce pileups from the two resulting .bam files, you will have a pileup with the forward RNA information, and another one with the reverse one.

I hope this post will save some people the few weeks of frustration I had to undergo to finally solve this.
An useful thread to me.
I wanna say that it seems not appropriate to align RNA-Seq reads with Bowtie cuz it doesn't support the split reads mapping.
And the definition of XS:A which produced by Tophat is telling you the strand of the transcript that produced the read, not the read itself.

I also have some questions Ender, could you give me some help?
1. I have the SOLiD RNA-Seq PE data, as far as I concerned, the F3 and F5 are in opposite orientation and both of them mapping to the different strands separately, am I right? Because I saw a Post (the last one) http://seqanswers.com/forums/showthread.php?t=6317 illustrated the F3 and F5 are actually on the same strand, which confused me very much.
2. If I'm not wrong, so if I wanna see the reads on the positive strand, is that correct (like you do) to get the reads with XS:A:+ tag, not just simply sum up the reads mapping to the positive strand cuz the F5 read in the same pair which maps to negative strands are actually from the positive strand too?

Thank you.
Conan.

Last edited by dadada4ever; 07-09-2012 at 11:38 PM. Reason: the code command in my reply cause reading difficulty
dadada4ever is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 09:32 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO