SEQanswers

03-27-2013, 01:32 PM   #1
hubin.keio
Junior Member
 
Location: NM, USA

Join Date: Jan 2012
Posts: 4
Mapping paired-end stranded RNA-seq data

Hello. I need to map paired-end stranded RNA-seq reads. The library was prepared using ScriptSeq. Can anyone show me how to assign strand information to transcripts based on the mapping? Filtering mapped reads on the 0x10 flag does not seem to work here, since I have paired-end reads. Thanks.
03-27-2013, 03:56 PM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

I don't know why that wouldn't work, unless you are looking for reads whose flag is literally 16. A paired-end read will certainly have the 1 flag, and either the 64 or the 128. If it is properly paired, it will also have the 2 flag. It might have the 16 and the 32 as well (hopefully exactly one of those, but both or neither is possible), and the 4 and/or 8 might be set too.
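
To make the bit arithmetic concrete, here is a minimal sketch that splits a FLAG value into the individual bits listed above. The function name decode_flag is made up for illustration. It uses integer division rather than gawk's and(), on the assumption that this keeps it runnable under any POSIX awk.

```shell
# decode_flag is a hypothetical helper name, not part of samtools or any mapper.
# Prints the decimal value of every bit that is set in a SAM FLAG.
decode_flag() {
    awk -v flag="$1" 'BEGIN {
        out = "";
        # Walk the bits 1, 2, 4, ... 2048 and test each one
        for (bit = 1; bit <= 2048; bit *= 2) {
            if (int(flag / bit) % 2 == 1) {
                out = out (out == "" ? "" : " ") bit;
            }
        }
        print out;
    }'
}

decode_flag 99    # 1 2 32 64
decode_flag 147   # 1 2 16 128
```

So a flag of 147 does have the 16 bit set even though the value is not literally 16, which is why an exact comparison against 16 would miss it.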
03-28-2013, 07:01 AM   #3
alexdobin
Senior Member
 
Location: NY

Join Date: Feb 2009
Posts: 161

Most RNA-seq mappers will set the 0x10 bit if the actual sequence from the .fastq files is reverse complemented with respect to the "+" genome strand. I believe this is a standard SAM convention. Illumina PE sequencing produces sequences in such a way that one of the mates agrees with the original RNA, while the other is its reverse complement. Hence, for a correctly (concordantly) paired alignment, the mates will always have opposite 0x10 bits.

If you have stranded RNA-seq data, then you know, from the library construction, which of the mates, 1st or 2nd, agrees with the original RNA. For example, in the standard "dUTP" protocol it is the 2nd mate that starts at the 5' end of the RNA fragment and so agrees with the original RNA. As @swbarnes2 pointed out, the 0x40 bit is set for the 1st mate and the 0x80 bit is set for the 2nd mate. Here is awk code for the dUTP case:

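# Requires gawk: and() is a gawk extension. Skips SAM header lines (@...).
!/^@/ {
    FLAG = $2;   # SAM FLAG field
    # "+" if mate 2 maps forward, or the other mate maps reverse (dUTP convention)
    if ( (and(FLAG,0x80)>0 && and(FLAG,0x10)==0) || (and(FLAG,0x80)==0 && and(FLAG,0x10)>0) )
    {
        strand="+";
    } else {
        strand="-";
    }
    print $1, strand;   # read name and inferred RNA strand
}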

Again, you have to check which mate has the true RNA strand in your ScriptSeq prep.
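
Since the right mate depends on the prep, one option is to make it a parameter. The following is only a sketch: rna_strand and its sense_mate argument (1 or 2, the mate that has the same sequence as the original RNA) are hypothetical names, and the value to pass for ScriptSeq has to come from the kit documentation. The decimal masks 128 and 16 correspond to 0x80 and 0x10, tested with integer division so that gawk's and() is not needed.

```shell
# rna_strand is a hypothetical helper; sense_mate (1 or 2) is an assumed parameter
# naming the mate that has the same sequence as the original RNA.
# Reads SAM on stdin, prints read name, FLAG, and inferred RNA strand.
rna_strand() {
    awk -v sense_mate="$1" '
    /^@/ { next }                             # skip SAM header lines
    {
        flag = $2;
        is_mate2 = int(flag / 128) % 2;       # 0x80: second mate
        is_rev   = int(flag / 16) % 2;        # 0x10: aligned to the "-" strand
        is_sense = (sense_mate == 2) ? is_mate2 : 1 - is_mate2;
        # The sense mate reports the RNA strand directly; the other mate is flipped
        strand = (is_sense != is_rev) ? "+" : "-";
        print $1, flag, strand;
    }'
}

# Two mates of one concordant pair (flags 99 and 147) should agree on the strand
printf 'r1\t99\tchr1\t100\nr1\t147\tchr1\t300\n' | rna_strand 2
```

With sense_mate set to 2 this gives the same answer as the dUTP code above; with 1 every call is flipped. It assumes every record is one mate of a pair.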
03-28-2013, 02:10 PM   #4
hubin.keio
Junior Member
 
Location: NM, USA

Join Date: Jan 2012
Posts: 4

Hello @alexdobin and @swbarnes2, thank you so much for your kind responses. Your explanations are really useful. After reading the SAM format documentation several times along with your replies, I have started to understand the problem. It looks like, in my BAM file, only the properly paired reads (0x2) have the right strand information for our spike-in control sequence. Without filtering the reads on 0x2, the 0x10 flag still gave me the wrong strand information in a few cases. Again, thanks for your help.
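
For anyone repeating this, the usual way to keep only properly paired reads is samtools view with -f 0x2. The same filter can be sketched in awk on SAM text, as below. The name proper_pairs_only is made up for illustration, and the bits are tested with integer arithmetic instead of gawk's and().

```shell
# proper_pairs_only is a hypothetical helper name.
# Reads SAM on stdin; keeps header lines and records with both 0x1 and 0x2 set.
proper_pairs_only() {
    awk '
    /^@/ { print; next }                      # keep header lines
    # 0x1 (paired) is the lowest bit, 0x2 (properly paired) is the next one
    $2 % 2 == 1 && int($2 / 2) % 2 == 1 { print }
    '
}

# r2 (flag 65: paired, but not properly paired) is dropped
printf '@HD\tVN:1.4\nr1\t99\tchr1\t100\nr2\t65\tchr1\t200\nr3\t147\tchr1\t300\n' | proper_pairs_only
```

The output can then be piped into a strand-assignment step, so that discordant or orphan mates never reach the 0x10 test.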