SEQanswers

Old 08-01-2011, 12:23 PM   #1
mhayes
Member
 
Location: Cleveland, OH

Join Date: Aug 2011
Posts: 11
Inconsistency with SAM flag output?

Hi all.

I'm very confused about the output that I'm getting here.

Consider the below SAM output. These are the mapping results of a single read pair. You can see that the first read maps to chromosome 6 at 5007, and the second maps to 5149.

However, the SAM flags suggest that the first read is the *second* in the pair, and that the second read is the *first* in the pair. Also, the first read maps to the forward strand, while the second read maps to the reverse strand.

-----------------
k_2_6_11305011 163 chr6 5007 0 75M * 0 217 ATATAACTGCGAGATTAATCTCAGACAATGACACAAAATATAGCGAAGTTGGTAAGTTATTTAGTAAAGCTCATG BBB;CBBC4)7B8B=-BB;B?BB?2*;BB-BBBBBBBB?C-;B-@>AC8=B909BB0@4<8-B;-=B0B@+;C--
MF:i:18 AM:i:0 SM:i:0 NM:i:2 UQ:i:21 H0:i:0 H1:i:0

k_2_6_11305011 83 chr6 5149 0 75M * 0 -217 TTTATCTTTCAACAACTTGTGTGTTATATTTTGGAATACAGATACAAAGTTATTATGCTTTCAAAATATTCTTTT ?BB?BBB?BB8BBB0=-=BBBBB?==BB?BBB?B=B?-0?BBB8B--B8BBBBB-C8C=?=BBBB8?BBBCB=8B
MF:i:18 AM:i:0 SM:i:0 NM:i:0 UQ:i:0 H0:i:4 H1:i:0

-----------------

The SAM flags suggest that this pair is 'everted' (i.e. the first read is on the reverse strand, while the second read is on the forward strand). However, this is not really the case.

Am I interpreting this output correctly?
Old 08-01-2011, 01:42 PM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

The read at 5007 has a flag of 163, which = 128+32+2+1 = second read, forward direction.
The read at 5149 has a flag of 83, which = 64+16+2+1 = first read, reverse direction.

That's a proper pair; the two ends point in towards each other. There's nothing wrong here.

I'm not sure how you determined that the read at 5007 was read 1. Does the sequence from the read 1 fastq BLAT to 5007? Maybe there was a mix-up in the order that the files were given to the software that did the alignment or made the .sam. From the SAM output alone, the only way to know which was the first read and which was the second is to look at the flags.
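
To make the arithmetic above easy to check, here is a minimal Python sketch that splits a FLAG value into its bits. The bit values and meanings follow the SAM specification; the names `SAM_FLAG_BITS` and `decode_flag` are made up for this illustration and are not part of any library.

```python
# Bit values and meanings of the SAM FLAG field (per the SAM specification).
# SAM_FLAG_BITS and decode_flag are illustrative names, not a library API.
SAM_FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the descriptions of every bit that is set in a SAM FLAG."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# The two flags from the records in post #1.
for flag in (163, 83):
    print(flag, decode_flag(flag))
```

Note that "forward direction" for the read flagged 163 is implied by the absence of bit 16 (0x10); the 32 (0x20) bit describes the mate's strand, not the read's own.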
Old 08-01-2011, 05:17 PM   #3
mhayes
Member
 
Location: Cleveland, OH

Join Date: Aug 2011
Posts: 11

My assumption was that the second read would be the one mapped to the more distal location.

Per the output I provided, the "second" read is actually the one that comes first in mapping (at 5007). That's why I'm confused.
Old 11-17-2011, 10:13 PM   #4
jay2008
Member
 
Location: Australia

Join Date: Sep 2010
Posts: 44

The flag in the SAM file is really confusing to me as well.
What is the meaning of "second read"? Does it mean the read from the second fastq file?
I am using TopHat.
Old 11-18-2011, 08:32 AM   #5
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Yes, the second read is the one from the second fastq you give to your mapping software, i.e. the reads originating at adaptor 2.

It's not like the DNA molecules and the adaptor molecules know which end of the DNA is closer to what your reference has arbitrarily designated the beginning of the DNA sequence. So of course there can't be a correlation between read 1 and the read closer to the beginning of your reference.
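
As a rough illustration of this point, the Python sketch below classifies a pair by ordering the two mates by mapping position and ignoring the read-number bits entirely. The function names are hypothetical, and the template-length helper assumes a simple full-length match such as the 75M CIGAR in post #1; it is not how any particular aligner computes TLEN.

```python
REVERSE = 0x10  # SAM FLAG bit: read aligned to the reverse strand


def pair_orientation(pos_a, flag_a, pos_b, flag_b):
    """Classify a mapped pair by the strands of its leftmost and rightmost mates.

    Mates are ordered by position, not by read number, so it does not
    matter which of them is read 1.
    """
    mates = sorted([(pos_a, flag_a), (pos_b, flag_b)], key=lambda m: m[0])
    left_reverse = bool(mates[0][1] & REVERSE)
    right_reverse = bool(mates[1][1] & REVERSE)
    if not left_reverse and right_reverse:
        return "inward"    # ----->   <-----  (the usual proper pair)
    if left_reverse and not right_reverse:
        return "outward"   # <-----   ----->  (everted)
    return "same strand"


def template_length(left_pos, right_pos, right_aligned_len):
    """Span from the leftmost base to the rightmost base of the pair.

    Assumes the rightmost mate covers right_aligned_len reference bases
    (75 for a 75M CIGAR) and ends the template.
    """
    return right_pos + right_aligned_len - left_pos


# Values from the two records in post #1.
print(pair_orientation(5007, 163, 5149, 83))
print(template_length(5007, 5149, 75))
```

With the values from post #1 the pair comes out as inward-facing regardless of the order in which the two records are passed, and the computed span agrees with the 217 / -217 in the ISIZE column.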
Old 11-18-2011, 01:16 PM   #6
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

See http://picard.sourceforge.net/explain-flags.html to help translate flags. Also check out the SAM spec for more explanations.
Old 11-18-2011, 04:06 PM   #7
jay2008
Member
 
Location: Australia

Join Date: Sep 2010
Posts: 44

If a pair is mapped to the genome as below,
-------> <-------
read1 read2
does it mean the pair is located on the + strand?

Otherwise, if a pair is mapped to the genome as below,
-------> <-------
read2 read1
does it mean the pair is located on the - strand?
Old 11-18-2011, 06:28 PM   #8
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Quote:
Originally Posted by jay2008
If a pair is mapped to the genome as below,
-------> <-------
read1 read2
does it mean the pair is located on the + strand?

Otherwise, if a pair is mapped to the genome as below,
-------> <-------
read2 read1
does it mean the pair is located on the - strand?

This question makes no sense.

The DNA that went onto the flow cell is double stranded. If your two reads overlapped perfectly, one would be a rev comp of the other, not the reverse.
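
For anyone unsure about the difference between a reverse and a reverse complement, here is a small Python sketch. The helper name `reverse_complement` and the short example sequence (the first ten bases of the read at 5007 in post #1) are chosen purely for illustration.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the sequence as it reads on the opposite strand, 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


fragment = "ATATAACTGC"  # first ten bases of the read at 5007

print("original:          ", fragment)
print("reverse only:      ", fragment[::-1])
print("reverse complement:", reverse_complement(fragment))
```

A read sequenced from the opposite strand of the same fragment would match the reverse complement, never the plain reverse, and applying `reverse_complement` twice gives back the original sequence.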