SEQanswers

Old 02-01-2012, 12:16 PM   #1
kgulukota
Member
 
Location: Illinois

Join Date: Oct 2011
Posts: 30
SAM flag idioms

"There are 10 types of people in this world: those who assimilated binary numbers and those who didn't."

I definitely belong to the 10'th type and hence SAM Flags are a chore. They may be a very compact way of communicating a lot of info about an alignment, but how do we humans learn them? I know it is kind of nerdy to actually look through SAM files but, what can I say? Mea culpa.

Anyway, this post is my attempt to understand them like a natural language i.e. recognize some idiomatic representations in flags. If you already know these, you are a "binar" and way ahead of us humans on this topic.

You can use this handy little web page for specific flags:
http://picard.sourceforge.net/explain-flags.html

However, to "speak SAM", we must know these flags without having to refer to a web page for each line. So, here are some simple idioms.

Unpaired Reads

For unpaired reads, the flags are very easy to recognize because there are only 3 values:
  • 4 - 0000000100 - means "this is an unpaired read and is not mapped".
  • 16 - 0000010000 - "this unpaired read is mapped in the reverse orientation".
  • 0 - 0000000000 - "this unpaired read is mapped in the forward orientation".
I guess it is theoretically possible to have a flag of 20 meaning "unpaired, unmapped read presented in reverse orientation" - however, I doubt any software will do that. Perhaps, that is our first SAM joke: Did you hear about AnnoyingAlign? It is the software that 20's all unpaired, unmapped reads - just to get on users' nerves.
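
If you would rather not reload a web page for every flag, the lookup is easy to sketch yourself. This is only a rough illustration and not Picard's code: the names `SAM_FLAG_BITS`, `explain_flag` and `as_binary` are made up for this post, and the bit meanings are taken from the FLAG table in the SAM specification.

```python
# Bit values and their meanings, following the SAM specification's FLAG table.
SAM_FLAG_BITS = [
    (1, "paired"),
    (2, "proper pair"),
    (4, "read unmapped"),
    (8, "mate unmapped"),
    (16, "read reverse strand"),
    (32, "mate reverse strand"),
    (64, "first in pair"),
    (128, "second in pair"),
    (256, "secondary alignment"),
    (512, "fails quality checks"),
    (1024, "duplicate"),
    (2048, "supplementary alignment"),
]


def explain_flag(flag):
    """Return the names of all bits that are set in a SAM flag."""
    return [name for value, name in SAM_FLAG_BITS if flag & value]


def as_binary(flag, width=10):
    """Render the flag as a zero-padded binary string, as written in this post."""
    return format(flag, "0{}b".format(width))


# The three unpaired idioms, plus the hypothetical AnnoyingAlign value.
for flag in (0, 4, 16, 20):
    print(flag, as_binary(flag), explain_flag(flag))
```

For 20 the list contains both "read unmapped" and "read reverse strand", which is exactly the AnnoyingAlign case.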

Paired Reads

For paired reads, the 0'th bit (value 1) HAS to be set. Hence all flags for paired reads HAVE to be odd. In other words, among the flags below 256 covered in this post, all even-numbered flags other than the above three (0, 4 and 16, plus the unlikely 20) are meaningless, because every other bit below 256 only makes sense for a pair. (Good progress. We can recognize nonsense words. Writing a Jabberwocky poem with these flags is left as an exercise for the reader).

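For paired reads, odd flags in the interval [65-127] have the 64 bit set and relate to the first read of a pair, while odd flags in [129-191] have the 128 bit set and relate to the second read. Flags in [193-255] have both 64 and 128 set, and odd flags below 64 have neither, so they do not tell you which read of the pair you are looking at. (All of this is for flags below 256.)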
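
Which read of the pair a flag refers to comes down to bits 64 and 128, so it can be checked directly instead of memorizing intervals. A small sketch follows; the function name `read_in_pair` and its return strings are made up here for illustration.

```python
def read_in_pair(flag):
    """Say which read of a pair a SAM flag describes, based on bits 1, 64 and 128."""
    if not flag & 1:
        return "unpaired"
    first = bool(flag & 64)
    second = bool(flag & 128)
    if first and second:
        return "both 64 and 128 set"
    if first:
        return "first"
    if second:
        return "second"
    return "neither 64 nor 128 set"


for flag in (16, 65, 129, 3, 193):
    print(flag, read_in_pair(flag))
```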


"All Good"

Some values mean "all good" i.e. that both reads in the pair have aligned:
  • 65 - 0001000001 - This is the first read of a pair and both reads aligned to the forward strand.
  • 129 - 0010000001 - This is the second read of a pair and both reads aligned to the forward strand.
Quote:
NOTE: 67 (0001000011) and 131 (0010000011) also mean the same as 65 and 129 with the added assurance that "the pair is properly aligned" meaning that they mapped within a proper distance from each other.
Sometimes both reads of a pair are flipped (reverse complemented) before mapping. If so, you get 113 or 177.
  • 113 - 0001110001 - "this is the first read of a pair, both reads in pair were flipped and both mapped".
  • 177 - 0010110001 - "this is the second read of a pair, both reads in pair were flipped and both mapped".
Other times only one of the reads in a pair is flipped though both of them map:
  • 81 - 0001010001 - "this is the first read of pair, both reads mapped, we had to flip this read, but mate is in forward orientation".
  • 161 - 0010100001 - "this is second read, this one is forward but we flipped its mate and both reads mapped".
Quote:
NOTE: 163 (0010100011) and 83 (0001010011) are the same as 161 and 81 except "it is in a proper pair".
  • 97 - 0001100001 - "this is first read, its mate is flipped but this is forward. Both mapped".
  • 145 - 0010010001 - "this is second read. it is flipped but its mate is not. Both mapped".
Quote:
NOTE: 99 (0001100011) and 147 (0010010011) are the same as 97 and 145 except with "proper mapping in pair".
Exercise: Can you see why the number of reads with flag 113 must be equal to the number of reads with flag 177? Similarly, the counts for 81 and 161 must match, as must those for 97 and 145. If those numbers don't match, something went wrong with your aligner.
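
One way to convince yourself of the exercise: every "this read" bit has a matching "mate" bit (4 and 8, 16 and 32, 64 and 128), so the flag the mate should carry is obtained by swapping each of those bits with its partner. The sketch below assumes a consistent aligner and only looks at the bits below 256; `SWAPPED_BITS` and `mate_flag` are names invented for this illustration.

```python
# Each tuple is (bit describing this read, bit describing its mate).
SWAPPED_BITS = [(4, 8), (16, 32), (64, 128)]


def mate_flag(flag):
    """Return the flag the mate is expected to carry, by swapping paired bits."""
    mate = flag
    for this_bit, mate_bit in SWAPPED_BITS:
        has_this = bool(flag & this_bit)
        has_mate = bool(flag & mate_bit)
        mate &= ~(this_bit | mate_bit)  # clear both bits, then set them swapped
        if has_this:
            mate |= mate_bit
        if has_mate:
            mate |= this_bit
    return mate


for flag in (113, 81, 97, 83, 99):
    print(flag, "<->", mate_flag(flag))
```

Since every read flagged 113 has exactly one mate, and that mate must be flagged `mate_flag(113)`, the two counts have to agree.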

"All Bad"
At the other end of the spectrum we have "all bad" i.e. neither the read nor its mate mapped:

  • 77 - 0001001101 - First in pair, both reads in pair unmapped. "All bad".
  • 141 - 0010001101 - Second in pair and "all bad".

Quote:
  • Exercise: Just like with 20, AnnoyingAlign puts flags of 93 or 125 on all unmapped pairs. What other flags can AnnoyingAlign use to maximize user annoyance?
  • Exercise: Why are 79 and 143 particularly good words for Jabberwocky?
Only one read maps

Next, we have the cases when only one read in a pair is mapped.
  • 69 - 0001000101 - First read in pair. This read is unmapped but its mate is mapped.
  • 137 - 0010001001 - second in pair. Read is mapped but mate is unmapped.
  • 73 - 0001001001 - First read in pair. This read is mapped but its mate is not.
  • 133 - 0010000101 - 2nd in pair. Read unmapped but mate is mapped.
Can you again see why the number of reads with a flag of 69 must be the same as the number of reads with a flag of 137 (and likewise for 73 and 133)?
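
To check counts like these on your own data, it is enough to tally the second column of the SAM file. A minimal sketch, assuming plain-text SAM input; `count_flags` is a made-up helper and the toy records below are invented and truncated to read name and flag.

```python
from collections import Counter


def count_flags(sam_lines):
    """Tally the FLAG column (second tab-separated field) of SAM records."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        counts[int(fields[1])] += 1
    return counts


# Made-up records, truncated to read name and flag for brevity.
toy_sam = [
    "@HD\tVN:1.0",
    "pairA\t69",
    "pairA\t137",
    "pairB\t73",
    "pairB\t133",
    "pairC\t69",
    "pairC\t137",
]

counts = count_flags(toy_sam)
for left, right in ((69, 137), (73, 133)):
    status = "match" if counts[left] == counts[right] else "MISMATCH"
    print(left, counts[left], right, counts[right], status)
```

On a real file you could pass an open file handle instead of the list, since the function only iterates over lines.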

There are of course many other combinations. The purpose here is not to enumerate them but to simply have some fun with the structure of these flags.

What is your favorite flag? Do you have other ways of remembering what these things mean as you look through SAM files?
__________________
Kamalakar Gulukota,
Director,
Center for Bioinformatics and Computational Biology
NorthShore University Health System, kgulukota@northshore.org
Old 02-01-2012, 01:30 PM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Personally, the ones and zeros aren't helpful to me. I don't think of 147 as "0010010011", but as "128+16+2+1", and I remember what all those numbers stand for. And in most contexts, having both reads map in the forward direction or both map in the reverse direction is not all good, it's weird.

The four good numbers to remember are 64+16+2+1=83, 64+32+2+1=99, 128+16+2+1=147 and 128+32+2+1=163. Something is very wrong if you ever see both 128 and 64 together, and with most current technologies, you should see 16 or 32, but not both. If you see both, or don't see either, your reads are paired strangely.
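
The "sum of powers of two" view is easy to reproduce. The sketch below splits a flag into its terms and applies the two checks described above to flags where both mates are mapped; `flag_terms` and `strangely_paired` are names invented for this illustration, and whether "strange" is actually bad depends on your library type.

```python
def flag_terms(flag):
    """Split a flag into the powers of two that add up to it, largest first."""
    return [1 << i for i in reversed(range(flag.bit_length())) if flag & (1 << i)]


def strangely_paired(flag):
    """Apply the two checks to a flag where both mates are mapped.

    Strange means: both 64 and 128 are set, or 16 and 32 are both set
    or both absent (the mates are on the same strand).
    """
    first_and_second = bool(flag & 64) and bool(flag & 128)
    same_strand = bool(flag & 16) == bool(flag & 32)
    return first_and_second or same_strand


for flag in (83, 99, 147, 163, 113, 65):
    terms = " + ".join(str(t) for t in flag_terms(flag))
    print(flag, "=", terms, "| strange:", strangely_paired(flag))
```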
Old 02-03-2012, 12:26 PM   #3
liu_xt005
Member
 
Location: Iowa City, IA

Join Date: Jun 2011
Posts: 24

Thank you so much! This is so useful!
Why didn't this show up as hot?
Old 02-03-2012, 07:20 PM   #4
kgulukota
Member
 
Location: Illinois

Join Date: Oct 2011
Posts: 30

Quote:
Originally Posted by liu_xt005 View Post
Thank you so much! This is so useful!
Why didn't this show up as hot?
Liu_xt005 -
I am glad you found it useful. I am not sure why this did not show up hot. But your reply did promote it there. So, thanks!

Gulu
Old 04-03-2012, 01:17 PM   #5
sterding
Member
 
Location: Boston

Join Date: Sep 2010
Posts: 36

Very useful! Thanks
Old 04-25-2012, 04:06 PM   #6
seq_lover
Member
 
Location: Memphis

Join Date: Oct 2011
Posts: 18

I assume kgulukota is giving examples for a mate-pair library (SOLiD) and swbarnes2 is giving examples for a paired-end library (Illumina). I think both of them are correct. Please correct me if I am wrong.
Old 04-25-2012, 05:38 PM   #7
kgulukota
Member
 
Location: Illinois

Join Date: Oct 2011
Posts: 30

That is true seq_lover. Which combinations you consider "all good" and which ones "weird" depends on how you constructed your library. Thank you for putting it so succinctly.
Old 03-21-2014, 09:28 AM   #8
dan
wiki wiki
 
Location: Cambridge, England

Join Date: Jul 2008
Posts: 266

Thanks for this post, do you accept doge tips?
__________________
Homepage: Dan Bolser
MetaBase the database of biological databases.
Old 04-08-2015, 07:51 AM   #9
Smurali
Junior Member
 
Location: Houston

Join Date: Mar 2013
Posts: 4

Years later, this is still pretty darn useful.
Thanks!
Old 08-17-2016, 11:50 AM   #10
caiosuz
Member
 
Location: Brazil_Bahia_Ilheus

Join Date: Dec 2015
Posts: 12
What's going on here?

I am not sure I understand the meaning of the read index (the read name in the first column) in SAM files.

This information is present in the Flags description:

'Next, we have the cases when only one read in a pair is mapped.
69 - 0001000101 - First read in pair. This read is unmapped but its mate is mapped.
133 - 0010000101 - 2nd in pair. Read unmapped but mate is mapped.'


So, does this mean that if I have a read with flag 133 or 69, its mate can't be present in the unmapped reads file?
I am assuming that reads with the same index (in this case "M03092:8:000000000-AG2GN:1:2117:2591:14346") are paired. Am I correct? If I am wrong, I understand what happened, but I would still like to know what these lines with the same index are.

Following this line of thought (same index, paired reads), why do so many of my unmapped paired reads look like this?

M03092:8:000000000-AG2GN:1:2117:2591:14346 69
M03092:8:000000000-AG2GN:1:2117:2591:14346 133

Can anyone explain what's going on with these reads?
Old 08-18-2016, 12:26 AM   #11
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Your aligner seems to have a bug; the flags should be 77 and 141 if both mates are unmapped.
Old 08-18-2016, 05:30 AM   #12
caiosuz
Member
 
Location: Brazil_Bahia_Ilheus

Join Date: Dec 2015
Posts: 12
Crazy TopHat unmapped reads

Did anyone have the same problem with these unmapped reads?
Old 08-19-2016, 04:15 AM   #13
offspring
Member
 
Location: Lund, Sweden

Join Date: Mar 2013
Posts: 32

It's a bug in TopHat (all versions); it doesn't set the 0x8 bit ("next segment in the template unmapped") when both reads are unmapped.

This is one of the issues TopHat-Recondition fixes (https://bmcbioinformatics.biomedcent...859-016-1058-x , https://github.com/cbrueffer/tophat-recondition).
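
In terms of the bit arithmetic alone, the repair amounts to setting 0x8 on both mates whenever both carry 0x4. The sketch below shows only that arithmetic; it is not how TopHat-Recondition is implemented, and `fix_unmapped_pair` is a hypothetical helper that assumes you have already matched the two mates by read name.

```python
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def fix_unmapped_pair(flag_a, flag_b):
    """If both mates are unmapped, make sure each also has the mate-unmapped bit."""
    if flag_a & READ_UNMAPPED and flag_b & READ_UNMAPPED:
        return flag_a | MATE_UNMAPPED, flag_b | MATE_UNMAPPED
    return flag_a, flag_b  # at least one mate is mapped: leave the flags alone


# The pair from the question above, then a pair that needs no change.
print(fix_unmapped_pair(69, 133))
print(fix_unmapped_pair(69, 137))
```

Applied to the 69/133 pair from the question, this gives the 77/141 pair expected for two unmapped mates.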
Old 08-23-2016, 12:27 AM   #14
Macspider
Member
 
Location: Vienna

Join Date: Feb 2016
Posts: 35

Yes, it is a bug in TopHat. Didn't they fix it in TopHat2? I recently used it and the flags were alright.
Old 08-23-2016, 12:32 AM   #15
offspring
Member
 
Location: Lund, Sweden

Join Date: Mar 2013
Posts: 32

It's still unfixed (as of TopHat 2.1.1) and unlikely to be fixed at all, since TopHat is not really being developed anymore (the developers focus on HISAT2, its successor).

Did you use TopHat via bcbio-nextgen by any chance? That fixes the unmapped reads file for you automatically; other frameworks may do the same.