SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
SAM flag idioms kgulukota Bioinformatics 14 08-23-2016 01:32 AM
bitwise flag in SAM file win804 Bioinformatics 25 02-04-2013 09:56 PM
sam flag is confusing poorphd Bioinformatics 6 01-11-2012 11:46 AM
BWA output bitwise flag for mapped/unmapped reads wenhuang Bioinformatics 1 08-29-2011 04:54 PM
Flag=4 in SAM Rachelly Bioinformatics 2 12-22-2010 03:54 AM

Reply
 
Thread Tools
Old 08-12-2009, 02:07 AM   #1
aurelielaugraud
Member
 
Location: lyon

Join Date: Aug 2009
Posts: 37
Question bitwise flag in sam format and others

Hello,
new to the seqanswers community, I have started with NGS data (illumina) about 2 month ago.
We align with GEM (not published yet but quite good as far as we tried it), we tried Maq but is doesn't seem to work for the moment on our machines (says it is running but not writting anything ....), and I am currently running bwa at the moment as another test.
We want to use SAM format but we don't get how to use th flag field, is anyone having an exemple for us please ? I know bwa outputs can be converted directly but we would need to do it ourselves for the GEM output.

moreover, there is a question on the MAPQ field. we don't have any direct information about that I think. Can we infer it ? how is it calculated ? we have phred-based quality scores for each base.

thanks to the community.
aurelielaugraud is offline   Reply With Quote
Old 08-12-2009, 09:54 AM   #2
totalnew
Member
 
Location: Canada

Join Date: Apr 2009
Posts: 46
Default

FLAG field is normally associated with other attributes to parse the read, likely, if this read is mapped, if this red is paired..... see the description table in samtools manual.

For MAPQ calculation, it is complicated. The mapping quality is the phred-scaled probability that a read alignment may be wrong. In practice, MAPQ is approximated in some ways. bwa approximates the MAPQ as 0, 23, 25 37, 255,...according to the factors of number of best hits, number of second best hits.......
totalnew is offline   Reply With Quote
Old 08-12-2009, 11:15 PM   #3
aurelielaugraud
Member
 
Location: lyon

Join Date: Aug 2009
Posts: 37
Default

Hello and thanks for taking some time to reply .
I have already had a look at samtools manual regarding the flag section but I am sorry it doesn't seem to be enough for me and I would really appreciate an example. or some documentation on how to use bitwise data ...
Maybe I can leave the MAPQ field with 255 value as indicated in the manual is the program I use to map the reads doesn't caclulate this value for me. ( the more it goes, the least I wish to calculate it myself)
aurelielaugraud is offline   Reply With Quote
Old 08-13-2009, 01:35 AM   #4
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870
Default

Quote:
Originally Posted by aurelielaugraud View Post
We want to use SAM format but we don't get how to use the flag field, is anyone having an exemple for us please?
The SAM flag field although it appears as a single number actually contains several pieces of information which have been combined together. It is a bitwise field, which means that it makes use of the way that computers represent numbers to store several small values stored in one large value.

If you think of a standard integer as being composed of 32 bits (0 or 1) then it would look like:

00000000000000000000000000000000

However SAM uses this single number as a series of boolean (true false) flags where each position in the array of bits represents a different sequence attribute

Bit 0 = The read was part of a pair during sequencing
Bit 1 = The read is mapped in a pair
Bit 2 = The query sequence is unmapped
Bit 3 = The mate is unmapped
Bit 4 = Strand of query (0=forward 1=reverse)

etc.

Constructing the value from the individual flags is fairly easy. If the flag is false don't add anything to the total. If its true then add 2 raised to the power of the bit position.

For example:

Bit 0 - false - add nothing
Bit 1 - true - add 2**1 = 2
Bit 2 - false - add nothing
Bit 3 - true - add 2**3 = 8
Bit 4 - true - add 2**4 = 16

Bit pattern = 11010 = 16+8+2 = 26

So the flag value would be 26.

To extract the individual flags from the compound value you can use a logical AND operation. This will tell you if a specific bit in the compound value is true or not. The exact syntax will depend on the language you're using, but in Perl for instance you could do:

if ($compound & 16) {
print "Reverse";
}
else {
print "Forward";
}

To extract the information from the 4th (therefore 2**4 = 16) bit field.

I hope that makes it a bit clearer.
simonandrews is offline   Reply With Quote
Old 08-13-2009, 05:07 AM   #5
aurelielaugraud
Member
 
Location: lyon

Join Date: Aug 2009
Posts: 37
Default

thank you very much, it does help a lot !!
aurelielaugraud is offline   Reply With Quote
Old 08-13-2009, 07:12 AM   #6
aurelielaugraud
Member
 
Location: lyon

Join Date: Aug 2009
Posts: 37
Default

Now that I think I understand better this flag, I was trying to read the sam file I generated from a maq alignment ...
how can I get a flag of 80 ?
from what I understood :
80 = 64+16
so
1010000
no paired end ... reverse strand ..ok ...
but the most-left 1 suggest that the read is the first in a pair ... so not coherent with no paired-ends ...
maybe I missed something ...
anyone knows ?
thanks !
aurelielaugraud is offline   Reply With Quote
Old 08-13-2009, 07:40 AM   #7
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870
Default

It would probably help if you could post a few example lines from your SAM output so we could see the flag in context.
simonandrews is offline   Reply With Quote
Old 08-13-2009, 07:51 AM   #8
aurelielaugraud
Member
 
Location: lyon

Join Date: Aug 2009
Posts: 37
Default

EAS89_20:1:22:1316:1797 80 chr1 10003 0 50M * 0 0 GCCCTAACCCTAACCCCAACCCTAACCCTAACCCTAACCCTAACCCTAAC );:@684@A?7@/
=@@);195)/(1><A);2=A?2?4AAB<@7>B?2A;B MF:i:0 NM:i:2 UQ:i:16 H0:i:85 H1:i:58
EAS89_20:1:35:882:1661/1 0 chr1 10007 0 50M * 0 0 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA BBBBB
BBCBBBBBCBBABBBBAA?BBBA?BBBBA?@BAB@:A@@?>:@@A MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:46
EAS89_20:1:97:1232:1179/1 0 chr1 10009 0 50M * 0 0 ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC BAABB
BBA@BBBA6?A?BA?>:>AA5?>5@?;8??@>=6<+35:;2++/2 MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:54
EAS89_20:1:35:821:41/1 0 chr1 10011 0 50M * 0 0 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC BBBCBBB?CCBA@
BBCBABBBBB??BBBA??ABBB?;ABA@8@9AB=7-> MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:46
EAS89_20:1:16:1399:1760 80 chr1 10011 0 50M * 0 0 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC ?83;6=<=2866<
;688??===8;<@8@8@@@8@;BBA=@7?@B9@8ACB MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:62
EAS89_20:1:32:145:946 80 chr1 10011 0 50M * 0 0 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC A@;A7BBB@B=BB
><B<??B5B=ABB;A9BBB?B8BBB?B@ABCAB@BBB MF:i:0 NM:i:0 UQ:i:0 H0:i:85 H1:i:59
aurelielaugraud is offline   Reply With Quote
Old 08-13-2009, 07:56 AM   #9
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870
Default

Looking at the SAM examples then 80 would seem to indicate a reverse mapped single end read. I'd check that you definitely ran the search as a paired end search since the output doesn't suggest this. The format says that flags relating to mate pairs only have any meaning when 0x01 is true (ie the read is paired), so the flag for first read in a pair isn't necessarily wrong.

The example in the SAM tools format definintion shows two reads which form a pair with flags 99 and 147.

99 = 01100011
147 = 10010011

99 = Paired, Proper Pair, Mapped, Mate Mapped, Forward, Mate Reverse, First in pair, Not second in pair

147 = Paired, Proper Pair, Mapped, Mate Mapped, Reverse, Mate Forward, Not first in pair, Second in pair

Which all makes perfect sense.
simonandrews is offline   Reply With Quote
Old 08-13-2009, 08:01 AM   #10
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870
Default

Quote:
Originally Posted by aurelielaugraud View Post
EAS89_20:1:22:1316:1797 80 chr1 10003 0 50M * 0 0
Your mate pair name is * and the mate position and insert size are both 0 indicating you have not done a paired end mapping (which agrees with your flag reading).
simonandrews is offline   Reply With Quote
Old 08-13-2009, 09:54 AM   #11
totalnew
Member
 
Location: Canada

Join Date: Apr 2009
Posts: 46
Default

The flag number is decimal value, I wrote a small tool to convert decimal to Hex and binary string, but it was rarely used. Firstly, you can easily identify if this read is paired (even: SE ; odd: PE), set this at the beginning:
if($FLAG & 0x01)
$pe_flag = 1
else
$se_flag = 1

Then, suppose this is PE library, someone might want to know if one read is the first read in a pair or second read in a pair (indicated in export file of illumina data). You can do this

if ($FLAG & 0x40)
$1st_flag = 1

This is just a simple example, you can extract different info by operating on different bits of FLAG.
totalnew is offline   Reply With Quote
Old 08-13-2009, 12:12 PM   #12
aurelielaugraud
Member
 
Location: lyon

Join Date: Aug 2009
Posts: 37
Default

ok thanks for all the replies
I haven't run the mapping as paired-end . I ran only one file (and it took long enough). I thought I could just try it like that ... and I am used of bwa or gem where I can map the two paired-end files separately which doesn't seem to be the case with maq.(or did I miss something ?)
Anyway so if I understand correctly. as it hasn't been mapped as paires the first two bits are 0 which are ok with me but it somehow knows that it is a paired-end sample ( probably the /1 at the end of the id ??) and it tells me that it is the first read of the pair ... even if it hasn't any pair here.
I just thought I would automatically put this bit to 0 as there is no paired-end on the file I gave it.
99 and 147 is what I get with my bwa try but I made the effort of mapping both paired-end files .

Another thing, is there anyway to know if the output file of maq or bwa is compete (what is for some reason it stops beforehand ) ? i used to count lines for gem output but as bwa and maq have binary compressed format, I don't know if I can check that .

thank you
aurelielaugraud is offline   Reply With Quote
Old 08-13-2009, 12:37 PM   #13
totalnew
Member
 
Location: Canada

Join Date: Apr 2009
Posts: 46
Default

Quote:
Originally Posted by aurelielaugraud View Post
Another thing, is there anyway to know if the output file of maq or bwa is compete (what is for some reason it stops beforehand ) ? i used to count lines for gem output but as bwa and maq have binary compressed format, I don't know if I can check that .

Output of maq is map file, use maq mapview to make it visible and count the lines which can be done by unix command. The output of bwa is sam file which is human readable, you should be able to open it, but in case you generate the bam file, use samtools view to read it, and similarly to get the line number.
totalnew is offline   Reply With Quote
Old 08-13-2009, 11:40 PM   #14
aurelielaugraud
Member
 
Location: lyon

Join Date: Aug 2009
Posts: 37
Default

ok I knew that but it takes quite a long time and disk space to make the file human readable ... doesn't matter, i will try and see ...
thanks
aurelielaugraud is offline   Reply With Quote
Old 08-14-2009, 09:38 AM   #15
totalnew
Member
 
Location: Canada

Join Date: Apr 2009
Posts: 46
Default

Quote:
Originally Posted by aurelielaugraud View Post
ok I knew that but it takes quite a long time and disk space to make the file human readable ... doesn't matter, i will try and see ...
thanks
You are right, sam file is quite large size, from my simulation, the sam file is generally four times larger than corresponding bam file. My idea is to write a script to generate sam file and further the bam file, and in that script delete the sam file right after the bam file generated.

To view that bam file, use samtools view test.bam, then you will be able to see the content. On the other hand, you will also be able to output part of bam to sam file by unix command for your parsing.
totalnew is offline   Reply With Quote
Old 01-06-2010, 02:03 PM   #16
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

what would a flag 0 imply in sam output? I used novo align and the only flags I see are 0 4 and 16. 4 is unmapped read. 16 is for the strand, but how to interpret 0.

Also, How can I ascertain the reads that are *not* uniquely mapped. I read that the 5th column MAPQ should be of help to determine multiply-mapped reads. Is MAPQ=0 an indication that the read is multiply-mapped?

Thanks
__________________
--
bioinfosm
bioinfosm is offline   Reply With Quote
Old 01-06-2010, 04:36 PM   #17
totalnew
Member
 
Location: Canada

Join Date: Apr 2009
Posts: 46
Default

Quote:
Originally Posted by bioinfosm View Post
what would a flag 0 imply in sam output? I used novo align and the only flags I see are 0 4 and 16. 4 is unmapped read. 16 is for the strand, but how to interpret 0.

Also, How can I ascertain the reads that are *not* uniquely mapped. I read that the 5th column MAPQ should be of help to determine multiply-mapped reads. Is MAPQ=0 an indication that the read is multiply-mapped?

Thanks
Flag 0 means "the read is not paired and mapped, forward strand"

You are right, MAPQ =0 is an indication that this read has multiple best hits, which means it is not unique match.
totalnew is offline   Reply With Quote
Old 01-07-2010, 03:24 PM   #18
yxi
Junior Member
 
Location: texas

Join Date: Apr 2009
Posts: 6
Default

Actually, I always have the question why SAM format using a binary FLAG field instead of text/char field to indicate the mapping status. IMHO a human readable FLAG field would make much more sense, and maybe the strand information could be listed as a separated field. It's also a headache to look at those aux fields. Excuse me but I have the feeling that SAM format is somewhat over complicated, is there any possibility the SAM format could be modified so that people could read it more easily?

Thanks.
yxi is offline   Reply With Quote
Old 01-08-2010, 07:27 AM   #19
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

@yxi

Please use "samtools view -X" to see a human readable FLAG. I agree that not specifying a better FLAG field initially is a shortcoming, but it is too late to change the spec at the moment. samtools view -X comes as a temporary hack which I find useful.

Could you suggest a better format for the aux fields or to make SAM simpler? Note that SAM should be both human readable and machine readable. The current form is the best we can come to so far. Genbank/EMBL files are human readable, but they cause a lot of troubles in parsing, and we do not want to go in that way again. I think the best solution to human readability is not to change the spec, but to write a script to print a SAM alignment in multiple lines in a beautiful way. If you want to contribute to such a script, that would be great. Thanks.
lh3 is offline   Reply With Quote
Old 01-08-2010, 10:52 AM   #20
totalnew
Member
 
Location: Canada

Join Date: Apr 2009
Posts: 46
Default

Currently in most of sam/bam files, the header is empty. However, this could potentially be a good place to store all the info people are interested. From the SAM format specification, HEADER SECTION has four default record types - @HD, @SQ, @RG, @RG.
Can we define our own record type, such as @MN.... or it won't be recognized by samtools.

In addition, each record type has predefined data field, such as VN, SO, GO in @HD. Can we add more data field?

thanks
totalnew is offline   Reply With Quote
Reply

Tags
format, quality, sam

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 04:38 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO