SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
SAM flag idioms kgulukota Bioinformatics 14 08-23-2016 01:32 AM
bitwise flag in SAM file win804 Bioinformatics 25 02-04-2013 09:56 PM
sam flag is confusing poorphd Bioinformatics 6 01-11-2012 11:46 AM
BWA output bitwise flag for mapped/unmapped reads wenhuang Bioinformatics 1 08-29-2011 04:54 PM
Flag=4 in SAM Rachelly Bioinformatics 2 12-22-2010 03:54 AM

Reply
 
Thread Tools
Old 02-05-2010, 01:53 PM   #21
Nix
Member
 
Location: SLC, Utah

Join Date: Jun 2008
Posts: 60
Default

Wow, the more I dig into sam the more it looks like a horse put together by a committee. These bit flags are ridiculous for a human readable text alignment format.
Nix is offline   Reply With Quote
Old 02-05-2010, 04:42 PM   #22
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by Nix View Post
Wow, the more I dig into sam the more it looks like a horse put together by a committee. These bit flags are ridiculous for a human readable text alignment format.
It was put together by a committee. The flag field, in my opinion, is left over from the internal C representation, which was created for compactness (then compressed into SAM). You will encounter the internal format if you program using the SAMTools C API but nonetheless the internal (compact) data structure should not have been shown to the user when viewing SAM (not BAM files). Just wait until you try to get anything clarified/changed/included in the specification.

Nils
nilshomer is offline   Reply With Quote
Old 02-05-2010, 07:31 PM   #23
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

If SAM were designed by one person, it would not be widely accepted. There are pros and cons. As to the bit flag, have you tried "samtools view -X"?
lh3 is offline   Reply With Quote
Old 02-05-2010, 09:38 PM   #24
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

I particularly like the TCGA BAM specification for the BAM headers. It uses most of the header types and their associated tags to capture things like: sequencing platform, reference contig details(MD5s, url..), samples, institution, library, etc...
__________________
-drd
drio is offline   Reply With Quote
Old 02-05-2010, 10:02 PM   #25
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by lh3 View Post
If SAM were designed by one person, it would not be widely accepted. There are pros and cons. As to the bit flag, have you tried "samtools view -X"?
It is widely accepted because good tools (i.e. lh3's implementation in SAMTools and now Picard) were available early and was used by the first major sequencing efforts (aka 1000 Genomes). I no way mean to lambaste the creators of SAM, only to be honest about its development and implementation.
nilshomer is offline   Reply With Quote
Old 02-06-2010, 10:01 AM   #26
Nix
Member
 
Location: SLC, Utah

Join Date: Jun 2008
Posts: 60
Default

OK, 4hrs later and I can parse the strand.....

Here's some java code so folks don't have to go through this again:

<pre>
/**
* Index Index^2 MethodName Flag DescriptionFromSamSpec
* 0 1 isReadPartOfAPairedAlignment 0x0001 the read is paired in sequencing, no matter whether it is mapped in a pair
* 1 2 isReadAProperPairedAlignment 0x0002 the read is mapped in a proper pair (depends on the protocol, normally inferred during alignment) 1
* 2 4 isQueryUnmapped 0x0004 the query sequence itself is unmapped
* 3 8 isMateUnMapped 0x0008 the mate is unmapped
* 4 16 isQueryReverseStrand 0x0010 strand of the query (false for forward; true for reverse strand)
* 5 32 isMateReverseStrand 0x0020 strand of the mate
* 6 64 isReadFirstPair 0x0040 the read is the first read in a pair
* 7 128 isReadSecondPair 0x0080 the read is the second read in a pair
* 8 256 isAlignmentNotPrimary 0x0100 the alignment is not primary (a read having split hits may have multiple primary alignment records)
* 9 512 doesReadFailVendorQC 0x0200 the read fails platform/vendor quality checks
* 10 1024 isReadADuplicate 0x0400 the read is either a PCR duplicate or an optical duplicate
*/
public boolean testBitwiseFlag(int testValue){
return ((flags & testValue) == testValue);
}
/**The read is paired in sequencing, no matter whether it is mapped in a pair*/
public boolean isPartOfAPairedAlignment(){
return testBitwiseFlag(1);
}
/**The read is mapped in a proper pair (depends on the protocol, normally inferred during alignment)*/
public boolean isAProperPairedAlignment(){
return testBitwiseFlag(2);
}
/**The query sequence itself is unmapped*/
public boolean isUnmapped(){
return testBitwiseFlag(4);
}
/**The mate is unmapped*/
public boolean isMateUnMapped(){
return testBitwiseFlag(8);
}
/**Strand of the query (false for forward; true for reverse strand)*/
public boolean isReverseStrand(){
return testBitwiseFlag(16);
}
/**Strand of the mate*/
public boolean isMateReverseStrand(){
return testBitwiseFlag(32);
}
/**The read is the first read in a pair*/
public boolean isFirstPair(){
return testBitwiseFlag(64);
}
/**The read is the second read in a pair*/
public boolean isSecondPair(){
return testBitwiseFlag(128);
}
/**The alignment is not primary (a read having split hits may have multiple primary alignment records)*/
public boolean isNotAPrimaryAlignment(){
return testBitwiseFlag(265);
}
/**The read fails platform/vendor quality checks*/
public boolean failedQC(){
return testBitwiseFlag(512);
}
/**The read is either a PCR duplicate or an optical duplicate*/
public boolean isADuplicate(){
return testBitwiseFlag(1024);
}
</pre>
Nix is offline   Reply With Quote
Old 02-06-2010, 06:04 PM   #27
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by Nix View Post
OK, 4hrs later and I can parse the strand.....

Here's some java code so folks don't have to go through this again:

<pre>
/**
* Index Index^2 MethodName Flag DescriptionFromSamSpec
* 0 1 isReadPartOfAPairedAlignment 0x0001 the read is paired in sequencing, no matter whether it is mapped in a pair
* 1 2 isReadAProperPairedAlignment 0x0002 the read is mapped in a proper pair (depends on the protocol, normally inferred during alignment) 1
* 2 4 isQueryUnmapped 0x0004 the query sequence itself is unmapped
* 3 8 isMateUnMapped 0x0008 the mate is unmapped
* 4 16 isQueryReverseStrand 0x0010 strand of the query (false for forward; true for reverse strand)
* 5 32 isMateReverseStrand 0x0020 strand of the mate
* 6 64 isReadFirstPair 0x0040 the read is the first read in a pair
* 7 128 isReadSecondPair 0x0080 the read is the second read in a pair
* 8 256 isAlignmentNotPrimary 0x0100 the alignment is not primary (a read having split hits may have multiple primary alignment records)
* 9 512 doesReadFailVendorQC 0x0200 the read fails platform/vendor quality checks
* 10 1024 isReadADuplicate 0x0400 the read is either a PCR duplicate or an optical duplicate
*/
public boolean testBitwiseFlag(int testValue){
return ((flags & testValue) == testValue);
}
/**The read is paired in sequencing, no matter whether it is mapped in a pair*/
public boolean isPartOfAPairedAlignment(){
return testBitwiseFlag(1);
}
/**The read is mapped in a proper pair (depends on the protocol, normally inferred during alignment)*/
public boolean isAProperPairedAlignment(){
return testBitwiseFlag(2);
}
/**The query sequence itself is unmapped*/
public boolean isUnmapped(){
return testBitwiseFlag(4);
}
/**The mate is unmapped*/
public boolean isMateUnMapped(){
return testBitwiseFlag(8);
}
/**Strand of the query (false for forward; true for reverse strand)*/
public boolean isReverseStrand(){
return testBitwiseFlag(16);
}
/**Strand of the mate*/
public boolean isMateReverseStrand(){
return testBitwiseFlag(32);
}
/**The read is the first read in a pair*/
public boolean isFirstPair(){
return testBitwiseFlag(64);
}
/**The read is the second read in a pair*/
public boolean isSecondPair(){
return testBitwiseFlag(128);
}
/**The alignment is not primary (a read having split hits may have multiple primary alignment records)*/
public boolean isNotAPrimaryAlignment(){
return testBitwiseFlag(265);
}
/**The read fails platform/vendor quality checks*/
public boolean failedQC(){
return testBitwiseFlag(512);
}
/**The read is either a PCR duplicate or an optical duplicate*/
public boolean isADuplicate(){
return testBitwiseFlag(1024);
}
</pre>
Have you tried picard for a Java API? Here's the link to the javadoc.
nilshomer is offline   Reply With Quote
Old 02-07-2010, 06:24 AM   #28
Nix
Member
 
Location: SLC, Utah

Join Date: Jun 2008
Posts: 60
Default

Yes I did try Picard's SAMFileReader but it has a major memory management problem when trying to parse RNA-Seq data that contains alignments to chromosomes and splice junctions. These alignment files contain headers with 2.3 million splice junction 'chromosomes' that cause the SAMFileReader to quickly throw an out of memory error. The SAM format requires one '@SQ' line for each splice junction. Ouch.

Thus, I had to create a light weight file parser that would skip such header info. I also wanted to get a feel for whether sam could be used as a universal short read alignment format. I'm part of two projects (GenoViz and a $100M NHLBI cardiac project) that are looking to see if SAM could serve that function.

I should say that now I understand bitwise flags, they are a pretty clever trick for compressing a bunch of boolean flags in a binary file. For SAM spec 2 though, they should be removed from the text format.
Nix is offline   Reply With Quote
Old 02-17-2010, 01:16 AM   #29
aurelielaugraud
Member
 
Location: lyon

Join Date: Aug 2009
Posts: 37
Default

Hello, still on the flag meaning ...
I am trying to understand those (by bwa):

HWI-EAS255_8282_FC30G79_PE:4:1:1:1979 69 * 0 0 * * 0 0 TAGATGGCAAAATGAACTTCCATAAANNNNNCTGTATGCTCTGGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN BBBBB*2??9<BA@@BBB;BBB8@=;'''''<BCCB6+?-B??;3''''''''''''''''''''''''''''''
HWI-EAS255_8282_FC30G79_PE:4:1:1:1979 133 * 0 0 * * 0 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
^C

HWI-EAS255_8282_FC30G79_PE:4:1:1:1423 69 * 0 0 * * 0 0 TACACCAAATATATATTATATGCTGTACATAATATATTAAAGTCGACCAAATATATATTATATACTGTACATAAA BBBBBBBBBBBBBBBBBBBBBBCBBBB?BBBBBBBBBCBBBBBBBBBB9;ABBBBBBBB=B=>>>BBBB?;<BB=
HWI-EAS255_8282_FC30G79_PE:4:1:1:1423 133 * 0 0 * * 0 0 TATACTTTGGGTATTTTGATATTTTATTTACAGTATATAATACATACTTTGGGTACTTCGATATTTTATGTGCAG BBBBCBCBBBBBBBCCBCBBBBCCBBBBBBBBBBBBCCBBCBBBBBBBBBBBBBBBB<*<BBBBBBBBB8BBBBB

HWI-EAS255_8282_FC30G79_PE:4:1:1:1908 69 * 0 0 * * 0 0 ATATTTTCGTTGGAAAAAAGCTCTTGTACTACAGAACAGATCCAGTTTTACGGATACGTGGCTTGAGAGGTAGAT B909B-6<'.?A;A>;6><24BBCBBB@.?*4+?+<B=22=2?42+*9?-''326B'''')3)6,'66'.3.72?
HWI-EAS255_8282_FC30G79_PE:4:1:1:1908 133 * 0 0 * * 0 0 AAGGGATCCATCCGACGTAGCGTCCGAAACAGTTAAGATGAGAGCACTTAGAAAACTCATCCAGCGAAGGATACA B92<?+<446'.9B;BB<4?7*4,1'?B6'39'.',<6),94;6,'=9'96=BBB.')3-<@+3),>49',B2*9

as I understand the flag meaning for 69, the query is unmapped, but the mate is mapped ... ????????

Moreover, I get some lines with MAPQ to 0 and flag to 99/147 with coherent insert size ... what could it be ??

Many Thanks

Aurélie
aurelielaugraud is offline   Reply With Quote
Old 02-17-2010, 02:25 AM   #30
aurelielaugraud
Member
 
Location: lyon

Join Date: Aug 2009
Posts: 37
Default

Hello again,
I have another question about this :
HWI-EAS255_8282_FC30G79_PE:4:1:402:627 97 chr4 49179087 0 75M chr21 9663639 0 GAACTGGATTCAAGCCAGTTAACTACTGGATCAAATCTGATCCTGGGCCCTGTCCCGTTTCTGTCATAACTTCTA BBB?BB>6@B<;<BBBBBB?BCBB6?BB=BB;@<B?BBBB0B6BB=BBBCBB8BBBBB;CBBB;BBB=8;BB?C? XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:4 X1:i:7 XM:i:2 XO:i:0 XG:i:0 MD:Z:46A3G24
HWI-EAS255_8282_FC30G79_PE:4:1:402:627 145 chr21 9663639 37 75M chr4 49179087 0 TGGAACAGAAAATTGCTCAAGGAAACTCAGAGCTCAAAACACAAATCCATGGAGCTATGAACTCCGAGAGAGAAT @BBB?BB8BBBBBBBBB;;;CCBB?@@AB;B<BBBBBBBBBB8B@9ABBBBBBB=??BB;6'3BBBB;0=BBBBB XT:A:U NM:i:2 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:20A40A13


how can one pair have a MAPQ at 37 and the other at 0 ..

and I have different flag value for other pairs mapped on different chr.
HWI-EAS255_8282_FC30G79_PE:4:1:24:1302 65 chr12 82786110 0 75M chr4 22777722 0 CACCTATGGGTGAGAATATGTGGTGTTTGGTTTTTTGTTCTTGTGATAGTTTACTGAGAATGATGATTTCCAATT BBBBBBBBBBBBB=BBA>ABBBBBBBBBBBBBCBBBBBBBCBBBBBBBBBBBBBB?+?=BBBBBBBBBBBBBBBB XT:A:R NM:i:1 SM:i:0 AM:i:0 X0:i:71 XM:i:1 XO:i:0 XG:i:0 MD:Z:8A66
HWI-EAS255_8282_FC30G79_PE:4:1:24:1302 129 chr4 22777722 0 75M chr12 82786110 0 TCCAAAAATGATAGACTGGATTAAGAAACTGTGGCACATATACACCATGGAATACTATGCAGCCATAAAAAATGA BCBBBBBBBBBBBBBBBBBBCCCBBBBBBBBBBBCCBBCCBCBBBBBBB?BBBBBCCBBBBBBBB;?2?BBBBBB XT:A:R NM:i:1 SM:i:0 AM:i:0 X0:i:55 XM:i:1 XO:i:0 XG:i:0 MD:Z:28A46

thanks
Aurélie
aurelielaugraud is offline   Reply With Quote
Old 02-17-2010, 07:03 AM   #31
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

Quote:
as I understand the flag meaning for 69, the query is unmapped, but the mate is mapped
This seems to be an unfixed bug.

Quote:
Moreover, I get some lines with MAPQ to 0 and flag to 99/147 with coherent insert size
I am not sure why you think this is problematic.

Quote:
how can one pair have a MAPQ at 37 and the other at 0
I have answered this question several times in the mailing list. I now put it in FAQ on the bwa homepage.
lh3 is offline   Reply With Quote
Old 02-22-2010, 04:27 AM   #32
aurelielaugraud
Member
 
Location: lyon

Join Date: Aug 2009
Posts: 37
Default

thanks a lot for the replies for the replies
aurelielaugraud is offline   Reply With Quote
Old 02-22-2010, 05:25 AM   #33
aurelielaugraud
Member
 
Location: lyon

Join Date: Aug 2009
Posts: 37
Default

would you please explain what xt:a:n and xt:a:M means ?
(I haven't found any xt:a:n yet but some xt:a:m ...
I understand the xt:a:r and xt:a:u
aurelielaugraud is offline   Reply With Quote
Old 04-16-2010, 09:40 AM   #34
kjngo
Junior Member
 
Location: Davis, California

Join Date: Dec 2009
Posts: 6
Default

Quote:
Originally Posted by bioinfosm View Post
what would a flag 0 imply in sam output? I used novo align and the only flags I see are 0 4 and 16. 4 is unmapped read. 16 is for the strand, but how to interpret 0.

Also, How can I ascertain the reads that are *not* uniquely mapped. I read that the 5th column MAPQ should be of help to determine multiply-mapped reads. Is MAPQ=0 an indication that the read is multiply-mapped?

Thanks
I have a similar question. All I see are 0, 4, 16, and 20s. I do not understand how to interpret 20. I know the hexadecimal would be 14, which should mean this is a combination of both strand and not mapped. Please correct me if i'm wrong. I see a MAPQ score and also reference hit, so does it still mean it mapped? Thanks
kjngo is offline   Reply With Quote
Old 04-18-2010, 06:36 AM   #35
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

Unmapped reads can have strand. This only means the unmapped sequence is given on its reverse strand.
lh3 is offline   Reply With Quote
Old 04-19-2010, 11:20 AM   #36
kjngo
Junior Member
 
Location: Davis, California

Join Date: Dec 2009
Posts: 6
Default

Thank you Heng
kjngo is offline   Reply With Quote
Old 05-07-2010, 04:04 AM   #37
shriram
Member
 
Location: UK

Join Date: May 2010
Posts: 13
Default

Hello new to the group
shriram is offline   Reply With Quote
Old 05-07-2010, 04:41 AM   #38
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541
Default

Quote:
Originally Posted by Nix View Post
I should say that now I understand bitwise flags, they are a pretty clever trick for compressing a bunch of boolean flags in a binary file. For SAM spec 2 though, they should be removed from the text format.
I'd go further and say the flag (and other things) will need redoing to cope with more than just paired reads - it will need to cope with N-tuples of reads each separated by an insert of some estimated size (e.g. Strobe Reads from Pacific Biosciences, or what Helicos calls dark fill).
maubp is offline   Reply With Quote
Old 07-16-2010, 12:51 AM   #39
argyjbao
Junior Member
 
Location: hk

Join Date: Sep 2009
Posts: 3
Default sam FLAG

hi guys,
I found the SAM FLAG encoding method is very clever for storing the alignment information. But I also found that the the negative sign for the insert field in the following pair-end example:
The manual said the negative sign of insert fileld means the mapping position is smaller than the current one. But the fact is the reverse.
And also, in the following pair-end, the mapping position fileds are equal for the pair-end reads (2005683). But they are not equal just having overlap.

Any buddy can help me? Thanks in advance?

GRC076_1_35_8988_3804/GRC076_1_35_8988_3804 pPr1 NT_004350 2005683 255 76M = 2005683 101 TCCGGGTGGGGGCAGGGGCCCTGGAGGGGTCACTCGGCTGCCGTCTGTCACTTGGGTCCAGAGGAGCTTCTGGTGG CCCCCCCCCBBBBCCCCCCCCCCCCCCCB>CCCCCCCCCCCCDDBDCBDACDC>@B>BBBBBB@=BB>ABBB@?BC
GRC076_1_35_8988_3804/GRC076_1_35_8988_3804 pP2 NT_004350 2005683 255 76M = 2005683 -101 GTGGCCTCGGGAGCAAGGGTCAGACCCACCAGAAGCTCCTCTGGACCCAAGTGACAGACGGCAGCCGAGTGACCCC BCD@BB7@?<A<=AABBBCBBCCCCCCCCCCCC@CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

Quote:
Originally Posted by lh3 View Post
@yxi

Please use "samtools view -X" to see a human readable FLAG. I agree that not specifying a better FLAG field initially is a shortcoming, but it is too late to change the spec at the moment. samtools view -X comes as a temporary hack which I find useful.

Could you suggest a better format for the aux fields or to make SAM simpler? Note that SAM should be both human readable and machine readable. The current form is the best we can come to so far. Genbank/EMBL files are human readable, but they cause a lot of troubles in parsing, and we do not want to go in that way again. I think the best solution to human readability is not to change the spec, but to write a script to print a SAM alignment in multiple lines in a beautiful way. If you want to contribute to such a script, that would be great. Thanks.
argyjbao is offline   Reply With Quote
Old 10-07-2010, 05:13 AM   #40
northbio
Junior Member
 
Location: china

Join Date: Sep 2010
Posts: 4
Default

hello everyone, the information above help me further understand the flag in the SAM format. But I still have problems in fully understanding the flag, like:
0x0002 the read is mapped in a proper pair
0x0004 the query sequence itself is unmapped
0x0008 the mate is unmapped
I don't know what is the meaning of "a proper pair" and the difference between "pair" and "mate", could anyone help me explaining them ?
And another question is I used tophat to deal with my PairEnd Illumina Seq, but the SAM file produced by tophat is like below:
FC30W3GAAXX:7:53:723:1789#0 73 1 487961 3 62M1849N13M * 0 0 ATCAGCTTCATTCCCTCAACAGTGTTCTTC
TTCAACGGGCAGCACATGAAGGTCGACTATGGATCTCCAGATCAC 84AB:B@:=A=-9BB?BB>@7>A@ABBBBB=;@B:BABBB?B>@AC@@AAB=CCA6@>?ABB>9@@ACCCA@C@B NM:i:7
XS:A:+ NH:i:2
FC30W3GAAXX:7:89:981:2025#0 137 1 487982 3 41M1849N34M * 0 0 GTGTTCTTCTTCAACGGGCAGCACATGAAG
GTCGACTATGGATCTCCAGATCACACCAAGTTTGTGGGAAGCTTC 8:;?6886<=:><6>8<=?>A:7=;8A?@@:BAA=A@@A@A@AAB@?@@@@B;B@AABA@BBABCCCBB@CBA;< NM:i:4
XS:A:+ NH:i:2
I want to know whether colomn 7-9(* 0 0) indicate my data were not considered as PE?
northbio is offline   Reply With Quote
Reply

Tags
format, quality, sam

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:21 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO