SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Understanding Expression Analysis jawalsh2 Bioinformatics 0 01-27-2012 09:02 AM
my understanding for cuffdiff output Huijuan Bioinformatics 1 05-01-2011 04:42 AM
help in understanding vinumanikandan General 3 02-09-2011 11:56 PM
BWA sampe mapping result, what is "PROPER PAIR"? hl450 Bioinformatics 5 08-06-2010 01:26 PM
Help for bwa result baby1885 Illumina/Solexa 5 06-05-2010 05:22 PM

Reply
 
Thread Tools
Old 11-09-2009, 04:55 AM   #1
Hena
Member
 
Location: Finland

Join Date: Nov 2009
Posts: 19
Default Understanding BWA result

Hi,

I've done some testing with BWA against couple of small chromosomes. My test data is paired end data. I don't understand couple lines of my results. I made a small perl script to understand the flag field on the sam output.

Code:
#!/usr/bin/perl

use warnings;
use strict;

my $dec = shift @ARGV;
for (0 .. 11) {
  if ( $dec & 2**$_ ) {
    print $_+1," bit on (",2**$_,")\n"
  }
}
exit;
That allows me to know what flags are on. I also added textual output on meaning of the fields (but it grows the code so left out here). Anyway I got following result rows in my test (cut the sequence and rest of columns out)

Code:
HWI-EAS418:1:1:0:1819#0 89      chr20   29366674        0       50M     =       29366674        0
HWI-EAS418:1:1:0:1819#0 181     chr20   29366674        0       *       =       29366674        0
Now running 89 through my script gives out that 1st, 4th, 5th and 7th bits are on. So it's 1st read of pair, on reverse strand with mate unmapped. Second row has 181 so 1st, 3rd, 5th, 6th and 7th bits are on. Now this means 2nd of a pair, strand of mate and query unmapped.

My questien then. Does this mean that second pair didn't map anything and values given are actually first read of the pair?
Hena is offline   Reply With Quote
Old 11-09-2009, 09:49 AM   #2
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by Hena View Post
Hi,

I've done some testing with BWA against couple of small chromosomes. My test data is paired end data. I don't understand couple lines of my results. I made a small perl script to understand the flag field on the sam output.

Code:
#!/usr/bin/perl

use warnings;
use strict;

my $dec = shift @ARGV;
for (0 .. 11) {
  if ( $dec & 2**$_ ) {
    print $_+1," bit on (",2**$_,")\n"
  }
}
exit;
That allows me to know what flags are on. I also added textual output on meaning of the fields (but it grows the code so left out here). Anyway I got following result rows in my test (cut the sequence and rest of columns out)

Code:
HWI-EAS418:1:1:0:1819#0 89      chr20   29366674        0       50M     =       29366674        0
HWI-EAS418:1:1:0:1819#0 181     chr20   29366674        0       *       =       29366674        0
Now running 89 through my script gives out that 1st, 4th, 5th and 7th bits are on. So it's 1st read of pair, on reverse strand with mate unmapped. Second row has 181 so 1st, 3rd, 5th, 6th and 7th bits are on. Now this means 2nd of a pair, strand of mate and query unmapped.

My questien then. Does this mean that second pair didn't map anything and values given are actually first read of the pair?
Get the latest samtools source and use the "samtools view -X" option and it will decode the flag field into a string representation.
nilshomer is offline   Reply With Quote
Old 11-10-2009, 01:58 AM   #3
Hena
Member
 
Location: Finland

Join Date: Nov 2009
Posts: 19
Default

Quote:
Originally Posted by nilshomer View Post
Get the latest samtools source and use the "samtools view -X" option and it will decode the flag field into a string representation.
Umm ... is there is guide that would tell me what that string representation means? As the result of that was:
Code:
HWI-EAS418:1:1:0:1819#0 pUr1    chr20   29366674        0       50M     =       29366674        0
HWI-EAS418:1:1:0:1819#0 purR2   chr20   29366674        0       *       =       29366674        0
pUr1 and purR2 doesn't really tell me much more.

Edit: Ok you can see the explanation with 'samtools view -?'.

p=0x1 (paired), P=0x2 (properly paired),
u=0x4 (unmapped), U=0x8 (mate unmapped),
r=0x10 (reverse), R=0x20 (mate reverse)
1=0x40 (first), 2=0x80 (second)

So first is paired, mate unmapped, reverse and 1st in pair. Second is paired, unmapped, reverse, mate reverse and 2nd in pair.

Then should I assume that the positions for second are not real, but just copies of the first position? If so why is the positions then written to second in the first place?

Last edited by Hena; 11-10-2009 at 02:16 AM.
Hena is offline   Reply With Quote
Old 11-10-2009, 06:34 PM   #4
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by Hena View Post
Umm ... is there is guide that would tell me what that string representation means? As the result of that was:
Code:
HWI-EAS418:1:1:0:1819#0 pUr1    chr20   29366674        0       50M     =       29366674        0
HWI-EAS418:1:1:0:1819#0 purR2   chr20   29366674        0       *       =       29366674        0
pUr1 and purR2 doesn't really tell me much more.

Edit: Ok you can see the explanation with 'samtools view -?'.

p=0x1 (paired), P=0x2 (properly paired),
u=0x4 (unmapped), U=0x8 (mate unmapped),
r=0x10 (reverse), R=0x20 (mate reverse)
1=0x40 (first), 2=0x80 (second)

So first is paired, mate unmapped, reverse and 1st in pair. Second is paired, unmapped, reverse, mate reverse and 2nd in pair.

Then should I assume that the positions for second are not real, but just copies of the first position? If so why is the positions then written to second in the first place?
This is for fast retrieval of the the unmapped mate of a mapped read. Consider if you want to realign the unmapped read using the mate end read for finding breakpoints etc. It is a common convention among people who use the SAM format.
nilshomer is offline   Reply With Quote
Old 11-11-2009, 10:36 AM   #5
Hena
Member
 
Location: Finland

Join Date: Nov 2009
Posts: 19
Default

So then it essentially means in my example that the second read of the pair is not mapped and it's values are not to be used for that read (though could be used for the 1st read).
Hena is offline   Reply With Quote
Reply

Tags
bwa, sam

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 04:47 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO