Seqanswers Leaderboard Ad

**nilshomer** · 11-09-2009, 10:49 AM

Originally posted by Hena View Post

Hi,

I've done some testing with BWA against couple of small chromosomes. My test data is paired end data. I don't understand couple lines of my results. I made a small perl script to understand the flag field on the sam output.

Code:

#!/usr/bin/perl

use warnings;
use strict;

my $dec = shift @ARGV;
for (0 .. 11) {
  if ( $dec & 2**$_ ) {
    print $_+1," bit on (",2**$_,")\n"
  }
}
exit;

That allows me to know what flags are on. I also added textual output on meaning of the fields (but it grows the code so left out here). Anyway I got following result rows in my test (cut the sequence and rest of columns out)

Code:

HWI-EAS418:1:1:0:1819#0 89      chr20   29366674        0       50M     =       29366674        0
HWI-EAS418:1:1:0:1819#0 181     chr20   29366674        0       *       =       29366674        0

Now running 89 through my script gives out that 1st, 4th, 5th and 7th bits are on. So it's 1st read of pair, on reverse strand with mate unmapped. Second row has 181 so 1st, 3rd, 5th, 6th and 7th bits are on. Now this means 2nd of a pair, strand of mate and query unmapped.

My questien then. Does this mean that second pair didn't map anything and values given are actually first read of the pair?

Get the latest samtools source and use the "samtools view -X" option and it will decode the flag field into a string representation.

**Hena** · 11-10-2009, 02:58 AM

Originally posted by nilshomer View Post

Get the latest samtools source and use the "samtools view -X" option and it will decode the flag field into a string representation.

Umm ... is there is guide that would tell me what that string representation means? As the result of that was:

Code:

HWI-EAS418:1:1:0:1819#0 pUr1    chr20   29366674        0       50M     =       29366674        0
HWI-EAS418:1:1:0:1819#0 purR2   chr20   29366674        0       *       =       29366674        0

pUr1 and purR2 doesn't really tell me much more.

Edit: Ok you can see the explanation with 'samtools view -?'.

p=0x1 (paired), P=0x2 (properly paired),
u=0x4 (unmapped), U=0x8 (mate unmapped),
r=0x10 (reverse), R=0x20 (mate reverse)
1=0x40 (first), 2=0x80 (second)

So first is paired, mate unmapped, reverse and 1st in pair. Second is paired, unmapped, reverse, mate reverse and 2nd in pair.

Then should I assume that the positions for second are not real, but just copies of the first position? If so why is the positions then written to second in the first place?

**nilshomer** · 11-10-2009, 07:34 PM

Originally posted by Hena View Post

Umm ... is there is guide that would tell me what that string representation means? As the result of that was:

Code:

HWI-EAS418:1:1:0:1819#0 pUr1    chr20   29366674        0       50M     =       29366674        0
HWI-EAS418:1:1:0:1819#0 purR2   chr20   29366674        0       *       =       29366674        0

pUr1 and purR2 doesn't really tell me much more.

Edit: Ok you can see the explanation with 'samtools view -?'.

p=0x1 (paired), P=0x2 (properly paired),
u=0x4 (unmapped), U=0x8 (mate unmapped),
r=0x10 (reverse), R=0x20 (mate reverse)
1=0x40 (first), 2=0x80 (second)

So first is paired, mate unmapped, reverse and 1st in pair. Second is paired, unmapped, reverse, mate reverse and 2nd in pair.

Then should I assume that the positions for second are not real, but just copies of the first position? If so why is the positions then written to second in the first place?

This is for fast retrieval of the the unmapped mate of a mapped read. Consider if you want to realign the unmapped read using the mate end read for finding breakpoints etc. It is a common convention among people who use the SAM format.

**Hena** · 11-11-2009, 11:36 AM

So then it essentially means in my example that the second read of the pair is not mapped and it's values are not to be used for that read (though could be used for the 1st read).

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 55 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 52 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 45 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 55 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Understanding BWA result

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News