Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Understanding BWA result

    Hi,

    I've done some testing with BWA against couple of small chromosomes. My test data is paired end data. I don't understand couple lines of my results. I made a small perl script to understand the flag field on the sam output.

    Code:
    #!/usr/bin/perl
    
    use warnings;
    use strict;
    
    my $dec = shift @ARGV;
    for (0 .. 11) {
      if ( $dec & 2**$_ ) {
        print $_+1," bit on (",2**$_,")\n"
      }
    }
    exit;
    That allows me to know what flags are on. I also added textual output on meaning of the fields (but it grows the code so left out here). Anyway I got following result rows in my test (cut the sequence and rest of columns out)

    Code:
    HWI-EAS418:1:1:0:1819#0 89      chr20   29366674        0       50M     =       29366674        0
    HWI-EAS418:1:1:0:1819#0 181     chr20   29366674        0       *       =       29366674        0
    Now running 89 through my script gives out that 1st, 4th, 5th and 7th bits are on. So it's 1st read of pair, on reverse strand with mate unmapped. Second row has 181 so 1st, 3rd, 5th, 6th and 7th bits are on. Now this means 2nd of a pair, strand of mate and query unmapped.

    My questien then. Does this mean that second pair didn't map anything and values given are actually first read of the pair?

  • #2
    Originally posted by Hena View Post
    Hi,

    I've done some testing with BWA against couple of small chromosomes. My test data is paired end data. I don't understand couple lines of my results. I made a small perl script to understand the flag field on the sam output.

    Code:
    #!/usr/bin/perl
    
    use warnings;
    use strict;
    
    my $dec = shift @ARGV;
    for (0 .. 11) {
      if ( $dec & 2**$_ ) {
        print $_+1," bit on (",2**$_,")\n"
      }
    }
    exit;
    That allows me to know what flags are on. I also added textual output on meaning of the fields (but it grows the code so left out here). Anyway I got following result rows in my test (cut the sequence and rest of columns out)

    Code:
    HWI-EAS418:1:1:0:1819#0 89      chr20   29366674        0       50M     =       29366674        0
    HWI-EAS418:1:1:0:1819#0 181     chr20   29366674        0       *       =       29366674        0
    Now running 89 through my script gives out that 1st, 4th, 5th and 7th bits are on. So it's 1st read of pair, on reverse strand with mate unmapped. Second row has 181 so 1st, 3rd, 5th, 6th and 7th bits are on. Now this means 2nd of a pair, strand of mate and query unmapped.

    My questien then. Does this mean that second pair didn't map anything and values given are actually first read of the pair?
    Get the latest samtools source and use the "samtools view -X" option and it will decode the flag field into a string representation.

    Comment


    • #3
      Originally posted by nilshomer View Post
      Get the latest samtools source and use the "samtools view -X" option and it will decode the flag field into a string representation.
      Umm ... is there is guide that would tell me what that string representation means? As the result of that was:
      Code:
      HWI-EAS418:1:1:0:1819#0 pUr1    chr20   29366674        0       50M     =       29366674        0
      HWI-EAS418:1:1:0:1819#0 purR2   chr20   29366674        0       *       =       29366674        0
      pUr1 and purR2 doesn't really tell me much more.

      Edit: Ok you can see the explanation with 'samtools view -?'.

      p=0x1 (paired), P=0x2 (properly paired),
      u=0x4 (unmapped), U=0x8 (mate unmapped),
      r=0x10 (reverse), R=0x20 (mate reverse)
      1=0x40 (first), 2=0x80 (second)

      So first is paired, mate unmapped, reverse and 1st in pair. Second is paired, unmapped, reverse, mate reverse and 2nd in pair.

      Then should I assume that the positions for second are not real, but just copies of the first position? If so why is the positions then written to second in the first place?
      Last edited by Hena; 11-10-2009, 03:16 AM.

      Comment


      • #4
        Originally posted by Hena View Post
        Umm ... is there is guide that would tell me what that string representation means? As the result of that was:
        Code:
        HWI-EAS418:1:1:0:1819#0 pUr1    chr20   29366674        0       50M     =       29366674        0
        HWI-EAS418:1:1:0:1819#0 purR2   chr20   29366674        0       *       =       29366674        0
        pUr1 and purR2 doesn't really tell me much more.

        Edit: Ok you can see the explanation with 'samtools view -?'.

        p=0x1 (paired), P=0x2 (properly paired),
        u=0x4 (unmapped), U=0x8 (mate unmapped),
        r=0x10 (reverse), R=0x20 (mate reverse)
        1=0x40 (first), 2=0x80 (second)

        So first is paired, mate unmapped, reverse and 1st in pair. Second is paired, unmapped, reverse, mate reverse and 2nd in pair.

        Then should I assume that the positions for second are not real, but just copies of the first position? If so why is the positions then written to second in the first place?
        This is for fast retrieval of the the unmapped mate of a mapped read. Consider if you want to realign the unmapped read using the mate end read for finding breakpoints etc. It is a common convention among people who use the SAM format.

        Comment


        • #5
          So then it essentially means in my example that the second read of the pair is not mapped and it's values are not to be used for that read (though could be used for the 1st read).

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Strategies for Sequencing Challenging Samples
            by seqadmin


            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
            03-22-2024, 06:39 AM
          • seqadmin
            Techniques and Challenges in Conservation Genomics
            by seqadmin



            The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

            Avian Conservation
            Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
            03-08-2024, 10:41 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, Today, 06:37 PM
          0 responses
          7 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, Today, 06:07 PM
          0 responses
          7 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 03-22-2024, 10:03 AM
          0 responses
          49 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 03-21-2024, 07:32 AM
          0 responses
          66 views
          0 likes
          Last Post seqadmin  
          Working...
          X