  • BWA sampe mapping result, what is "PROPER PAIR"?

    I have recently been using BWA to map paired-end Illumina reads, and I could use some help with a few questions.

    1. What exactly does the term "proper pair" mean in BWA? Does BWA consider the orientation of the mapped pairs, i.e. are we talking only about case i below, or is it based on the insert size alone?
    case i) -----> <------
    case ii) -----> ------->
    case iii) <------ <------
    case iv) <------ -------->
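One way to see which of the four cases a given pair falls into is to combine the two strand bits of the SAM FLAG (0x10 for the read itself, 0x20 for its mate) with POS and PNEXT. The helper below is hypothetical, not part of BWA or samtools; it uses leftmost mapping positions as a simplification and says nothing about which cases BWA itself accepts as a proper pair.

```python
# Hypothetical helper (not part of BWA or samtools): classify the orientation
# of a read pair from a single SAM record's FLAG, POS and PNEXT.
# Assumes both mates map to the same reference (RNEXT is "=").

READ_REVERSE = 0x10  # SEQ of this read is reverse complemented
MATE_REVERSE = 0x20  # SEQ of the mate is reverse complemented


def classify_orientation(flag: int, pos: int, mate_pos: int) -> str:
    """Return "FR", "FF", "RR" or "RF" (cases i, ii, iii and iv above).

    The first letter is the strand of the leftmost read, the second the strand
    of the rightmost read. Leftmost positions (POS/PNEXT) are used as a
    simplification; when pos == mate_pos this read is treated as the leftmost.
    """
    self_rev = bool(flag & READ_REVERSE)
    mate_rev = bool(flag & MATE_REVERSE)
    if pos <= mate_pos:
        left_rev, right_rev = self_rev, mate_rev
    else:
        left_rev, right_rev = mate_rev, self_rev
    return ("R" if left_rev else "F") + ("R" if right_rev else "F")
```

For the two records in question 2, `classify_orientation(83, 1915637, 1859241)` and `classify_orientation(163, 1859241, 1915637)` both give "FR" (case i), so their orientation on its own looks normal.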

    2. I have a whole bunch of mapping results that look quite odd to me.
    They look as if BWA sampe paired them, but they obviously have different read names, so they can't be a pair:
    I321_1_FC30VWBAAXX:7:100:1611:994 83 gi|150002608|ref|NC_009614.1| 1915637 29 75M = 1859241 -56471 TGGCAAATTCCAATTGGGGCTTTTCAATGAATGTTTTTACTTTAAAGAATTCTACTTGTTTTTCTTCCTCAATCT AAALIENKEHOLOKJD>=HXOIUNbda_Sh`hLhah]h]haZhhhhShhhhhhhhhhhhhhhhhhhhhhhhhhhh XT:A:U NM:i:1 SM:i:29 AM:i:29 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:53C21

    I321_1_FC30VWBAAXX:7:100:1615:784 163 gi|150002608|ref|NC_009614.1| 1859241 29 20M55S = 1915637 56471 GCTTGTTGTGACTTTGATAGATTGTGACGTGTACGAAAATATGCAAGAGGCGGGGATTGATTCGTCTAGCCCGTT hhhhhhhhhhhehhhhhhhhhhhhSheMhRh`hNhWXhWhehhP][\Sh^Ehh^hNW^[PK_<NJJNGN>AGRCH XT:A:M NM:i:2 SM:i:29 AM:i:29 XM:i:2 XO:i:0 XG:i:0 MD:Z:5C9T4


    3. When I grep for the read name "I321_1_FC30VWBAAXX:7:100:1611:994", I get:

    I321_1_FC30VWBAAXX:7:100:1611:994 129 gi|150002608|ref|NC_009614.1| 1915576 37 75M = 3763132 1847556 TTGATATTCCATAAGAATATTCCTGAGTTCCAATAGAATTCTCCACTTTCTACGAATACTTTGGCAAATTCCAAT hhhhhhhhhhhhhhhhhhhhhedhhhhhh`hh]hhhcZhcXhZR`ThhhPhhU]RQ`YJSX^PPLNPMWSLCIHU XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:75

    I321_1_FC30VWBAAXX:7:100:1611:994 83 gi|150002608|ref|NC_009614.1| 1915637 29 75M = 1859241 -56471 TGGCAAATTCCAATTGGGGCTTTTCAATGAATGTTTTTACTTTAAAGAATTCTACTTGTTTTTCTTCCTCAATCT AAALIENKEHOLOKJD>=HXOIUNbda_Sh`hLhah]h]haZhhhhShhhhhhhhhhhhhhhhhhhhhhhhhhhh XT:A:U NM:i:1 SM:i:29 AM:i:29 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:53C21


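    Here the first record has flag 129 (0b10000001, i.e. 0x1 paired + 0x80 second read of the pair), so its proper-pair bit 0x2 is in fact not set, while the second record has flag 83 (0b1010011, i.e. 0x1 paired + 0x2 proper pair + 0x10 reverse strand + 0x40 first read of the pair) and is marked as properly paired. Yet the two records share one read name and are clearly not mates of each other: their mate positions (3763132 and 1859241) and inferred insert sizes (1847556 and -56471) are completely different.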
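Decoding the flags bit by bit makes this kind of check less error-prone. A small Python sketch; the bit meanings follow the SAM specification, while the short labels are just names chosen for this example.

```python
# Decode a SAM FLAG into named bits. Bit meanings follow the SAM specification;
# the short labels are illustrative, not official names.
FLAG_BITS = [
    (0x1, "paired"),         # template has multiple segments
    (0x2, "proper_pair"),    # each segment properly aligned, per the aligner
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),       # this read's SEQ is reverse complemented
    (0x20, "mate_reverse"),  # mate's SEQ is reverse complemented
    (0x40, "read1"),         # first segment in the template
    (0x80, "read2"),         # last segment in the template
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag: int) -> list[str]:
    """Return the labels of all bits set in flag, lowest bit first."""
    return [name for bit, name in FLAG_BITS if flag & bit]


if __name__ == "__main__":
    for flag in (129, 83, 163):
        print(flag, bin(flag), decode_flag(flag))
```

For example, `decode_flag(129)` returns `["paired", "read2"]`, with no `proper_pair` label.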


    Can anyone help answer these questions? I am trying to keep only the pairs that are mapped properly. I tried samtools view with the -f 2 option (0x2 is the proper-pair bit), but I still get many pairs like the ones described above.
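Note that `-f 2` only tests the 0x2 bit on each record; it does not check that the two records of a pair share a read name. A hedged sketch of an extra sanity check, assuming uncompressed, tab-delimited SAM text such as `samtools view` output: it keeps primary records with 0x2 set and reports read names that do not have exactly one read-1 and one read-2 record among them. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List

PROPER_PAIR = 0x2
READ1, READ2 = 0x40, 0x80
NOT_PRIMARY = 0x100 | 0x800  # secondary or supplementary alignment


def find_unmatched_proper_pairs(sam_lines: Iterable[str]) -> List[str]:
    """Return sorted read names whose proper-pair records are not exactly
    one read-1 record plus one read-2 record (hypothetical sanity check)."""
    counts = defaultdict(lambda: [0, 0])  # qname -> [read-1 count, read-2 count]
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & PROPER_PAIR or flag & NOT_PRIMARY:
            continue  # only primary records flagged as properly paired
        if flag & READ1:
            counts[qname][0] += 1
        if flag & READ2:
            counts[qname][1] += 1
    return sorted(name for name, (r1, r2) in counts.items() if (r1, r2) != (1, 1))
```

Fed with `samtools view -f 2` output (for example via `find_unmatched_proper_pairs(sys.stdin)` after `import sys`), this would be expected to list I321_1_FC30VWBAAXX:7:100:1611:994, whose only properly paired record in the grep output above is the flag-83 one.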

    Anyone? Thanks!
    Last edited by hl450; 07-29-2010, 08:03 AM.

  • #2
    Anyone? Please help!



    • #3
      I have the same question...



      • #4
        Could anyone experienced with BWA alignment please answer? Thank you!



        • #5
          How can they look paired if they have different read names? That doesn't make sense at all. Are you sure your two input FASTQ files are lined up in the same order? I believe many of these tools read one record at a time from each input file and assume that corresponding records form a pair; they don't necessarily check the read names.
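If the aligner pairs records purely by their position in the two files, it is worth checking that the files are in lockstep before mapping. A minimal Python sketch, assuming plain four-line FASTQ records and read names that either match exactly or differ only by a trailing /1 and /2; the function names are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Optional, Tuple


def fastq_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield normalised read names from a four-line-per-record FASTQ stream.

    The name is the header up to the first whitespace, without the leading '@'
    and without a trailing '/1' or '/2'. An incomplete final record is ignored.
    """
    it = iter(lines)
    for header, _seq, _plus, _qual in zip(it, it, it, it):
        name = header.strip()[1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_mismatch(
    lines1: Iterable[str], lines2: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return None if both files list the same names in the same order,
    otherwise (record index, name in file 1, name in file 2). If one file is
    shorter, its missing name is reported as None."""
    pairs = zip_longest(fastq_names(lines1), fastq_names(lines2))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None
```

Usage would be along the lines of `with open("reads_1.fastq") as f1, open("reads_2.fastq") as f2: print(first_mismatch(f1, f2))`, where the file names are placeholders.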

          That being said, I have occasionally seen output with the "properly paired" flag set even when the two alignments are on different chromosomes, which is strange, so I think there could be some bugs.



          • #6
            I found what was causing the problem: the data I downloaded from the EBI Short Read Archive. The read 1 and read 2 FASTQ files contained different numbers of reads. Shouldn't the Short Read Archive check this when data are submitted? Argh...

            Originally posted by zlu
            My problem was actually caused by the unequal number of reads in the two input FASTQ files. I had done some quality filtering, mainly artefact removal, on read 1 and read 2 separately, and this left the two files with different numbers of reads.
            Originally posted by lh3
            No. You must make sure the two files contain the same set of pairs, in identical order in each file. As far as I know, input like yours will make every aligner to date fail.
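When read 1 and read 2 have been filtered separately, as in the quoted post, one way to restore the pairing is to keep only the reads whose mate survived in the other file. A rough Python sketch, assuming four-line FASTQ records and names that differ at most by a /1 or /2 suffix; the function names are hypothetical, and it holds file 2 in memory, so it is meant to illustrate the idea rather than to handle very large runs.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, qualities


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, Record]]:
    """Yield (normalised name, record) for each complete four-line record."""
    it = iter(lines)
    for record in zip(it, it, it, it):
        name = record[0].strip()[1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name, record


def resync_pairs(
    lines1: Iterable[str], lines2: Iterable[str]
) -> Tuple[List[Record], List[Record]]:
    """Keep only reads present in both inputs, in the order of input 1.

    File 2 is indexed in memory by name; each read 1 whose mate is found is
    emitted together with that mate, so the two outputs line up record by record.
    """
    mates: Dict[str, Record] = {name: rec for name, rec in read_fastq(lines2)}
    kept1: List[Record] = []
    kept2: List[Record] = []
    for name, rec in read_fastq(lines1):
        mate = mates.get(name)
        if mate is not None:
            kept1.append(rec)
            kept2.append(mate)
    return kept1, kept2
```

Because each record keeps its original newlines, writing a record back out is just `out.write("".join(record))`.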
