Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bwa mem don't set PROPER_PAIR flag in pair end align

    $ cat r1.fq
    @1
    GGGCTAAGAGACAGCAAATACACATGAACAGAAAGAAGAGGGCAAAGAAAAGGCTGACGG
    +
    CCCFFFFFHHFHHJIJIJIJJJJJJJJDIJJJJJIJJIJJIIGIJJJJHIHJJJJIHIHI

    $ cat r2.fq
    @1
    TAGTTTAAACGAGATTGCCAGGACCGGGTATCATTCACCATTTTTCTTTTCGTTAACTTG
    +
    CCCFFFFFHHHHHJJJIIIHIIJGGIGIJEIJIFGGGIJJJGFIIIJIFFEF?CFFEDCB

    $ bwa mem ../ref/ref.fa r1.fq r2.fq > mem.sam;cat mem.sam
    @SQ SN:1 LN:100000
    @PG ID:bwa PN:bwa VN:0.7.10-r1017-dirty CL:bwa mem ../ref/ref.fa r1.fq r2.fq
    1 81 1 1741 60 60M = 1681 -120 CCGTCAGCCTTTTCTTTGCCCTCTTCTTTCTGTTCATGTGTATTTGCTGTCTCTTAGCCC IHIHIJJJJHIHJJJJIGIIJJIJJIJJJJJIDJJJJJJJJIJIJIJHHFHHFFFFFCCC NM:i:1 MD:Z:18A41 AS:i:55 XS:i:0
    1 161 1 1681 60 60M = 1741 120 TAGTTTAAACGAGATTGCCAGGACCGGGTATCATTCACCATTTTTCTTTTCGTTAACTTG CCCFFFFFHHHHHJJJIIIHIIJGGIGIJEIJIFGGGIJJJGFIIIJIFFEF?CFFEDCB NM:i:1 MD:Z:21C38 AS:i:55 XS:i:0

    $ bwa aln ../ref/ref.fa r1.fq >r1.sai
    $ bwa aln ../ref/ref.fa r2.fq >r2.sai
    $ bwa sampe -r "@RG\tID:foo\tSM:bar" ../ref/ref.fa r1.sai r2.sai r1.fq r2.fq |\
    samtools view -bS - > aln.bam
    $ samtools view aln.bam
    1 83 1 1741 60 60M = 1681 -120 CCGTCAGCCTTTTCTTTGCCCTCTTCTTTCTGTTCATGTGTATTTGCTGTCTCTTAGCCC IHIHIJJJJHIHJJJJIGIIJJIJJIJJJJJIDJJJJJJJJIJIJIJHHFHHFFFFFCCC RG:Z:foo XT:A:U NM:i:1 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:18A41
    1 163 1 1681 60 60M = 1741 120 TAGTTTAAACGAGATTGCCAGGACCGGGTATCATTCACCATTTTTCTTTTCGTTAACTTG CCCFFFFFHHHHHJJJIIIHIIJGGIGIJEIJIFGGGIJJJGFIIIJIFFEF?CFFEDCB RG:Z:foo XT:A:U NM:i:1 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:21C38

    the problem is:
    samtools mpileup can't output valid results when the PROPER_PAIR flag not set.

  • #2
    If you want mpileup to include these cases use the -A option (-A: count anomalous read pairs). However, the question of way bwa mem doesn't like this pair remains...

    Comment


    • #3
      thanks. got it. Maybe bwa mem just like paired reads overlapped? I'll have a try.

      Comment


      • #4
        Originally posted by gada View Post
        Maybe bwa mem just like paired reads overlapped?
        Mmm.. I doubt it... Non overlapping pairs make perfect sense and I don't think such a bug could have sneaked into bwa mem undetected.

        Comment


        • #5
          $ bwa mem ../ref/ref.fa r1.fq r2.fq
          [M::bwa_idx_load_from_disk] read 0 ALT contigs
          @SQ SN:1 LN:100000
          @PG ID:bwa PN:bwa VN:0.7.10-r1017-dirty CL:bwa mem ../ref/ref.fa r1.fq r2.fq
          [M:rocess] read 2 sequences (182 bp)...
          [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 1, 0, 0)
          [M::mem_pestat] skip orientation FF as there are not enough pairs
          [M::mem_pestat] skip orientation FR as there are not enough pairs
          [M::mem_pestat] skip orientation RF as there are not enough pairs
          [M::mem_pestat] skip orientation RR as there are not enough pairs
          [M::mem_process_seqs] Processed 2 reads in 0.000 CPU sec, 0.000 real sec
          1 97 1 1861 60 120M = 1979 180 CTTCATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTT CCCCFFFFFHHFHHJIJIJIJJJJJJJJDIJJJJJIJJIJJIIGIJJJJHIHJJJJIHIHICCFFFFFHHFHHJIJIJIJJJJJJJJDIJJJJJIJJIJJIIGIJJJJHIHJJJJIHIHI NM:i:0 MD:Z:120 AS:i:120 XS:i:0
          1 145 1 1979 60 62M = 1861 -180 TTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAG BBBCDEFFC?FEFFIJIIIFGJJJIGGGFIJIEJIGIGGJIIHIIIJJJHHHHHFFFFFCCC NM:i:0 MD:Z:62 AS:i:62 XS:i:0
          [main] Version: 0.7.10-r1017-dirty
          [main] CMD: bwa mem ../ref/ref.fa r1.fq r2.fq
          [main] Real time: 0.019 sec; CPU: 0.004 sec

          $ ruby -e 'puts 97.to_s(2)'
          1100001
          ruby -e 'puts 145.to_s(2)'
          10010001

          nothing changed.

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Current Approaches to Protein Sequencing
            by seqadmin


            Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
            04-04-2024, 04:25 PM
          • seqadmin
            Strategies for Sequencing Challenging Samples
            by seqadmin


            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
            03-22-2024, 06:39 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 04-11-2024, 12:08 PM
          0 responses
          18 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 10:19 PM
          0 responses
          22 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 09:21 AM
          0 responses
          17 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-04-2024, 09:00 AM
          0 responses
          49 views
          0 likes
          Last Post seqadmin  
          Working...
          X