  • BWA sampe: weird pairing

    When running bwa (0.5.4) sampe, I got this line printed to the screen over and over again:

    [infer_isize] fail to infer insert size: weird pairing

    Should I be worrying about this? Does it mean the pairing is not correct?

    The following are the commands I used for alignment and sampe:

    bwa aln -l 32 -t 2 -q 4 Genomes/Btau_UMD3.fa s_1_1_sequence.fq > Run20_s_1_1_sequence.sai &
    bwa aln -l 32 -t 2 -q 4 Genomes/Btau_UMD3.fa s_1_2_sequence.fq > Run20_s_1_2_sequence.sai &

    bwa sampe -a 253 -o 1000 Genomes/Btau_UMD3/Btau_UMD3.fa s_1_1_sequence.sai s_1_2_sequence.sai s_1_1_sequence.fq s_1_2_sequence.fq > Run20_s_1_pe.bwa.sam

    Thank you.

  • #2
    Bwa failed to infer the insert size, so it falls back on "-a" as the maximum insert size for pairing. This can happen if too few reads are mapped, or if the insert size distribution is bimodal or otherwise irregular. You should check the distribution after mapping.
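One way to check the distribution is sketched below. This is only a minimal sketch, assuming a plain-text SAM file in which column 9 (TLEN) holds the signed insert size; the function names `insert_sizes` and `text_histogram` are made up for illustration and are not part of bwa or samtools.

```python
def insert_sizes(sam_lines):
    """Collect |TLEN| once per pair from plain-text SAM lines.

    Assumption: column 9 (TLEN) holds the signed insert size reported by
    the aligner. Only first-in-pair records with both ends mapped to the
    same reference are counted, so each pair contributes one value.
    """
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a full alignment record
            continue
        flag, rnext, tlen = int(fields[1]), fields[6], int(fields[8])
        if flag & 0x4 or flag & 0x8:      # read or mate unmapped
            continue
        if not flag & 0x40:               # keep first-in-pair only
            continue
        if rnext != "=" or tlen == 0:     # mate on another reference, or no size
            continue
        sizes.append(abs(tlen))
    return sizes


def text_histogram(sizes, bin_width=50):
    """Return sorted (bin_start, count) pairs for a crude histogram."""
    counts = {}
    for size in sizes:
        bin_start = (size // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return sorted(counts.items())
```

Passing an open file handle for the sampe output to `insert_sizes` and printing the result of `text_histogram` should be enough to see whether the distribution has one peak or several, and roughly where a sensible -a would lie.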


    • #3
      The insert size specified was obtained from an ELAND alignment. Do you think increasing -a will help? And is there a tag in the SAM file that reports pairs with a wrong insert size?


      • #4
        You should plot the insert size distribution.


        • #5
          Thanks for the quick reply.

          Looking closer at the output from the sampe below, am I right to assume that there are 1649 out of 262144 processed reads where the insert size cannot be inferred correctly?

          [bwa_read_seq] 0.0% bases are trimmed.
          [bwa_sai2sam_pe_core] convert to sequence coordinate...
          [infer_isize] fail to infer insert size: weird pairing
          [bwa_sai2sam_pe_core] time elapses: 160.55 sec
          [bwa_sai2sam_pe_core] change of coordinates in 1649 alignments.
          [bwa_sai2sam_pe_core] align unmapped mate...
          [bwa_sai2sam_pe_core] time elapses: 1.39 sec
          [bwa_sai2sam_pe_core] refine gapped alignments... 0.72 sec
          [bwa_sai2sam_pe_core] print alignments... 2.13 sec
          [bwa_sai2sam_pe_core] 262144 sequences have been processed.


          • #6
            Did you ever solve this issue? I'm encountering the same error message...


            • #7
              This message is usually caused by bad libraries. You should first check the quality of your library. As I replied above, bwa still works if -a is about right, but to set a proper -a you should, again, plot the insert size distribution. This is not so much a problem with bwa as with your input data.


              • #8
                My problem was actually due to an unequal number of paired reads in the input fastq files. I was doing some quality filtering, mainly artifact removal, on read1 and read2 separately, and this resulted in the two files having different numbers of reads.


                • #9
                  No. You must make sure the two files contain the same set of pairs, in identical order in each file. Such input will fail with all aligners to date, so far as I know.
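A quick way to verify this before running sampe is sketched below. It is only a sketch under stated assumptions: each FASTQ record occupies exactly four lines (no wrapped sequences), and mates share a read name apart from an optional trailing "/1" or "/2". The function names `read_names` and `first_mismatch` are made up for illustration.

```python
from itertools import zip_longest


def read_names(fastq_lines):
    """Yield the read name of every record, without '@' and without /1 or /2.

    Assumption: strict four-line FASTQ records, so every fourth line
    (starting at the first) is a header line.
    """
    for index, line in enumerate(fastq_lines):
        if index % 4 != 0:
            continue
        header = line.strip()
        if not header.startswith("@"):
            raise ValueError("line %d is not a FASTQ header" % (index + 1))
        name = header[1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_mismatch(fq1_lines, fq2_lines):
    """Return (record_index, name1, name2) for the first disagreement.

    Returns None when both inputs list the same read names in the same
    order. A name of None means that file ran out of records first.
    """
    pairs = zip_longest(read_names(fq1_lines), read_names(fq2_lines))
    for record_index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return record_index, name1, name2
    return None
```

Both arguments can be open file handles. A result of None means the two files agree record by record; anything else points at the first record where they diverge, which is typically where independent filtering dropped a read from only one file.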


                  • #10
                    Thank you both for your help! It was indeed an issue with my library...


                    • #11
                      Dear Heng,

                      I aligned my mate-pair data with BWA (0.5.5) and observed a weird pairing of reads, which I explain below.

                      When I run bwa sampe, for one of the pairs I get:
                      Code:
                      HWUSI-EAS454:1:2:0:108#0        113     chr2    96713303        0       50M     =       96439877        -273426 GATCAGTGGACTTTATGTTAATGAAAAAGGAAATCATCCAGGGTGCATCT      :B?BC?A-357;67C@C<CC<9B>BC<BB>B:<7>B=-BCBBBC@BB@B@      XT:A:R  NM:i:2    SM:i:0  AM:i:0  X0:i:3  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:7T23C18
                      HWUSI-EAS454:1:2:0:108#0        177     chr2    96439877        23      50M     =       96713303        273426  GAGTCTCTTTTGCTGAGTGTTGTCATATATGGAGGTGATGCATGGAACTG      ?A95/5?@B;?:@7BB9959?'79BAC>@B?;@>;B(B8:/'>;9C:BBB      XT:A:U  NM:i:2    SM:i:23 AM:i:0  X0:i:1  X1:i:2  XM:i:2  XO:i:0  XG:i:0  MD:Z:28C12C8
                      So here the distance between ends is 273426bp, though I (and BWA) know that "inferred external isize from 157719 pairs: 3054.215 +/- 185.122".

                      When I run BWA in single-end mode ("bwa samse -n 30") for the same pair, I get:
                      >HWUSI-EAS454:1:2:0:108#0 3 3
                      chr2 -96713303 2
                      chr2 -98220112 2
                      chr2 +96442725 2

                      on the left and

                      >HWUSI-EAS454:1:2:0:108#0 3 3
                      chr2 -96439877 2
                      chr2 +98222957 3
                      chr2 +96716152 3

                      on the right.

                      So my question is why BWA decides to pair ends in such a weird way when I could pair them as:
                      left: chr2 +96442725 2
                      right: chr2 -96439877 2
                      with ~2800bp of insert size?

                      Also, why is there no information about mapping quality in the output of "bwa samse -n 30"? Why can't it be printed in SAM format as well?

                      Thank you in advance,
                      Valentina


                      • #12
                        Could you show the low and high boundaries from the bwa output? Something like:

                        [infer_isize] low and high boundaries: 330 and 670

                        EDIT: For a "proper read pair", you would expect to see the read with the smaller coordinate mapped to the forward strand, but in your example it is the opposite. I guess you are aligning reads from an Illumina long-insert library, where the "proper pair" has RF orientation. Bwa does not support such read pairs. So far as I know, Maq is still the best tool for such alignments.
                        Last edited by lh3; 01-15-2010, 08:29 AM.
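The FLAG column of the two records quoted in #11 (113 and 177) can be decoded to see what bwa reported about strand and pairing. The sketch below uses the bit meanings defined in the SAM specification; the helper name `decode_flag` and the short labels are made up for illustration. Bit 0x2 ("proper pair") is the closest thing in SAM to a marker for pairs with an acceptable insert size, although exactly when it is set is up to the aligner.

```python
# Bit meanings from the SAM specification; the short labels are informal.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAG_BITS.items() if flag & bit]


for flag in (113, 177):
    print(flag, decode_flag(flag))
# 113 ['paired', 'reverse', 'mate_reverse', 'first_in_pair']
# 177 ['paired', 'reverse', 'mate_reverse', 'second_in_pair']
```

Neither record has the proper_pair bit set, and both ends are reported on the reverse strand, which is consistent with bwa not having found a pairing it considers proper for this read pair.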


                        • #13
                          Low and high boundaries are: 2284 and 3824.

                          You are right, these are Solexa mate-pair data, which should be aligned as "RF" instead of "FR".

                          I have too much data to use Maq on it. Alternatively, I could run Bowtie first and then use Maq to align whatever was left unaligned. But it is really a pity that I cannot use BWA for that.

                          Maybe you could add a parameter that specifies which orientation to expect, the way Bowtie can be run in "--rf" or "--fr" mode?

                          Thanks,
                          Valentina
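To make the question from #11 concrete, the single-end hits listed there can be searched for RF-oriented pairings that fall inside the boundaries reported in #13 (2284 and 3824). This is a rough sketch, not how bwa pairs reads: it assumes ungapped 50 bp alignments, treats each hit position as the leftmost coordinate, and measures the span from the leftmost base of the reverse-strand read to the rightmost base of the forward-strand read. The function name `rf_candidates` is made up for illustration.

```python
from itertools import product

READ_LEN = 50           # both reads are 50M in the SAM records from #11
LOW, HIGH = 2284, 3824  # boundaries reported in #13

# (reference, strand, leftmost position) copied from the samse -n 30 output
left_hits = [("chr2", "-", 96713303), ("chr2", "-", 98220112), ("chr2", "+", 96442725)]
right_hits = [("chr2", "-", 96439877), ("chr2", "+", 98222957), ("chr2", "+", 96716152)]


def rf_candidates(left, right, low, high, read_len):
    """Return (left_hit, right_hit, span) for RF pairings within [low, high].

    RF here means the reverse-strand read has the smaller coordinate and
    the forward-strand read the larger one.
    """
    found = []
    for a, b in product(left, right):
        if a[0] != b[0] or a[1] == b[1]:   # different reference or same strand
            continue
        lower, upper = sorted((a, b), key=lambda hit: hit[2])
        if lower[1] != "-":                # forward read first would be FR
            continue
        span = upper[2] + read_len - lower[2]
        if low <= span <= high:
            found.append((a, b, span))
    return found


for a, b, span in rf_candidates(left_hits, right_hits, LOW, HIGH, READ_LEN):
    print(a, b, span)
```

Under these assumptions every one of the three hits on the left end has a partner on the right end at a span of roughly 2.9 kb, so the read pair looks like it comes from a repeat with three equally plausible RF placements.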


                          • #14
                            Hi elalo,
                            How did you find out it was an issue with your library? How can I take care of this isize failure message?


                            • #15
                              Originally posted by zlu
                              When running bwa (0.5.4) sampe, I got this line printed to the screen over and over again:

                              [infer_isize] fail to infer insert size: weird pairing

                              Should I be worrying about this? Does it mean the pairing is not correct?

                              The following are the commands I used for alignment and sampe:

                              bwa aln -l 32 -t 2 -q 4 Genomes/Btau_UMD3.fa s_1_1_sequence.fq > Run20_s_1_1_sequence.sai &
                              bwa aln -l 32 -t 2 -q 4 Genomes/Btau_UMD3.fa s_1_2_sequence.fq > Run20_s_1_2_sequence.sai &

                              bwa sampe -a 253 -o 1000 Genomes/Btau_UMD3/Btau_UMD3.fa s_1_1_sequence.sai s_1_2_sequence.sai s_1_1_sequence.fq s_1_2_sequence.fq > Run20_s_1_pe.bwa.sam

                              Thank you.
                              Hi zlu,

                              Would you mind telling me how you got rid of the failure message from bwa? I keep getting it.
