  • BWA sampe: weird pairing

    When running bwa (0.5.4) sampe, I got this line printed to the screen over and over again:

    [infer_isize] fail to infer insert size: weird pairing

    Should I be worrying about this? Does it mean the pairing is not correct?

    The following are the commands I used for alignment and sampe:

    bwa aln -l 32 -t 2 -q 4 Genomes/Btau_UMD3.fa s_1_1_sequence.fq > Run20_s_1_1_sequence.sai &
    bwa aln -l 32 -t 2 -q 4 Genomes/Btau_UMD3.fa s_1_2_sequence.fq > Run20_s_1_2_sequence.sai &

    bwa sampe -a 253 -o 1000 Genomes/Btau_UMD3/Btau_UMD3.fa s_1_1_sequence.sai s_1_2_sequence.sai s_1_1_sequence.fq s_1_2_sequence.fq > Run20_s_1_pe.bwa.sam

    Thank you.

  • #2
    Bwa failed to infer the insert size and will use "-a" as the maximum insert size when pairing. This may happen if too few reads are mapped, or if the insert size distribution is bimodal or otherwise unusual. You should check the distribution after mapping.
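One way to check the distribution after mapping is to read the insert sizes straight from the SAM output. The sketch below is a minimal example, not part of bwa: it assumes standard tab-separated SAM records, takes the absolute insert size from column 9 (TLEN), and counts each pair once by keeping only the record with a positive TLEN. The function names and the optional `max_size` cutoff are hypothetical.

```python
import statistics

def insert_sizes(sam_lines, max_size=None):
    """Collect one insert size per pair from SAM text lines."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rnext = fields[6]
        tlen = int(fields[8])
        # Require a paired read (0x1) with both ends mapped (0x4 and 0x8 unset).
        if not flag & 0x1 or flag & 0x4 or flag & 0x8:
            continue
        # "=" means the mate is on the same reference; TLEN > 0 keeps one record per pair.
        if rnext != "=" or tlen <= 0:
            continue
        if max_size is not None and tlen > max_size:
            continue  # optional cutoff (an assumption) to drop far-apart pairs
        sizes.append(tlen)
    return sizes

def summarize(sizes):
    """Return count, median and quartiles; needs at least two values."""
    q1, _, q3 = statistics.quantiles(sizes, n=4)
    return {"n": len(sizes), "q1": q1, "median": statistics.median(sizes), "q3": q3}
```

To use it on a real file, pass an open file handle, for example `insert_sizes(open("Run20_s_1_pe.bwa.sam"))`, and plot or summarize the returned list. A bimodal or very wide result suggests that the value given to -a deserves a second look.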

    • #3
      The insert size I specified was obtained from an ELAND alignment. Do you think increasing -a will help? And is there a tag in the SAM file that reports pairs with the wrong insert size?

      • #4
        You should draw the distribution.

        • #5
          Thanks for the quick reply.

          Looking closer at the output from the sampe below, am I right to assume that there are 1649 out of 262144 processed reads where the insert size cannot be inferred correctly?

          [bwa_read_seq] 0.0% bases are trimmed.
          [bwa_sai2sam_pe_core] convert to sequence coordinate...
          [infer_isize] fail to infer insert size: weird pairing
          [bwa_sai2sam_pe_core] time elapses: 160.55 sec
          [bwa_sai2sam_pe_core] change of coordinates in 1649 alignments.
          [bwa_sai2sam_pe_core] align unmapped mate...
          [bwa_sai2sam_pe_core] time elapses: 1.39 sec
          [bwa_sai2sam_pe_core] refine gapped alignments... 0.72 sec
          [bwa_sai2sam_pe_core] print alignments... 2.13 sec
          [bwa_sai2sam_pe_core] 262144 sequences have been processed.

          • #6
            Did you ever solve this issue? I'm encountering the same error message...

            • #7
              This message is usually caused by bad libraries. You should check the quality of your library in the first place. As I replied above, bwa still works if -a is about right, but to set a proper -a, again, you should plot the distribution of insert size. This is not a major problem with bwa but with your input data.

              • #8
                My problem was actually due to an unequal number of reads in the two input fastq files. I was doing some quality filtering, mainly artefact removal, on read1 and read2 separately, and this left the two files with different numbers of reads.

                • #9
                  No. You must make sure the two files contain the same set of pairs, in identical order in each file. Input like yours will fail with every aligner to date, so far as I know.
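The requirement above (same pairs, same order in both files) can be checked before alignment. The following is a minimal sketch under two assumptions: each FASTQ record occupies exactly four lines, and mates share a read name apart from an optional trailing /1 or /2. The function names are hypothetical.

```python
from itertools import zip_longest

def read_names(fastq_lines):
    """Yield the read name of each 4-line FASTQ record, without a /1 or /2 suffix."""
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0 or not line.strip():
            continue  # only header lines carry the name; skip stray blank lines
        name = line.strip().split()[0]
        if name.startswith("@"):
            name = name[1:]
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]
        yield name

def first_mismatch(fastq1_lines, fastq2_lines):
    """Return (record_index, name1, name2) for the first out-of-sync record, or None."""
    pairs = zip_longest(read_names(fastq1_lines), read_names(fastq2_lines))
    for idx, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return idx, name1, name2  # a None name means that file ended early
    return None
```

Calling `first_mismatch(open("s_1_1_sequence.fq"), open("s_1_2_sequence.fq"))` returns None when the files are in sync. If read1 and read2 were filtered separately, it reports the first record where they diverge, and the files then need to be re-paired before running bwa sampe.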

                  • #10
                    Thank you both for your help! It was indeed an issue with my library...

                    • #11
                      Dear Heng,

                      I aligned my mate-pair data with BWA (0.5.5) and observed a weird pairing of reads. I explain below:

                      When I run bwa sampe for one of the pairs, I get:
                      Code:
                      HWUSI-EAS454:1:2:0:108#0        113     chr2    96713303        0       50M     =       96439877        -273426 GATCAGTGGACTTTATGTTAATGAAAAAGGAAATCATCCAGGGTGCATCT      :B?BC?A-357;67C@C<CC<9B>BC<BB>B:<7>B=-BCBBBC@BB@B@      XT:A:R  NM:i:2    SM:i:0  AM:i:0  X0:i:3  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:7T23C18
                      HWUSI-EAS454:1:2:0:108#0        177     chr2    96439877        23      50M     =       96713303        273426  GAGTCTCTTTTGCTGAGTGTTGTCATATATGGAGGTGATGCATGGAACTG      ?A95/5?@B;?:@7BB9959?'79BAC>@B?;@>;B(B8:/'>;9C:BBB      XT:A:U  NM:i:2    SM:i:23 AM:i:0  X0:i:1  X1:i:2  XM:i:2  XO:i:0  XG:i:0  MD:Z:28C12C8
                      So here the distance between ends is 273426bp, though I (and BWA) know that "inferred external isize from 157719 pairs: 3054.215 +/- 185.122".
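The second column of the two records above (113 and 177) is the SAM FLAG bit field, which also carries a "properly paired" bit (0x2). A minimal decoder, following the bit meanings in the SAM specification, might look like this. The short labels are illustrative names, and what counts as "proper" is decided by the aligner.

```python
# Bit meanings follow the FLAG column of the SAM specification;
# the short labels are illustrative, not official names.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "read_reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]

def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAG_BITS if flag & bit]

def is_proper_pair(flag):
    """True if the aligner marked the pair as properly aligned (bit 0x2)."""
    return bool(flag & 0x2)

print(decode_flag(113))  # ['paired', 'read_reverse', 'mate_reverse', 'first_in_pair']
print(decode_flag(177))  # ['paired', 'read_reverse', 'mate_reverse', 'second_in_pair']
```

Neither flag has the 0x2 bit set, and both ends are reported on the reverse strand, so this pair is not marked as a proper pair in the output.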

                      When I run BWA in single-end mode ("bwa samse -n 30") for the same pair, I get:
                      >HWUSI-EAS454:1:2:0:108#0 3 3
                      chr2 -96713303 2
                      chr2 -98220112 2
                      chr2 +96442725 2

                      on the left and

                      >HWUSI-EAS454:1:2:0:108#0 3 3
                      chr2 -96439877 2
                      chr2 +98222957 3
                      chr2 +96716152 3

                      on the right.

                      So my question is why BWA decides to pair ends in such a weird way when I could pair them as:
                      left: chr2 +96442725 2
                      right: chr2 -96439877 2
                      with ~2800bp of insert size?

                      Also, why does the output of "bwa samse -n 30" contain no information about mapping quality? Why can't it be printed in SAM format as well?

                      Thank you in advance,
                      Valentina

                      • #12
                        Could you show the low and high boundaries from the bwa output? Something like:

                        [infer_isize] low and high boundaries: 330 and 670

                        EDIT: For a "proper read pair", you would expect to see the read with small coordinate mapped to the forward strand but in your example, it is the contrary. I guess you are aligning reads from Illumina long-insert library where the "proper pair" has RF orientation. Bwa does not support such read pairs. So far as I know, Maq is still the best tool for such alignment.
                        Last edited by lh3; 01-15-2010, 08:29 AM.
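The orientation argument can be made concrete with a small sketch. It classifies a pair by the strands of its leftmost and rightmost ends (FR when the end with the smaller coordinate is on the forward strand, RF when it is on the reverse strand) and computes an outer distance under the assumption of ungapped reads of equal length. The function names are hypothetical, and the coordinates are the ones from the samse output quoted earlier in the thread.

```python
def pair_orientation(pos1, strand1, pos2, strand2):
    """Classify a pair as FR, RF, FF or RR from 1-based leftmost positions and strands."""
    (_, left_strand), (_, right_strand) = sorted([(pos1, strand1), (pos2, strand2)])
    code = {"+": "F", "-": "R"}
    return code[left_strand] + code[right_strand]

def outer_distance(pos1, pos2, read_len):
    """Distance from the leftmost start to the rightmost end, assuming ungapped reads."""
    return abs(pos1 - pos2) + read_len

# Pairing proposed in the question: left end at +96442725, right end at -96439877.
print(pair_orientation(96442725, "+", 96439877, "-"))  # RF
print(outer_distance(96442725, 96439877, 50))          # 2898

# Pairing reported by bwa sampe: both ends on the reverse strand.
print(pair_orientation(96713303, "-", 96439877, "-"))  # RR
```

The proposed pairing has the expected size but RF orientation, which matches the explanation above: an aligner that only accepts FR pairs as proper would not choose it.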

                        • #13
                          Low and high boundaries are: 2284 and 3824.

                          You are right, these are Solexa mate-pair data, which should be aligned as "RF" instead of "FR".

                          I have too much data to use Maq on all of it. Alternatively, I could run Bowtie first and then use Maq to align whatever was left unaligned. But it is really a pity that I cannot use BWA for this.

                          Maybe you could add a parameter that would specify which type of mapping you expect? Like you can run Bowtie in "--rf" or "--fr" mode.

                          Thanks,
                          Valentina

                          • #14
                            Hi elalo,
                            How did you find out it was an issue with your library? How can I take care of this isize failure message?

                            • #15
                              Originally posted by zlu
                              When running bwa (0.5.4) sampe, I got this line printed to the screen over and over again:

                              [infer_isize] fail to infer insert size: weird pairing

                              Should I be worrying about this? Does it mean the pairing is not correct?

                              The following are the commands I used for alignment and sampe:

                              bwa aln -l 32 -t 2 -q 4 Genomes/Btau_UMD3.fa s_1_1_sequence.fq > Run20_s_1_1_sequence.sai &
                              bwa aln -l 32 -t 2 -q 4 Genomes/Btau_UMD3.fa s_1_2_sequence.fq > Run20_s_1_2_sequence.sai &

                              bwa sampe -a 253 -o 1000 Genomes/Btau_UMD3/Btau_UMD3.fa s_1_1_sequence.sai s_1_2_sequence.sai s_1_1_sequence.fq s_1_2_sequence.fq > Run20_s_1_pe.bwa.sam

                              Thank you.
                              Hi zlu,

                              Do you mind telling me how you got rid of the failure message from bwa? I keep getting it.
