  • BWA - fail to infer insert size: too few good pairs

    I have some MiSeq data that I am trying to align with BWA to hg19.

    This is the command:
    bwa sampe -P <dir>/BWAIndex/genome.fa 1_1.sai 1_2.sai 1_1.fastq 1_2.fastq > out.sam

    This is the output I am seeing:
    Code:
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] fail to infer insert size: too few good pairs
    [bwa_sai2sam_pe_core] time elapses: 0.10 sec
    [bwa_sai2sam_pe_core] changing coordinates of 0 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_sai2sam_pe_core] time elapses: 0.00 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.10 sec
    [bwa_sai2sam_pe_core] print alignments... 1.06 sec
    [bwa_sai2sam_pe_core] 262144 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] fail to infer insert size: too few good pairs
    [bwa_sai2sam_pe_core] time elapses: 0.06 sec
    [bwa_sai2sam_pe_core] changing coordinates of 0 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_sai2sam_pe_core] time elapses: 0.00 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.08 sec
    [bwa_sai2sam_pe_core] print alignments... 1.13 sec
    [bwa_sai2sam_pe_core] 524288 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate... 
    [infer_isize] fail to infer insert size: too few good pairs
    [bwa_sai2sam_pe_core] time elapses: 0.01 sec
    [bwa_sai2sam_pe_core] changing coordinates of 0 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_sai2sam_pe_core] time elapses: 0.00 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.01 sec
    [bwa_sai2sam_pe_core] print alignments... 0.16 sec
    [bwa_sai2sam_pe_core] 569473 sequences have been processed.
    [main] Version: 0.6.2-r126
    It outputs a SAM file, but nothing is aligned. The same data was processed through Illumina's BaseSpace service, which uses BWA, and it produced results, so there should not be anything wrong with the reads themselves. I also tried aligning with Bowtie2 and got reasonable results, so the FASTQ files should not be corrupted.

    I have no idea what's wrong and I can't seem to find any record of such an error anywhere online. How do I troubleshoot this?

  • #2
    It means what it says. The software thinks that almost none of your reads aligned. Are you 100% sure the bwa aln steps are correct?

    • #3
      Originally posted by swbarnes2:
      It means what it says. The software thinks that almost none of your reads aligned. Are you 100% sure the bwa aln steps are correct?
      I am using this for bwa aln:
      bwa aln <dir>/BWAIndex/genome.fa 1_1.fastq > 1_1.sai
      bwa aln <dir>/BWAIndex/genome.fa 1_2.fastq > 1_2.sai

      If there were a problem with bwa aln, I assume the .sai files would not get generated, or at least some error would be shown.

      • #4
        What I meant was: double-check that you used the same reference genome in all the commands, that it is the right reference genome, and that you got all the file names right in every command.

        One quick troubleshooting step is to make a single-end SAM from one FASTQ and confirm that it looks okay.
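A minimal sketch of that check, assuming a plain-text SAM file such as the output of `bwa samse`. The helper name `sam_mapping_summary` is hypothetical; it counts primary records and how many of them carry FLAG bit 0x4 (unmapped). `samtools flagstat` reports the same numbers in more detail; this version only needs awk.

```bash
# Hypothetical helper: count mapped vs. unmapped primary records in a SAM file.
# FLAG bit 0x4 (value 4) marks an unmapped read; header lines start with '@'.
# Secondary (0x100) and supplementary (0x800) records are skipped so each read counts once.
sam_mapping_summary() {
    awk -F '\t' '
        /^@/ { next }
        { f = $2 + 0 }
        int(f / 256) % 2 == 1 || int(f / 2048) % 2 == 1 { next }
        { total++; if (int(f / 4) % 2 == 1) unmapped++ }
        END { printf "total=%d mapped=%d unmapped=%d\n", total, total - unmapped, unmapped }
    ' "$1"
}
```

For example, `sam_mapping_summary 1_1.sam`. If `mapped` is 0 even for single-end output, the problem lies upstream of `sampe`, in the `aln` step or the index, rather than in the pairing.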

        • #5
          Originally posted by swbarnes2:
          What I meant was: double-check that you used the same reference genome in all the commands, that it is the right reference genome, and that you got all the file names right in every command.

          One quick troubleshooting step is to make a single-end SAM from one FASTQ and confirm that it looks okay.
          If I use the wrong genome or wrong file names, I get an error.

          I will have to check how single-end SAM will work, but doing paired-end alignment using Bowtie2 works fine, so there should not be anything wrong with the reads.
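Because the "too few good pairs" message concerns pairs of mates, it may also be worth confirming that the two FASTQ files list the same reads in the same order. A sketch, assuming 4-line FASTQ records and mate names that differ only by a `/1`/`/2` suffix or by the comment after the first space (as in newer Illumina headers); `fastq_pairs_in_sync` is a hypothetical name:

```bash
# Hypothetical helper: check that two FASTQ files hold the same read names in the same order.
# Assumes 4-line records. Strips the leading '@', anything after the first space/tab,
# and a trailing /1 or /2 before comparing. Returns 0 if in sync, 1 on the first mismatch.
fastq_pairs_in_sync() {
    # awk program that prints the normalized read name from each header line
    local names='NR % 4 == 1 { sub(/^@/, ""); sub(/[ \t].*/, ""); sub(/\/[12]$/, ""); print }'
    paste <(awk "$names" "$1") <(awk "$names" "$2") |
        awk -F '\t' '
            $1 != $2 { printf "mismatch at record %d: %s vs %s\n", NR, $1, $2; bad = 1; exit }
            END { if (!bad) print "in sync"; exit bad }'
}
```

For example, `fastq_pairs_in_sync 1_1.fastq 1_2.fastq`. Files with different record counts also show up as a mismatch, because `paste` leaves one column empty.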

          • #6
            Originally posted by swbarnes2:
            One quick troubleshooting step is to make a single-end SAM from one FASTQ and confirm that it looks okay.
            I tried using bwa samse instead of sampe. I don't get any errors displayed, but it still fails to make any alignments.

            This is the output in case it makes any difference:
            Code:
            > bwa samse <dir>/genome.fa 1_1.sai 1_1.fastq > 1_1.sam
            [bwa_aln_core] convert to sequence coordinate... 1.40 sec
            [bwa_aln_core] refine gapped alignments... 0.71 sec
            [bwa_aln_core] print alignments... 0.55 sec
            [bwa_aln_core] 262144 sequences have been processed.
            [bwa_aln_core] convert to sequence coordinate... 1.37 sec
            [bwa_aln_core] refine gapped alignments... 0.72 sec
            [bwa_aln_core] print alignments... 0.65 sec
            [bwa_aln_core] 524288 sequences have been processed.
            [bwa_aln_core] convert to sequence coordinate... 1.25 sec
            [bwa_aln_core] refine gapped alignments... 0.59 sec
            [bwa_aln_core] print alignments... 0.09 sec
            [bwa_aln_core] 569473 sequences have been processed.
            [main] Version: 0.6.2-r126

            • #7
              Originally posted by id0:
              I will have to check how single-end SAM will work, but doing paired-end alignment using Bowtie2 works fine, so there should not be anything wrong with the reads.
              If you have the Bowtie2 PE result, then perhaps run "samtools flagstat" on the Bowtie2 BAM file to determine how many reads align, how many proper pairs to expect, etc. You can also infer the insert size using BamTools or Picard's CollectInsertSizeMetrics.
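For a quick look without Picard, a rough insert-size estimate can be read from the TLEN column (column 9) of the Bowtie2 alignments. A sketch, assuming SAM text on stdin (for example from `samtools view bowtie2.bam`); `insert_size_summary` is a hypothetical helper that keeps properly paired records (FLAG bit 0x2) with a positive TLEN, so each pair is counted roughly once:

```bash
# Hypothetical helper: rough insert-size summary from SAM records on stdin.
# Keeps properly paired records (FLAG bit 0x2) whose TLEN (column 9) is positive,
# i.e. normally the leftmost mate of each pair, then reports count, median and mean.
insert_size_summary() {
    awk -F '\t' '!/^@/ && int($2 / 2) % 2 == 1 && $9 > 0 { print $9 }' |
        sort -n |
        awk '{ v[NR] = $1; sum += $1 }
             END {
                 if (NR == 0) { print "no proper pairs"; exit 1 }
                 median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
                 printf "pairs=%d median=%g mean=%.1f\n", NR, median, sum / NR
             }'
}
```

For example, `samtools view bowtie2.bam | insert_size_summary`. The result is only a sanity check on what `bwa sampe` should be able to infer; Picard's CollectInsertSizeMetrics gives a full histogram.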

              • #8
                Originally posted by zee:
                If you have the Bowtie2 PE result, then perhaps run "samtools flagstat" on the Bowtie2 BAM file to determine how many reads align, how many proper pairs to expect, etc. You can also infer the insert size using BamTools or Picard's CollectInsertSizeMetrics.
                samtools flagstat produces good results (as far as I can tell):
                Code:
                1138946 + 0 in total (QC-passed reads + QC-failed reads)
                0 + 0 duplicates
                1093043 + 0 mapped (95.97%:nan%)
                1138946 + 0 paired in sequencing
                569473 + 0 read1
                569473 + 0 read2
                1052996 + 0 properly paired (92.45%:nan%)
                1085424 + 0 with itself and mate mapped
                7619 + 0 singletons (0.67%:nan%)
                1750 + 0 with mate mapped to a different chr
                1193 + 0 with mate mapped to a different chr (mapQ>=5)

                • #9
                  Ever solve the problem?

                  Did you ever solve this problem? I am seeing exactly the same problem with our MiSeq data, except that it only affects the reverse reads.

                  When I ran sampe, the forward reads aligned without a problem, but the reverse reads didn't. When I run samse on each read file separately, the forward reads are again fine, but over 90% of the reverse reads do not align.

                  It only happens for this one case that we have.

                  Any advice would be appreciated.

                  • #10
                    I just posted about a similar problem (I hadn't seen this thread). Did you ever resolve the mis-paired alignments?
                    Thanks!

                    • #11
                      Originally posted by AlliCox:
                      I just posted about a similar problem (I hadn't seen this thread). Did you ever resolve the mis-paired alignments?
                      Thanks!
                      My problem was that the reads were quite long (250 bp), and from my understanding these aren't suited to BWA, since it is a short-read aligner.

                      I ended up switching over to Bowtie2 which can handle longer reads and this solved my problem.

                      Hope that helps.

                      • #12
                        Originally posted by AlliCox:
                        I just posted about a similar problem (I hadn't seen this thread). Did you ever resolve the mis-paired alignments?
                        Thanks!
                        My original BWA issue did not get resolved. I am using Mac OS X 10.7, and I have seen other Mac users report the same problem. When I switched from BWA 0.6.2 to BWA 0.5.9, it worked fine.

                        As I was writing this reply, I noticed that BWA 0.7.0 was released last week. I am not sure whether it fixes any of the Mac issues. See the changelog in the lh3/bwa repository on GitHub.


                        Originally posted by fongchun:
                        My problem was that the reads were quite long (250 bp), and from my understanding these aren't suited to BWA, since it is a short-read aligner.
                        I used BWA for 2x250bp reads and I did not encounter any problems. From BWA main page:

                        Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome. It implements two algorithms, bwa-short and BWA-SW. The former works for query sequences shorter than 200bp and the latter for longer sequences up to around 100kbp.
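To see whether a given run actually exceeds that limit, a small sketch can report the read-length range and how many reads are longer than a cutoff (200 by default, matching the bwa-short limit quoted above). It assumes 4-line FASTQ records; `read_length_summary` is a hypothetical name:

```bash
# Hypothetical helper: read-length range of a FASTQ file and the number of reads
# longer than a cutoff (default 200). Assumes 4-line records (sequence on line 2).
read_length_summary() {
    local fq=$1 cutoff=${2:-200}
    awk -v cutoff="$cutoff" '
        NR % 4 == 2 {
            len = length($0); n++
            if (n == 1 || len < min) min = len
            if (len > max) max = len
            if (len > cutoff) over++
        }
        END { printf "reads=%d min=%d max=%d over_%d=%d\n", n, min, max, cutoff, over }' "$fq"
}
```

For example, run `read_length_summary 1_1.fastq` and `read_length_summary 1_2.fastq` to compare both ends of a 2x250 bp run.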

                        • #13
                          Originally posted by id0:
                          I used BWA for 2x250bp reads and I did not encounter any problems.
                          Then I am back to square one as to why BWA didn't work in my situation. The only other thing I can think of is that the base qualities in the reverse reads were very poor after about 150 bases. This was actually a problem with the sequencing itself.

                          Maybe this played a role in the alignment process, but I don't know enough about BWA to comment on this.
                          Last edited by fongchun; 03-06-2013, 02:32 PM.
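A quality drop-off like this can be checked directly by printing the mean Phred quality at each read position. A sketch, assuming Phred+33 encoding (used by MiSeq output) and 4-line records; `per_position_quality` is a hypothetical name. Running it on both the _1 and _2 files makes a difference between forward and reverse reads easy to spot:

```bash
# Hypothetical helper: mean Phred quality at each read position (Phred+33 assumed).
# Builds a character -> quality lookup table, then averages the quality lines (every 4th line).
per_position_quality() {
    awk 'BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i - 33 }
         NR % 4 == 0 {
             n = length($0)
             for (p = 1; p <= n; p++) { sum[p] += ord[substr($0, p, 1)]; cnt[p]++ }
             if (n > maxlen) maxlen = n
         }
         END { for (p = 1; p <= maxlen; p++) printf "%d\t%.1f\n", p, sum[p] / cnt[p] }' "$1"
}
```

For example, `per_position_quality 1_2.fastq | tail -n 100` shows the last 100 positions of 250 bp reverse reads.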

                          • #14
                            Actually it just came back to me that I used BWA-short and not BWA-SW. According to the manual, BWA-short is suited for reads shorter than 200 bp. This is likely why it failed.

                            I wanted to use BWA-SW, but from my understanding it doesn't handle paired-end reads? Bowtie2 does, however, which is why we ended up switching to Bowtie2.

                            Originally posted by id0:
                            I used BWA for 2x250bp reads and I did not encounter any problems. From BWA main page:
                            Out of curiosity, you ran BWA-short for that right? Or did you get it to work with BWA-SW? Because I didn't see any options for paired-end reads.
                            Last edited by fongchun; 03-06-2013, 02:32 PM.

                            • #15
                              Originally posted by fongchun:
                              Actually it just came back to me that I used BWA-short and not BWA-SW. According to the manual, BWA-short is suited for reads shorter than 200 bp. This is likely why it failed.

                              I wanted to use BWA-SW, but from my understanding it doesn't handle paired-end reads? Bowtie2 does, however, which is why we ended up switching to Bowtie2.
                              You can try trimming your FASTQ files and see how that goes. Sickle trims based on quality scores, so you would only lose low-quality bases.
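The idea behind quality trimming can be sketched in a few lines of awk. This is a deliberately simplified fixed-threshold 3'-end trim, not Sickle's sliding-window algorithm; `trim_3prime` is a hypothetical name, and Phred+33 qualities and 4-line records are assumed. Every record is kept (with at least one base), so the _1 and _2 files stay in the same order for `bwa sampe`:

```bash
# Hypothetical helper (simplified, NOT Sickle's algorithm): drop trailing bases whose
# Phred+33 quality is below a threshold (default 20). Keeps every record and at least
# one base per read, so paired files remain in sync.
trim_3prime() {
    local fq=$1 minq=${2:-20}
    awk -v minq="$minq" '
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i - 33 }
        NR % 4 == 1 { hdr = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            n = length($0)
            while (n > 1 && ord[substr($0, n, 1)] < minq) n--
            print hdr; print substr(seq, 1, n); print plus; print substr($0, 1, n)
        }' "$fq"
}
```

For example, `trim_3prime 1_2.fastq 20 > 1_2.trim.fastq`. For real data, Sickle's paired-end mode or the `-q` read-trimming parameter of `bwa aln` are the more established options.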
