  • BWA sampe shows extremely large insert size

    I have an Illumina-sequenced genome that is being mapped paired-end. The other four parts that I merge with it to create the fully mapped genome all look normal. At the sampe step of BWA for this last part of the genome, I noticed it was taking an EXTREMELY long time. Here is the output concerning isize:


    [infer_isize] (25, 50, 75) percentile: (30225, 52418, 70153)
    [infer_isize] low and high boundaries: 45 and 150009 for estimating avg and std
    [infer_isize] inferred external isize from 129 pairs: 49974.163 +/- 26814.967
    [infer_isize] skewness: -0.056; kurtosis: -0.892; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 199065 (5.56 sigma)

    The inferred size is definitely not what it should be (~250). Has anyone had this kind of problem before? Is the sequencing data for this sample usable?
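
As a sanity check on the log above, the reported numbers are consistent with a simple interquartile-range rule: the upper boundary equals p75 + 2 * (p75 - p25), and the maximum insert size is roughly mean + 5.56 * std. The sketch below only reproduces that arithmetic. The multiplier of 2 and the floor of 45 (possibly the read length) are assumptions inferred from this one log, not taken from the BWA documentation.

```python
def isize_bounds(p25, p75, floor, outlier_bound=2.0):
    """Return (low, high) boundaries from the 25th/75th percentiles.

    outlier_bound=2.0 and the floor value are assumptions chosen
    because they reproduce the numbers in the log above.
    """
    iqr = p75 - p25
    low = int(p25 - outlier_bound * iqr + 0.499)
    high = int(p75 + outlier_bound * iqr + 0.499)
    # A negative lower boundary is clamped to the floor.
    return max(low, floor), high


def max_isize(mean, std, n_sigma):
    """Approximate maximum insert size as mean + n_sigma * std."""
    return mean + n_sigma * std


low, high = isize_bounds(30225, 70153, floor=45)
print(low, high)  # boundaries recomputed from the log's percentiles
print(round(max_isize(49974.163, 26814.967, 5.56)))  # close to the logged maximum
```

With percentiles this wide, almost no pair is treated as an outlier, so the estimate of about 50 kb from only 129 pairs reflects the input pairs rather than a real ~250 bp library.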
    Last edited by oiiio; 12-21-2011, 10:30 PM. Reason: typo

  • #2
    Which BWA version do you use?
    Marco



    • #3
      bwa assumes that the first read of fastq 1 is the mate of the first read of fastq 2 and so on. If that's not true, that could explain what you see.

      So you could start by running bwa samse on each, and spot-checking the first few lines of each sam, to see if the reads are paired up properly.

      I think I saw that error once when I misnamed one of the sai files. So I'd double check that the command line is calling for the right two sai files.
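
The pairing assumption described above can also be checked directly on the FASTQ files before mapping. This is a minimal sketch, assuming plain 4-line FASTQ records and mate names that differ only by a trailing /1 or /2; the function names are hypothetical helpers, not part of BWA.

```python
from itertools import islice, zip_longest


def read_names(lines):
    """Yield the read name from every 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.strip():
            # Drop the leading '@' and anything after the first whitespace.
            yield line[1:].split()[0]


def strip_mate_suffix(name):
    """Remove a trailing /1 or /2 so that mates compare equal."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def first_mismatch(lines1, lines2, limit=None):
    """Return (record index, name1, name2) of the first unpaired record, or None."""
    pairs = zip_longest(read_names(lines1), read_names(lines2))
    for idx, (a, b) in enumerate(islice(pairs, limit)):
        if a is None or b is None or strip_mate_suffix(a) != strip_mate_suffix(b):
            return idx, a, b
    return None


# Usage (file names are placeholders):
# with open("reads_1.fastq") as f1, open("reads_2.fastq") as f2:
#     print(first_mismatch(f1, f2))
```

A result of None means every record lined up with its mate; any other result gives the index of the first record where the two files diverge.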



      • #4
        I use the latest version of BWA, and I'll give samse a shot now.



        • #5
          Sometimes insert sizes receive extremely high estimates for certain datasets. You can use the -A parameter to disable isize estimation altogether and enforce your own, IIRC.

          And I believe case-sensitivity matters, i.e. -a is specified in the manual, but -A works.



          • #6
            Interestingly, the single end BAMs seemed alright. I'll try -A as well. Still quite suspicious about the integrity of the mapping though...



            • #7
              Try another test with just the first 200 or so fastq entries from each read file, making single end and paired end .bams. An early error in a fastq file could throw everything off from that point on.
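
Cutting a small test set like this can be sketched as follows, assuming plain uncompressed FASTQ with exactly four lines per record. The function and file names are placeholders.

```python
from itertools import islice


def first_fastq_records(lines, n_records=200):
    """Return the lines of the first n_records FASTQ records (4 lines each)."""
    return list(islice(lines, 4 * n_records))


def write_subset(src_path, dst_path, n_records=200):
    """Copy the first n_records records of src_path into dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(first_fastq_records(src, n_records))


# Usage (file names are placeholders):
# write_subset("reads_1.fastq", "reads_1.head.fastq")
# write_subset("reads_2.fastq", "reads_2.head.fastq")
```

Taking the same number of records from both files keeps the mates in step, provided the originals were in step to begin with.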

              If the distance you work out manually from the single end bams looks right, and doesn't agree with the crazy ones from the paired end .bam, the most likely problem is just a brain fart on your part, where you have mistyped one of the file names. So double-check those.
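
Working out the distance manually from the two single-end alignments could look like the sketch below. It assumes SAM text input (for example a BAM converted with samtools view -h), uses the SEQ length as a stand-in for the aligned span (so indels and clipping are ignored), and only considers mates mapped to the same reference. The helper names are hypothetical.

```python
def _base_name(name):
    """Drop a trailing /1 or /2 so both mates share one key."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def parse_sam(lines):
    """Map read name -> (reference, 1-based position, read length) for mapped reads."""
    hits = {}
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 4:
            continue  # read is unmapped
        hits[_base_name(fields[0])] = (fields[2], int(fields[3]), len(fields[9]))
    return hits


def outer_distances(sam1_lines, sam2_lines):
    """Return {read name: outer distance} for mates on the same reference."""
    hits1, hits2 = parse_sam(sam1_lines), parse_sam(sam2_lines)
    distances = {}
    for name, (ref1, pos1, len1) in hits1.items():
        if name in hits2:
            ref2, pos2, len2 = hits2[name]
            if ref1 == ref2:
                end = max(pos1 + len1, pos2 + len2)
                distances[name] = end - min(pos1, pos2)
    return distances
```

If most of these distances sit near the expected ~250 while sampe reports tens of kilobases, the inputs handed to sampe are the first thing to check.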



              • #8
                @ swbarnes2

                What I found using this method was that there is actually a gradual increase in the detected insert size rather than an abrupt jump. I didn't notice any obvious FASTQ errors in this range, and the BAM is running now (although taking a horrifically long time due to the failed insert size detection).
