  • oiiio
    Senior Member
    • Jan 2011
    • 105

    BWA sampe shows extremely large insert size

    I have an Illumina-sequenced genome that I am mapping as paired-end reads. The other four parts that I merge with it to create the fully mapped genome all look normal. However, at the sampe step of BWA for this last part of the genome, I noticed it was taking an EXTREMELY long time. Here is the output concerning isize:


    [infer_isize] (25, 50, 75) percentile: (30225, 52418, 70153)
    [infer_isize] low and high boundaries: 45 and 150009 for estimating avg and std
    [infer_isize] inferred external isize from 129 pairs: 49974.163 +/- 26814.967
    [infer_isize] skewness: -0.056; kurtosis: -0.892; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 199065 (5.56 sigma)

    The inferred size is definitely not what it should be (~250 bp). Has anyone had this kind of problem before? Is the sequencing data for this sample usable?
    Last edited by oiiio; 12-21-2011, 10:30 PM. Reason: typo
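The boundaries in this log can be reproduced by hand, which helps show how far off the estimate is. The sketch below assumes the rule BWA's `sampe` appears to use in its `infer_isize` step (0.5.x/0.6.x era): an outlier cut of two interquartile ranges beyond the 25th and 75th percentiles, with the low bound floored at the read length (taken here to be 45, the value the log shows). The constant and function names are illustrative, not BWA's API. The second function only checks that the reported maximum equals mean + 5.56 × sd; it does not reproduce how BWA arrives at the 5.56.

```python
from typing import Tuple

OUTLIER_BOUND = 2.0  # assumption: number of IQRs used for the outlier cut


def isize_bounds(p25: int, p75: int, read_len: int) -> Tuple[int, int]:
    """Low/high cut-offs applied before estimating mean and sd (assumed rule)."""
    iqr = p75 - p25
    low = int(p25 - OUTLIER_BOUND * iqr + 0.499)
    high = int(p75 + OUTLIER_BOUND * iqr + 0.499)
    # the low bound is never allowed to fall below the read length
    return max(low, read_len), high


def max_isize(avg: float, std: float, n_sigma: float) -> int:
    """Maximum insert size expressed as mean + n_sigma * sd, rounded."""
    return int(avg + n_sigma * std + 0.499)


if __name__ == "__main__":
    # percentiles, mean and sd copied from the [infer_isize] lines above
    print(isize_bounds(30225, 70153, read_len=45))  # (45, 150009)
    print(max_isize(49974.163, 26814.967, 5.56))    # 199065
```

With a true insert near 250 bp, quartiles tens of kilobases apart suggest that the 129 pairs used for inference did not land near each other, which would be consistent with mates that are not actually paired, one of the explanations raised in the replies.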
  • marcowanger
    Senior Member
    • Dec 2008
    • 273

    #2
    Which BWA version do you use?
    Marco


    • swbarnes2
      Senior Member
      • May 2008
      • 910

      #3
      bwa assumes that the first read of fastq 1 is the mate of the first read of fastq 2 and so on. If that's not true, that could explain what you see.

      So you could start by running bwa samse on each file and spot-checking the first few lines of each SAM output to see whether the reads are paired up properly.

      I think I saw that error once when I misnamed one of the sai files. So I'd double check that the command line is calling for the right two sai files.
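The pairing assumption can also be checked directly on the FASTQ files before any alignment. The sketch below walks both files in lockstep and reports the first record whose read names disagree. It assumes the usual Illumina naming conventions, where mates share a name and differ only by a trailing `/1` or `/2` or by a comment after the first whitespace; the function names and file paths are hypothetical.

```python
import sys
from itertools import islice, zip_longest
from typing import Iterable, Iterator, Optional, Tuple


def fastq_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield a normalised read name from the header of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 != 0 or not line.strip():
            continue  # skip sequence, '+' and quality lines, and blank lines
        name = line.split()[0]  # drop any comment after whitespace
        if name.startswith("@"):
            name = name[1:]
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # older Illumina mate suffix
        yield name


def first_mismatch(
    lines1: Iterable[str], lines2: Iterable[str], limit: Optional[int] = None
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record index, name1, name2) for the first unpaired record, or None.

    A None name means that file ran out of records before the other one."""
    pairs = zip_longest(fastq_names(lines1), fastq_names(lines2))
    for idx, (a, b) in enumerate(islice(pairs, limit)):
        if a != b:
            return idx, a, b
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python check_pairs.py reads_1.fq reads_2.fq  (hypothetical paths)
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        print(first_mismatch(f1, f2))
```

A result of `None` means every record pairs up by name; an index tells you where the two files first drift apart.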


      • oiiio
        Senior Member
        • Jan 2011
        • 105

        #4
        I use the latest version of BWA, and I'll give samse a shot now.


        • dp05yk
          Member
          • Dec 2010
          • 66

          #5
          Sometimes the insert size is estimated to be extremely high for certain datasets; IIRC, you can use the -A parameter to disable isize estimation altogether and enforce your own value.

          And I believe case matters here, i.e., -a is the option specified in the manual, but -A is the one that works.
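A sketch of assembling the `sampe` command in Python, so that the options mentioned here and the order of the `.sai` and FASTQ inputs are explicit. It assumes the `bwa sampe [options] <prefix> <in1.sai> <in2.sai> <in1.fq> <in2.fq>` usage of BWA 0.5/0.6, with `-a` as the maximum insert size and `-A` as the switch the reply above describes for turning off estimation; run `bwa sampe` with no arguments to confirm the options for your version. The stem check is a hypothetical naming convention (`x_1.sai` built from `x_1.fq`), added to catch the misnamed-file mistake raised earlier in the thread.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def sampe_command(prefix: str, sai1: str, sai2: str, fq1: str, fq2: str,
                  max_isize: Optional[int] = None,
                  disable_estimate: bool = False) -> List[str]:
    """Build a `bwa sampe` argument list (assumed usage; verify for your version)."""
    # hypothetical convention: each .sai shares its stem with the FASTQ it came from
    for sai, fq in ((sai1, fq1), (sai2, fq2)):
        if Path(sai).stem != Path(fq).stem:
            raise ValueError(f"{sai} does not look like it was made from {fq}")
    cmd = ["bwa", "sampe"]
    if max_isize is not None:
        cmd += ["-a", str(max_isize)]  # maximum insert size
    if disable_estimate:
        cmd.append("-A")  # per the thread: skip insert-size estimation
    # read-1 files always come before read-2 files, .sai before FASTQ
    cmd += [prefix, sai1, sai2, fq1, fq2]
    return cmd


if __name__ == "__main__":
    cmd = sampe_command("ref.fa", "lane5_1.sai", "lane5_2.sai",
                        "lane5_1.fq", "lane5_2.fq",
                        max_isize=500, disable_estimate=True)
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```

For gzipped inputs such as `x_1.fq.gz`, `Path.stem` keeps the `.fq` part, so the stem check would need adapting.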


          • oiiio
            Senior Member
            • Jan 2011
            • 105

            #6
            Interestingly, the single-end BAMs seemed alright. I'll try -A as well. I'm still quite suspicious about the integrity of the mapping, though...


            • swbarnes2
              Senior Member
              • May 2008
              • 910

              #7
              Try another test with just the first 200 or so FASTQ entries from each read file, making single-end and paired-end BAMs. An early error in a FASTQ file could throw everything off from that point on.
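Pulling out the first few hundred records can be done with a small script that also checks each record's basic shape, so a malformed entry near the start of a file is reported instead of silently shifting every later record. This is a minimal sketch assuming plain, uncompressed 4-line FASTQ; `take_records` is a hypothetical helper.

```python
import sys
from itertools import islice
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str, str]


def take_records(lines: Iterable[str], n: int) -> List[Record]:
    """Return up to n 4-line FASTQ records, raising on the first malformed one."""
    it = iter(lines)
    records: List[Record] = []
    for idx in range(n):
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            break  # end of file
        if len(chunk) < 4:
            raise ValueError(f"record {idx}: file ends mid-record")
        header, seq, plus, qual = chunk
        if not header.startswith("@"):
            raise ValueError(f"record {idx}: header does not start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"record {idx}: separator does not start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"record {idx}: sequence and quality lengths differ")
        records.append((header, seq, plus, qual))
    return records


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python head_fastq.py reads_1.fq 200 > test_1.fq  (hypothetical paths)
    with open(sys.argv[1]) as fh:
        for rec in take_records(fh, int(sys.argv[2])):
            print("\n".join(rec))
```

Running it on both read files with the same count gives matched test inputs for the single-end and paired-end runs.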

              If the distance you work out manually from the single-end BAMs looks right and doesn't agree with the crazy ones from the paired-end BAM, the most likely problem is just a brain fart on your part, where you have mistyped one of the file names. So double-check those.
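Working out the distance by hand from the two single-end files amounts to taking the outermost reference coordinates of the two mates. The sketch below does that for one read's line from each single-end SAM file (for example as printed by `samtools view`), using the leftmost position (column 4) and the reference span of the CIGAR string (column 6). It is only an approximation of the template length: it ignores strand orientation and clipping, and the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
Alignment = Tuple[str, int, str]  # (reference name, 1-based leftmost pos, CIGAR)


def parse_sam_line(line: str) -> Alignment:
    """Pull RNAME, POS and CIGAR out of one tab-separated SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return fields[2], int(fields[3]), fields[5]


def ref_span(cigar: str) -> int:
    """Reference bases covered: only M, D, N, = and X consume the reference."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def manual_insert(aln1: Alignment, aln2: Alignment) -> Optional[int]:
    """Outer distance between two mates, or None if unmapped or on different references."""
    (ref1, pos1, cig1), (ref2, pos2, cig2) = aln1, aln2
    if "*" in (ref1, ref2, cig1, cig2) or ref1 != ref2:
        return None
    end1 = pos1 + ref_span(cig1) - 1
    end2 = pos2 + ref_span(cig2) - 1
    return max(end1, end2) - min(pos1, pos2) + 1
```

Doing this for the first handful of read names and comparing the results with column 9 (ISIZE/TLEN) of the paired-end output shows whether the two disagree.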


              • oiiio
                Senior Member
                • Jan 2011
                • 105

                #8
                @swbarnes2

                What I found using this method was that there is actually a gradual increase in the detected insert size rather than an abrupt jump. I didn't notice any obvious FASTQ errors in this range, and the BAM is running now (although it is taking a horrifically long time due to the failed insert size detection).
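To put a number on a gradual increase rather than eyeballing it, per-pair sizes can be summarised in consecutive blocks of reads, in file order. A sketch, assuming you already have a list of insert sizes in read order (for example the ISIZE/TLEN column of the paired-end output, where one mate carries a negative value and unpaired reads have 0); the function name and block size are arbitrary choices.

```python
from statistics import median
from typing import List, Optional, Sequence


def block_medians(sizes: Sequence[Optional[int]], block: int) -> List[Optional[float]]:
    """Median absolute insert size for each consecutive block of reads.

    None and 0 entries (no template length) are skipped; a block with no
    usable values yields None."""
    out: List[Optional[float]] = []
    for start in range(0, len(sizes), block):
        vals = [abs(s) for s in sizes[start:start + block] if s]
        out.append(median(vals) if vals else None)
    return out
```

A steady climb across blocks would match the pattern described in this reply; a single block that jumps would instead point at one bad spot in the file, as suggested earlier in the thread.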

