  • Boel
    Member
    • Oct 2009
    • 62

    Insert size != Fragment size?

    Hi All,

    This is a very simple question, or should be, but there seems to be some confusion out there.

    My understanding is that insert size is the number of bases between the paired-end reads (example: 300 bp fragments, 2*75 reads -> insert size 150). However, looking through different threads here and information elsewhere, the fragment is sometimes referred to as the insert as well, making insert size == fragment size.
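The arithmetic in that example can be sketched in Python. The function names below are hypothetical, chosen only to keep the two competing definitions apart; they are not taken from any of the tools mentioned in this thread.

```python
def inner_distance(fragment_len: int, read1_len: int, read2_len: int) -> int:
    """Bases between the two reads; negative when the reads overlap."""
    return fragment_len - read1_len - read2_len


def fragment_size(inner_dist: int, read1_len: int, read2_len: int) -> int:
    """Full fragment length: both reads plus the unsequenced gap between them."""
    return read1_len + inner_dist + read2_len


# The example from the post: 300 bp fragments sequenced as 2 x 75.
gap = inner_distance(300, 75, 75)
print(gap)                         # 150
print(fragment_size(gap, 75, 75))  # 300
```

Whichever number a tool calls "insert size", converting between the two only needs the read lengths.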

    I wonder whether different programs (BWA, Picard's CollectInsertSizeMetrics, bfast, samtools, etc.) have different definitions of insert size, which might make things messy. Right now I am especially interested in Picard's definition.

    Any ideas?

    Thanks,
    Boel
  • cw11
    Member
    • Sep 2011
    • 12

    #2
    I'm not sure about Picard specifically, but I found a thread that discusses insert size here, which seems to suggest that the insert size is the stretch of sequence between the adapters (so in your example, 300 would be correct). However, certain Tuxedo suite programs (e.g. TopHat) take a --mate-inner-dist option, defined as the fragment length minus the combined length of the two reads.


    • nickloman
      Senior Member
      • Jul 2009
      • 355

      #3
      Yes, I've seen both definitions used by different software. I know Bowtie uses insert size == fragment size, for example.

      Personally I like this definition because you could sequence the same library with different read lengths, or you could have variable-length reads (Ion Torrent), or you could trim the 3' ends of your reads. Each of these would change the insert size under the other definition (the bases between the reads), but the fragment size would remain constant.
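A small Python sketch of that point, assuming a single 300 bp library and a few made-up read-length scenarios (the scenario names and lengths are illustrative only):

```python
FRAGMENT_LEN = 300  # assumed fragment size for one library


def inner_distance(fragment_len, read1_len, read2_len):
    """Bases left unsequenced between the two reads of a pair."""
    return fragment_len - (read1_len + read2_len)


# Hypothetical ways of sequencing or trimming the same library.
scenarios = {
    "2 x 50": (50, 50),
    "2 x 75": (75, 75),
    "2 x 75, read 2 trimmed to 60": (75, 60),
    "2 x 150": (150, 150),
}

distances = {
    name: inner_distance(FRAGMENT_LEN, r1, r2)
    for name, (r1, r2) in scenarios.items()
}

for name, dist in distances.items():
    print(f"{name}: fragment = {FRAGMENT_LEN}, distance between reads = {dist}")
```

The fragment column never changes, while the distance between the reads shrinks to zero once the two reads together cover the whole fragment.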

      I think part of the difficulty comes from the difference between Illumina paired-end protocols (e.g. bidirectional sequencing) where insert size is always related to fragment size and the long mate-pair/jumping protocols, where the insert size relates instead to the sizing step (e.g. 8kb gel-cut) and is independent of fragment length.


      • Boel
        Member
        • Oct 2009
        • 62

        #4
        According to the samtools help (which also covers Picard):
        "For Illumina paired-end data, the inferred insert size would be the difference between the 5' positions of the two reads." This translates to 300 bases in my previous example: reads are sequenced in the 5' to 3' direction, so the 5' ends of the two reads are the outer ends of the fragment.
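As a rough sketch of how that quoted definition plays out on alignment coordinates, the Python below assumes ungapped alignments (aligned length equals read length), 1-based inclusive positions, and a forward read that lies to the left of its reverse mate. The function names and the coordinates are made up for illustration and are not part of samtools or Picard.

```python
def five_prime_position(leftmost_pos: int, aligned_len: int, is_reverse: bool) -> int:
    """1-based reference coordinate of the read's 5' base."""
    if is_reverse:
        # A reverse-strand read has its 5' end at its rightmost aligned base.
        return leftmost_pos + aligned_len - 1
    return leftmost_pos


def inferred_insert_size(fwd_pos: int, fwd_len: int, rev_pos: int, rev_len: int) -> int:
    """Span from the forward read's 5' base to the reverse read's 5' base."""
    fwd_5p = five_prime_position(fwd_pos, fwd_len, is_reverse=False)
    rev_5p = five_prime_position(rev_pos, rev_len, is_reverse=True)
    # +1 because both end coordinates are counted (inclusive positions).
    return rev_5p - fwd_5p + 1


# A 300 bp fragment covering reference positions 1001..1300, sequenced 2 x 75:
# read 1 aligns forward at 1001..1075, read 2 aligns reverse at 1226..1300.
print(inferred_insert_size(1001, 75, 1226, 75))  # 300
```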

        Thanks for replying!


        • cw11
          Member
          • Sep 2011
          • 12

          #5
          Yup - Glad you found an answer!


          • jfostel
            Junior Member
            • Aug 2010
            • 7

            #6
            Whether insert size = fragment size does vary from tool to tool; I would look it up specifically for whatever you're using.

            Regardless, the total adaptor-insert-adaptor length is useful as the best predictor of a library's amplification behavior (both in qPCR QC and on the flowcell). For example, you wouldn't want to pool together two libraries with identical 200bp inserts but very different adaptor + index lengths (unless it was acceptable for the majority of the reads to come from the smaller construct).


            • rskr
              Senior Member
              • Oct 2010
              • 249

              #7
              Don't bother; use a different term. You will be misunderstood. When you mean fragment size, say "fragment size"; when you mean the distance between the pairs, say "distance between the pairs".

