  • hamcan
    Member
    • Nov 2016
    • 19

    Average Insert Size

    Hi there,
    I ran an Illumina HiSeq 2x250 run. How can I find out the average insert size from my fastq files?

    Thanks in advance!
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    There are three primary ways. I'll describe how to do them using the BBMap package.

    1) Via mapping, which requires a reference:
    bbmap.sh in1=r1.fastq in2=r2.fastq ref=ref.fasta ihist=ihist.txt reads=2m pairlen=2000

    2) Via overlap, which requires overlapping reads (they probably overlap given you ran at 2x250):
    bbmerge.sh in1=r1.fastq in2=r2.fastq ihist=ihist.txt reads=2m

    3) Via assembly, which requires sufficient read depth and memory to assemble the genome:
    bbmerge-auto.sh in1=r1.fastq in2=r2.fastq ihist=ihist.txt extend2=200

    Option 2 is the fastest, though the best choice and best settings depend on your data. Can you describe the organism, experiment, and target insert size?
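To illustrate what option 2 measures: when the two reads of a pair overlap, the insert size is the sum of the two read lengths minus the overlap. The Python sketch below is a deliberately naive illustration of that idea, not BBMerge's actual algorithm; the function names, the 12-base minimum overlap and the 10% mismatch tolerance are illustrative assumptions. It reverse-complements R2, tries the longest possible overlap first, and accepts the first one within the mismatch tolerance.

```python
from typing import Optional

# Complement table for DNA bases; N (unknown) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def estimate_insert_size(r1: str, r2: str, min_overlap: int = 12,
                         max_mismatch_frac: float = 0.1) -> Optional[int]:
    """Insert size implied by the best R1/R2 overlap, or None if none is found.

    Hypothetical helper for illustration. Assumes the 3' end of R1 overlaps
    the 5' end of reverse-complemented R2, i.e. the insert is at least as long
    as each read; adapter read-through (shorter inserts) is not handled.
    """
    r1 = r1.upper()
    rc2 = reverse_complement(r2)
    # Try the longest possible overlap first; accept the first that fits.
    for olap in range(min(len(r1), len(rc2)), min_overlap - 1, -1):
        tail, head = r1[-olap:], rc2[:olap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olap:
            # Shared bases are counted once: insert = len(R1) + len(R2) - overlap.
            return len(r1) + len(r2) - olap
    return None
```

For example, 2x250 reads taken from opposite ends of a 300 bp fragment share 200 bases, so the function returns 300. Collecting this value over many pairs is the idea behind an overlap-based insert-size histogram; real mergers also weigh base qualities and reject ambiguous overlaps, which this sketch ignores.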


    • hamcan
      Member
      • Nov 2016
      • 19

      #3
      Thanks for responding so quickly!
      I ran a HiSeq run on environmental water samples, so it's a metagenomics experiment. The purpose of the experiment is to see which species are predominant in certain bodies of water; more specifically, I am looking at bacterial species.

      I am going to run Trimmomatic to trim my reads and remove adapter sequences, and afterwards use FLASh to merge them. To choose the correct parameters for FLASh I need to know my average insert size, which is why I am trying to find out how to determine it.
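The reason insert size matters for merging is that, for a given read length, it fixes how much R1 and R2 overlap. Below is a small Python sketch of that arithmetic; the function name and the 250 bp defaults are illustrative assumptions, and it counts only bases derived from the insert (i.e. it assumes adapter sequence has been trimmed when the insert is shorter than a read).

```python
def expected_overlap(insert_size: int, read1_len: int = 250, read2_len: int = 250) -> int:
    """Number of insert bases covered by both R1 and R2 (0 if the reads don't meet).

    Hypothetical helper. R1 covers insert positions [0, min(read1_len, insert)),
    R2 covers [max(0, insert - read2_len), insert); the overlap is their
    intersection, clamped at zero.
    """
    r1_end = min(read1_len, insert_size)
    r2_start = max(0, insert_size - read2_len)
    return max(0, r1_end - r2_start)
```

For example, with 2x250 reads an average insert of 400 bp gives about 100 bp of overlap, while inserts of 500 bp or more leave no overlap at all, so those pairs cannot be merged whatever settings the merger is given.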


      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        Full disclosure - I developed BBDuk and BBMerge. In my testing, BBDuk has greater accuracy than Trimmomatic, and BBMerge has greater accuracy than Flash. But you can certainly determine your insert size with BBMerge and then use that with Flash, if you wish. In my experience that's not necessary; Flash will not give substantially better output even when you know the average insert size a priori. Rather, it will output a helpful message when you give it settings it finds inappropriate. You can then correct them, but it will still yield similar results, in my experience.

        Unless you answer all of the questions posed, nobody can give you optimal advice... for example, what's the target insert size?
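If you do determine the insert size with BBMerge (or BBMap) first, the average can be read off or recomputed from the `ihist.txt` histogram. The Python sketch below assumes the file consists of `#`-prefixed header or summary lines followed by whitespace-separated insert size and count columns; that layout is an assumption, so check it against your own file. If the header already reports a mean, this simply recomputes it together with a median and standard deviation. The function name is hypothetical.

```python
import math
from typing import Iterable


def insert_size_stats(lines: Iterable[str]) -> dict:
    """Summarise an insert-size histogram of 'size count' rows.

    Assumed layout (check against your file): lines starting with '#' are
    headers or summary lines and are skipped; every other non-empty line holds
    an insert size and the number of pairs with that size.
    """
    hist = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        size, count = line.split()[:2]
        hist.append((int(size), int(count)))

    total = sum(count for _, count in hist)
    if total == 0:
        raise ValueError("histogram contains no pairs")

    # Count-weighted mean and (population) standard deviation.
    mean = sum(size * count for size, count in hist) / total
    var = sum(count * (size - mean) ** 2 for size, count in hist) / total

    # Median: first size at which the cumulative count reaches half the pairs.
    running, median = 0, None
    for size, count in sorted(hist):
        running += count
        if 2 * running >= total:
            median = size
            break

    return {"pairs": total, "mean": mean, "median": median, "stdev": math.sqrt(var)}
```

Usage: `with open("ihist.txt") as fh: print(insert_size_stats(fh))`.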
