  • Average Insert Size

    Hi there,
    I did a 2x250 Illumina HiSeq run and would like to know how to find the average insert size from my fastq files.

    Thanks in advance!

  • #2
    There are three primary ways. I'll describe how to do them using the BBMap package.

    1) Via mapping, which requires a reference:
    bbmap.sh in1=r1.fastq in2=r2.fastq ref=ref.fasta ihist=ihist.txt reads=2m pairlen=2000

    2) Via overlap, which requires overlapping reads (they probably overlap given you ran at 2x250):
    bbmerge.sh in1=r1.fastq in2=r2.fastq ihist=ihist.txt reads=2m

    3) Via assembly, which requires sufficient read depth and memory to assemble the genome:
    bbmerge-auto.sh in1=r1.fastq in2=r2.fastq ihist=ihist.txt extend2=200

    Option 2 is the fastest. The best choice and best settings depend on your data, though. Can you describe the organism, experiment, and target insert size?
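
Each of the three commands writes an insert-size histogram to ihist.txt, and the average can be computed from it. The following is a minimal sketch, not part of BBMap. It assumes the file holds whitespace-separated `insert_size count` rows and that any header or summary lines start with `#`; check your own file before relying on it. The function name `insert_size_stats` and the example numbers are made up for illustration.

```python
import io


def insert_size_stats(lines):
    """Return (mean, median, pairs) from rows of 'insert_size count'.

    Assumption: blank lines and lines starting with '#' carry no data.
    """
    hist = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        size, count = int(fields[0]), int(fields[1])
        if count > 0:
            hist.append((size, count))

    total = sum(count for _, count in hist)
    if total == 0:
        raise ValueError("histogram contains no counts")

    # Count-weighted mean of the insert sizes.
    mean = sum(size * count for size, count in hist) / total

    # Weighted median: first size at which the running count reaches half.
    hist.sort()
    running = 0
    median = hist[-1][0]
    for size, count in hist:
        running += count
        if running * 2 >= total:
            median = size
            break
    return mean, median, total


# Made-up histogram, only to show the call.
example = io.StringIO("#InsertSize\tCount\n300\t2\n350\t5\n400\t3\n")
mean, median, pairs = insert_size_stats(example)
print(f"pairs={pairs} mean={mean:.1f} median={median}")
```

For a real run, pass an open file instead: `insert_size_stats(open("ihist.txt"))`.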



    • #3
      Thanks for responding so quickly!
      I ran the HiSeq run on environmental water samples, so it's a metagenomics experiment. The purpose is to see which species are predominant in certain bodies of water. I am looking at bacterial species specifically.

      I am going to run Trimmomatic to trim my reads and remove adapter sequences. Afterwards I am going to use FLASh to merge my reads. To choose the right parameters for FLASh I need to know my average insert size, which is why I am trying to find out how to measure it.
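
The reason insert size matters for merging is simple arithmetic: the two reads of a pair can only be merged where they cover the same bases. The sketch below is a hypothetical helper, not taken from FLASh or BBMerge. It assumes full-length, untrimmed reads of equal length and takes "insert size" to mean the fragment length without adapters.

```python
def expected_overlap(read_len, insert_size):
    """Number of bases covered by both reads of a pair.

    Assumptions: both reads are read_len long and untrimmed, and
    insert_size is the fragment length excluding adapters.
    """
    if read_len <= 0 or insert_size <= 0:
        raise ValueError("read_len and insert_size must be positive")
    # Short fragments are covered end to end by both reads, so the
    # overlap cannot exceed the fragment itself; long fragments leave
    # a gap between the reads, so the overlap cannot go below zero.
    return max(0, min(insert_size, 2 * read_len - insert_size))


# 2x250 reads with three illustrative insert sizes.
for insert in (150, 400, 600):
    print(insert, expected_overlap(250, insert))
```

With 2x250 reads, a 400 bp insert leaves 100 bp of overlap to merge on, while anything above 500 bp leaves none.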



      • #4
        Full disclosure: I developed BBDuk and BBMerge. In my testing, BBDuk has greater accuracy than Trimmomatic, and BBMerge has greater accuracy than Flash. But you can certainly determine your insert size with BBMerge and then use that with Flash, if you wish. In my experience that's not necessary; Flash will not give substantially better output even when you know the average insert size a priori. Rather, it will output a helpful message when you give it settings it finds inappropriate. You can then correct them, but the results will still be similar.

        Unless you answer all of the questions posed, nobody can give you optimal advice... for example, what's the target insert size?
