  • "ideal" insert size

    Has anyone come across a study or formal recommendation that gives a reason for choosing one ideal insert size for paired-end sequencing of human samples? Our laboratory staff have asked me this, and all I can tell them is that a really narrow distribution would be good; as for the insert size itself, I have little information to go on.
    We do both alignment and assembly on our data.

    Any help appreciated.

  • #2
    You don't mention the platform you're using, but I'd imagine the major constraint is going to be the technical limitations of your sequencer. On Illumina systems, longer insert lengths will result in larger, dimmer spots, reducing both the amount and quality of data you can obtain. We've run libraries with insert sizes up to about 1 kb, but I'm not sure I'd want to go much higher than that. There's often no point in having really short inserts either, since you'll end up reading through the insert and into the adapter in a significant proportion of your reads.

    The other big issue which may or may not be a factor for you is the amount of material you have. If you perform a very tight size selection then you're reducing the amount of material you have to create your library and you run the risk of getting a big pile of PCR artefacts if you start amplifying from too little material.

    I'm sure there are other considerations specific to your biological application. If you're doing assemblies you might want to look at mate pair libraries which allow the generation of paired sequences separated by much longer distances (2-5kb) whilst still keeping to the insert size limitations of the sequencing platform.
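
To make the read-through point above concrete, here is a minimal Python sketch of the paired-end geometry. It assumes "insert size" means the DNA fragment between the adapters and that both reads have the same length; the function name and the example numbers are illustrative only and do not come from any library.

```python
def paired_end_geometry(fragment_len: int, read_len: int) -> dict:
    """Describe how two reads of read_len sit on a fragment of fragment_len bases.

    Assumption: fragment_len counts only the DNA fragment (adapters excluded).
    """
    # Bases at the end of each read that run off the fragment into the adapter.
    adapter_bases_per_read = max(0, read_len - fragment_len)
    # Bases of each read that actually come from the fragment.
    usable = min(read_len, fragment_len)
    # Fragment bases covered by both reads, i.e. sequenced twice.
    overlap = max(0, 2 * usable - fragment_len)
    # Fragment bases between the two reads that are never sequenced.
    inner_gap = max(0, fragment_len - 2 * read_len)
    return {
        "adapter_bases_per_read": adapter_bases_per_read,
        "overlap": overlap,
        "inner_gap": inner_gap,
    }


if __name__ == "__main__":
    # Hypothetical fragment lengths with 100 bp reads.
    for fragment_len in (80, 150, 300, 1000):
        print(fragment_len, paired_end_geometry(fragment_len, 100))
```

With 100 bp reads, an 80 bp fragment gives 20 adapter bases per read and the whole fragment is read twice, while a 300 bp fragment gives no overlap and a 100 bp unsequenced gap between the reads.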

    • #3
      Thanks for your input,
      Specifically, I've been asked this by our group responsible for Illumina sequencing.

      They have cited the trade-off between tight distribution and yield, which makes sense to me.

      What befuddles me is that when I'm asked the question "if you could have any insert size, what would it be?" I don't have much to go on other than we don't want to sequence through the fragment twice. We have restrictions from WTSS, etc. which are driven by the sample, but for WGSS I'm looking for a bioinformatic reason to choose one size over another.

      Shouldn't there be some feature of hg18/hg19, like SINEs/LINEs etc., that would necessitate a larger or smaller insert size for WGSS libraries, so that we can make more use of them bioinformatically (alignment and assembly)?

      • #4
        This is eventually going to come down to your use case. If you're doing some kind of ChIP experiment then you won't want to increase your insert size too much, since you'll lose resolution in your feature detection. I don't do much assembly, but my recollection from those who do is that it's useful to have a range of insert sizes (though maybe in separate experiments?) to allow for spanning of short and long repeats.

        Our experience has been that longer read lengths are negating many of the problems of duplicated alignments in remapping experiments. Once you're up to 50bp or so (either paired or single end) then a surprisingly high proportion of 'repeat' sequence is actually mappable. We work in backcrossed strains with no SNPs though, so maybe this is more of an issue if you have more diversity. These days most of the sequences we can't map come from regions not present in the genome assembly (telomeres and centromeres mostly), so there's not much we can do about that.

        • #5
          I think your ideal insert size would be somewhere along the lines of the maximum insert and read length that allows you to maximize the throughput of your sequencing platform without saturating your data.

          • #6
            I think a lot of these answers are good.

            The optimal insert size depends on your experiment and goals.

            I'm assuming you're not talking about ChIP-seq (which is often best done single-end).

            For exome-seq, something around 200-350 bp is more than adequate for hitting >99% of the targets and assessing variants. Probably >4 exomes per HiSeq lane doing this, based on what I've seen.

            For whole genome, a combination of tightly distributed 200- and 2,000-base inserts is optimal for human (for the sake of SV detection). The 2 kb insert reads can be fairly low depth; they'll make up for the issues mapping over LINEs that you alluded to.
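
One way to see why the 2 kb library can get away with fairly low depth: for spanning repeats and breakpoints, what helps is the number of fragments bridging a position (physical coverage), not just the number of bases read over it (sequence coverage). A rough Python sketch follows; the 3 Gb genome size is a round number, and the function name, read length, and pair count are assumptions chosen purely for illustration.

```python
def coverage(n_pairs: int, read_len: int, insert_size: int, genome_size: int):
    """Return (sequence_coverage, physical_coverage) for a paired-end library.

    sequence coverage: average number of times a base is read.
    physical coverage: average number of fragments spanning a base.
    """
    sequence_cov = n_pairs * 2 * read_len / genome_size
    physical_cov = n_pairs * insert_size / genome_size
    return sequence_cov, physical_cov


if __name__ == "__main__":
    GENOME = 3_000_000_000  # round figure for the human genome (assumption)
    PAIRS = 15_000_000      # hypothetical number of read pairs
    for insert_size in (200, 2000):
        seq_cov, phys_cov = coverage(PAIRS, 100, insert_size, GENOME)
        print(f"{insert_size} bp insert: sequence {seq_cov:.1f}x, physical {phys_cov:.1f}x")
```

For the same number of 2 x 100 bp read pairs, the 2,000-base library spans ten times as much genome per pair as the 200-base library, so 1x sequence coverage already corresponds to 10x physical coverage.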

            If you don't care about having the optimal SV detection rate, you can go with 200-350bp whole genome similar to exome without much issue (though the cost may be an issue).

            For the sake of phasing, a less tightly distributed library with a mean insert of 2,000-3,000 bases would be great (expecting about 1 SNV per kb).
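
As a back-of-the-envelope check on the phasing argument, the sketch below takes the 1 SNV per kb figure and asks how many variant sites a fragment is expected to span. Treating SNV positions as a Poisson process is my simplifying assumption, not something stated in the thread, and the calculation counts sites spanned by the whole fragment, not sites that fall inside the two sequenced reads. The function name is hypothetical.

```python
import math


def snvs_spanned(insert_size: int, snv_per_base: float = 1 / 1000):
    """Return (expected SNVs spanned, probability of spanning at least two).

    Assumes SNV positions follow a Poisson process with rate snv_per_base.
    """
    lam = insert_size * snv_per_base
    # P(X >= 2) = 1 - P(X = 0) - P(X = 1) for a Poisson variable with mean lam.
    p_two_plus = 1.0 - math.exp(-lam) * (1.0 + lam)
    return lam, p_two_plus


if __name__ == "__main__":
    for insert_size in (200, 350, 2000, 3000):
        lam, p2 = snvs_spanned(insert_size)
        print(f"{insert_size} bp: expect {lam:.2f} SNVs, P(>=2) = {p2:.2f}")
```

Under these assumptions a 200-base fragment rarely spans two variant sites, whereas a 2,000-base fragment does so more often than not, which is consistent with preferring the longer inserts for phasing.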
            Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
            Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
            Projects: U87MG whole genome sequence [Website] [Paper]
