  • Insert size with Agilent SureSelect

    We have noticed that our Agilent SureSelect libraries have a very small insert size after hybrid selection. Prior to selection, we see BioAnalyzer modes of 200-220 bp (with adapter length subtracted, so this is insert size). After hybridization and selection, we see 130-140 bp. On the sequencer, we are seeing similar very low numbers. Have other people seen this effect? Is there a way to work around it? We are planning to increase the input size to the hybridization, obviously, but we have not heard much about this issue and so are wondering if this is something specific to us or if this is a more general observation. The short insert size has some considerable implications for data quality and subsequent variant calling, so we would certainly like to get an insert size that is more reasonable.

    Thanks,
    Sean

  • #2
    I believe the SureSelect baits are ~120 bp in length, so they're probably preferentially binding the smaller fragments of your DNA. I know their protocol calls for shearing to ~150 bp with an expected shift to 200-250 after ligation and amplification (prior to hybridization). Have you considered the possibility of concatamerization of your DNA if you shear to larger lengths?


    • #3
      Insert size with Agilent SureSelect

      Thanks for the thoughts. We are using Covaris to do the fragmentation and end up with a 200-220 bp insert size (without adapters) prior to hybridization, then lose 60-80 bp from the mode by going through the hybridization process. If we end up running 2x80 or 2x100 bp runs, then many of the paired reads actually overlap one another. While not a disaster, it isn't cost effective and makes later variant calling more difficult (because many of the bases are not independently read, being present in both the first and second reads). I'm not sure that concatemerizing would be our first choice, as it would introduce chimeras not present in the sample, if I understand you correctly.

      Sean
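
To put rough numbers on the overlap described in this post, here is a minimal sketch. It assumes each mate starts at one end of the insert and reads inward, and that cycles running past the end of the insert (into adapter) yield no insert bases. The function name `overlap_bases` is made up for illustration.

```python
def overlap_bases(insert_size: int, read_length: int) -> int:
    """Insert bases that are read by BOTH mates of one pair.

    Assumes both mates start at opposite ends of the insert and read inward.
    """
    # A mate cannot read more insert bases than the insert contains.
    covered_per_read = min(read_length, insert_size)
    # Whatever the two mates cover beyond the insert length is shared.
    return max(0, 2 * covered_per_read - insert_size)


# Post-capture mode of ~135 bp (middle of the 130-140 bp reported above)
print(overlap_bases(135, 100))  # 65 bases read twice on a 2x100 run
print(overlap_bases(135, 80))   # 25 bases read twice on a 2x80 run
# Pre-capture mode of ~210 bp
print(overlap_bases(210, 100))  # 0, the mates do not meet
```

With a 135 bp insert on a 2x100 run, 65 of the 200 sequenced bases are second readings of bases already covered by the other mate.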


      • #4
        Concatamerizing is no one's first choice, lol. I'm saying long fragments of DNA not covered by either baits or the adapter blocking oligos might hybridize, leading to off-target sequences. While I can't say it would happen for sure, I'm not sure I want to find out. Anyway, are you stuck with doing 2x80 or 2x100? I'm doing 2x75 on 150 bp fragments and it seems to be working out ok. Also, are you using the SureSelect settings for the Covaris? I've always had consistent peaks around 150 using those.


        • #5
          I looked at the Agilent kit a bit more. The size that the protocol targets is about 150 bp, which we are a little under. However, even at 150 bp for an average, there is a very large percentage of pairs that are going to overlap each other for a 2x75 bp run. For the same reason that folks recommend removing duplicates, the portion of overlapping sequence should probably be trimmed so that a base is not "double-counted". So, in essence, this ends up reducing the effective read length, and the extent of this reduction depends on the actual read length and how that relates to the insert size distribution. In practice, I think we are going to go ahead with a much larger insert size test (~300 bp) and see what that gives us.

          Thanks for the help on this, GW_OK.

          Sean
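
The dependence of effective read length on the insert size distribution can be sketched as below. This is only an illustration under stated assumptions: insert sizes are supplied by the caller (for example from a BioAnalyzer trace or from aligned template lengths), overlapping bases are counted once, and the function name `pair_stats` and the example sizes are hypothetical rather than taken from any real library.

```python
from typing import Iterable, Tuple


def pair_stats(insert_sizes: Iterable[int], read_length: int) -> Tuple[float, float]:
    """Return (fraction of pairs whose mates overlap, effective read length).

    Effective read length = unique insert bases per pair / 2, i.e. what each
    mate contributes once doubly read bases are counted only once.
    """
    sizes = list(insert_sizes)
    if not sizes:
        raise ValueError("need at least one insert size")
    overlapping_pairs = 0
    unique_bases = 0
    for size in sizes:
        covered = min(read_length, size)      # insert bases seen by one mate
        shared = max(0, 2 * covered - size)   # insert bases seen by both mates
        if shared > 0:
            overlapping_pairs += 1
        unique_bases += 2 * covered - shared  # count shared bases once
    return overlapping_pairs / len(sizes), unique_bases / (2 * len(sizes))


# Hypothetical toy distribution centred on 150 bp, sequenced as 2x75
fraction, effective = pair_stats([100, 125, 150, 175, 200], read_length=75)
print(f"{fraction:.0%} of pairs overlap; effective read length {effective:.1f} bp")
```

Passing the same insert sizes with a different `read_length` shows how much of a longer run would be spent re-reading the same bases.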


          • #6
            I'd be very interested to know how your longer insert sizes work once you've tried them out.


            • #7
              Note: I've posted a question around the issue of how variant callers handle overlapping paired end reads to the bioinformatics forum. It's a good question & I wanted to make sure the right audience sees it.


              • #8
                New Agilent All-Exon protocol

                To address this issue, we (Agilent) have released a new protocol for All-Exon capture that shifts the insert size, using a combination of Herculase II high-fidelity polymerase and SPRI-bead purification throughout. It may be found at:



                G3362-90001
                SureSelect Target Enrichment System for Illumina Paired-End Sequencing Library - Human All Exon and Human All Exon Plus Protocol
                v.2.0.1

                When following this protocol, the library size after capture (including adapters) is 325-350 bp (= insert size of ~230 bp).

                Best regards,
                Scott


                • #9
                  Hmmm. Is this compatible with the exome kits I've already got in my freezer (I'm assuming yes)? Also, the new protocol linked says after shearing you should peak at 190, not 230.

                  Edit: I suppose it technically doesn't say base size at all after shearing, but the included BioA trace peak is at 190.

                  Edit Edit: It does say 150 to 200 at the end of the shearing step, but it doesn't say so at the size verification step.

                  Edit^3: You didn't change the Covaris settings at all (?!)

                  Edit^4: Agilent tech support says the Covaris settings listed in table 8 of the new protocol do not reflect the new insert size changes and they will fix it.
                  Last edited by GW_OK; 05-10-2010, 10:51 AM.


                  • #10
                    OK so Owen Hardy from SureSelect Support says they're not really changing Covaris settings and are merely widening the acceptable range of fragments from a peak at 150 to a peak anywhere between 150-200.

                    This has gotten me thinking, though. When my lab upgrades to the HiSeq, this insert size issue is going to become very important, as the HiSeq default run length is 100 bp paired end or 200 bp single end (you can do shorter but that seems to be a bit of a hassle, what with removing reagents and such). Should I switch to single end reads for my exome captures then?


                    • #11
                      After capture, the size range put on the sequencer might be 150-500 bp. There will be overlap on a small portion of paired reads, but the increased throughput will compensate.


                      • #12
                        True, but if you did long single end reads there wouldn't be any overlap at all. And if the majority of your reads are ~150-200bp (which they ought to be, I think) you can read the whole thing in one go without having to do paired ends.

                        Why do paired end reads at all (for exome at least) if you can sequence the whole (or quite a bit of the) captured piece of DNA? You don't really need the assembly benefits of paired ends, especially since you're assembling to reference. And it's much cheaper in reagents...


                        • #13
                          For us, it looks like 5 gigabases is the minimum sequence to get >80% exome coverage at 20x. Right now 2x100bp reads are the quickest route. As the read length and cluster density increase I can see where single-end reads might become advantageous.
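
As a back-of-the-envelope check on the 5 gigabase figure, the sketch below converts a raw base target into the number of clusters (read pairs or single reads) needed for a few run configurations. It counts raw sequenced bases only and ignores overlap between mates, duplicates, and off-target reads. The helper name `clusters_needed` is hypothetical.

```python
import math


def clusters_needed(total_bases: int, read_length: int, paired: bool = True) -> int:
    """Clusters required to reach total_bases of raw sequence."""
    bases_per_cluster = read_length * (2 if paired else 1)
    return math.ceil(total_bases / bases_per_cluster)


target = 5_000_000_000  # 5 gigabases, the figure quoted in the post above

print(clusters_needed(target, 100, paired=True))   # 25000000 pairs at 2x100
print(clusters_needed(target, 200, paired=False))  # 25000000 reads at 1x200
print(clusters_needed(target, 75, paired=True))    # 33333334 pairs at 2x75
```

On raw bases alone, 2x100 and 1x200 need the same number of clusters. The difference between them comes from how many of those bases are unique once mate overlap on short inserts is taken into account.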
