
  • IDT exome panel - high duplication rates with high data throughput

    Hi All,

    We did an exome capture using the IDT exome panel, with 12 samples per capture. To increase the data throughput per sample, the same capture pool was sequenced in two PE100 HiSeq lanes.

    When the data from only one lane are analyzed, the raw coverage is 96X, while usable reads (mapped to the exome, duplicates removed) cover 55X (duplication rate = 20%). Yet when the data from both lanes are analyzed together, the raw coverage roughly doubles (192X), but the usable coverage reaches only 88X (less than double 55X), and the duplication rate rises to 40%.

    Does anyone have experience using the IDT exome panel for capture? What percentage of usable reads and what duplication rate do you see?

    Are there better ways to increase data throughput while retaining high usable-read coverage?

    Million thanks!
    Hin

  • #2
    Could you explain your methodology in more detail? For example, the sequencing platform, run mode, library preparation, origin of the samples, and so forth. It sounds like you are finding duplicate reads between different libraries, which is not a useful approach.


    • #3
      Hi Brian,

      We did the sequencing on a HiSeq 1500, paired-end 100 bp. Library preparation was done with the KAPA Hyper Prep Kit, and the samples are frozen tissues.

      To clarify, the same captured library pool was loaded into both lanes, so all lab conditions should be consistent between the two. It is just that when we analyze one lane on its own, versus pooling the data from both lanes and analyzing them together, there are discrepancies in the usable read % and the duplication rate.

      I suspect that some molecules look unique within one lane but are also present in the other lane, so they are classified as duplicates once both lanes are considered together.

      So would there be ways to increase data throughput and retain high usable-read coverage? Would fewer samples per capture reaction help?

      Thanks,
      Hin


      • #4
        I do not have experience with IDT exome capture, but their product specification should have stats on capture efficiency such as % mappable reads, % on-target reads, % targets recovered and so on. Your capture efficiency is lower than what I have seen with SureSelect products.

        Duplicate rate depends on library diversity (the number of unique fragments) and sequencing depth. For instance, a library with 10M unique fragments will show fewer duplicates when sequenced to 10M reads than when sequenced to 50M reads.
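
The 10M example can be put into rough numbers with a minimal sketch. It assumes every unique fragment is equally likely to be sequenced and that reads are drawn at random with replacement, which ignores PCR bias and optical duplicates, so real libraries will deviate. The function name `expected_duplicate_rate` is hypothetical, and "reads" here means sequenced fragments (read pairs for paired-end data).

```python
import math


def expected_duplicate_rate(unique_molecules: float, reads: float) -> float:
    """Expected duplicate fraction under uniform random sampling.

    Assumption (not from the thread): each of `unique_molecules` fragments
    is equally likely to be picked for every one of `reads` draws.
    """
    if reads <= 0:
        return 0.0
    depth_ratio = reads / unique_molecules
    # Expected number of distinct fragments observed: N * (1 - exp(-R/N)).
    distinct = unique_molecules * -math.expm1(-depth_ratio)
    return 1.0 - distinct / reads


library = 10_000_000  # unique fragments, as in the example above
for reads in (10_000_000, 50_000_000):
    rate = expected_duplicate_rate(library, reads)
    print(f"{reads:>11,} reads: {rate:.1%} duplicates")
```

Under these assumptions the same 10M-fragment library goes from roughly a third duplicates at 10M reads to about four fifths at 50M reads, which is the effect described above.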

        To increase sequencing depth without increasing duplicates, the library diversity needs to be increased. That diversity depends on the diversity of the library going into the capture reaction, the capture probe design and the post-capture amplification. The easiest but costliest approach would be to repeat the whole capture process.


        • #5
          Originally posted by hinkwok
          So would there be ways to increase data throughput and retain high usable-read coverage? Would fewer samples per capture reaction help?

          Does this imply you did pre-capture pooling? It could be that the post-capture PCR needed a few extra rounds, perhaps because the hybridisations didn't yield well enough; that will increase your duplicates.

          This is higher than I have seen for other IDT panels though, which makes me think library prep is the issue here.


          • #6
            Thanks for explaining the procedure in more detail.

            Originally posted by hinkwok
            When the data from only one lane are analyzed, the raw coverage is 96X, while usable reads (mapped to the exome, duplicates removed) cover 55X (duplication rate = 20%). Yet when the data from both lanes are analyzed together, the raw coverage roughly doubles (192X), but the usable coverage reaches only 88X (less than double 55X), and the duplication rate rises to 40%.

            This is expected when you have highly PCR-amplified libraries. As nucacidhunter stated, with a fixed number of unique molecules, the more deeply you sequence their clones, the more duplicates you will find. To avoid this, it's helpful to start with more DNA and do less amplification, though I don't know what the constraints of the IDT kit are.
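
As a rough plausibility check on the reported figures, the sketch below takes the 20% one-lane duplication rate, infers how many reads were sequenced per unique molecule, and projects the rate for twice that depth. It assumes uniform random sampling from a fixed pool of molecules and that the reported rate is duplicates divided by the reads considered; neither assumption is confirmed in the thread, and the function names are hypothetical.

```python
import math


def duplicate_rate(depth_ratio: float) -> float:
    """Expected duplicate fraction at depth_ratio = reads / unique molecules.

    Assumes every molecule is equally likely to be sequenced.
    """
    # 1 - (1 - exp(-x)) / x, written with expm1 for numerical stability.
    return 1.0 + math.expm1(-depth_ratio) / depth_ratio


def depth_ratio_for(observed_rate: float) -> float:
    """Invert duplicate_rate by bisection (the rate rises steadily with depth)."""
    lo, hi = 1e-9, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if duplicate_rate(mid) < observed_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


one_lane = depth_ratio_for(0.20)          # 20% duplicates reported for one lane
two_lanes = duplicate_rate(2 * one_lane)  # same library, twice the reads
print(f"reads per unique molecule in one lane: {one_lane:.2f}")
print(f"predicted duplicate rate for two lanes: {two_lanes:.0%}")
```

Under these assumptions the projection for two lanes lands in the mid-30% range, the same ballpark as the 40% reported, so a jump of this size does not by itself point to a problem with the second lane.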

            I strongly recommend against mapping reads to the exome in any situation, as this leads to false-positive variant calls. Exome-capture data should be mapped to the genome. You will find that a substantial portion of the reads actually map to the genome outside of the baited regions, and these can often be useful (particularly when they map to pseudogenes that look like the baits).

            Incidentally, you can remove duplicates using Clumpify prior to mapping, which will reduce the mapping time substantially when you have a high rate of duplicates. The command would be something like:

            Code:
            clumpify.sh in1=r1.fq.gz in2=r2.fq.gz out1=clumped1.fq.gz out2=clumped2.fq.gz dedupe subs=5
