  • bioinfosm
    Senior Member
    • Jan 2008
    • 483

    #16
    Originally posted by upenn_ngs View Post
    Another factor, many GC rich regions are dropped from both whole genome sequencing as well as the exome capture. This image from the Broad.

    http://www.postimage.org/image.php?v=aV6cBnA
    Do you know what the WGS track is, and from where one can obtain it for the IGV?
    --
    bioinfosm

    Comment

    • NGSfan
      Senior Member
      • Apr 2009
      • 181

      #17
      Originally posted by Xi Wang View Post
      Oh. But if you have the data, you can try what I just mentioned.

      And for PE reads, I don't think it will improve things much, because it is the DNA fragments that are amplified. So the coverage should have some relationship with the GC content of the DNA fragments. On the other hand, the read GC content and the DNA fragment GC content are highly correlated. As a result, the relationship between read GC content and coverage largely reflects reality.
      I'm not clear on why converting the read coverage to a log scale would help in understanding the distribution better. Visualizing coverage on a log scale just changes the scale you're looking at, no?

      Maybe I'm not understanding the advantage. Could you show me an example?

      Comment

      • if1
        Junior Member
        • Nov 2009
        • 2

        #18
        Originally posted by bioinfosm View Post
        ...
        We notice that only 20% of reads map on-target! Is that a common thing? (Illumina 75bp PE)
        Hi,
        We have also run a TE using SureSelect with Illumina 76bp single-end reads, and we obtained results similar to those of Tewhey et al (Genome Biol 2009): 50% of uniquely aligned reads were on target, with a uniformity of capture similar to that reported in the paper.
        I am wondering if someone else has results for Illumina 76bp paired-end, as the Agilent website suggests that the % on target should increase from 50% to 70% with the PE protocol.

        Thanks

        Comment

        • NGSfan
          Senior Member
          • Apr 2009
          • 181

          #19
          Originally posted by bioinfosm View Post
          Another point which I did not notice here is, # of reads actually sequenced to get 30x exome coverage for the agilent capture stuff.

          We notice that only 20% of reads map on-target! Is that a common thing? (Illumina 75bp PE)
          Be careful with Tewhey's enrichment calculation: they calculated 48% "on or near target", meaning +/- 150bp around the target! If you look in the text, they actually only got 37% "on target", so while your 20% is low, it's not terrible in comparison. How much of the genome are you targeting?

          In our case with 76-bp single end reads, we targeted 0.09% of the genome and enriched to 35% of the sequences being "on target", which is a ~390-fold enrichment. If you were to actually convert Tewhey's numbers to solely "on target" (from 0.12% to 37%), then their claim of "about 400 fold enrichment" is actually ~290-fold. Just a small criticism.
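As a sanity check on the arithmetic above, here is a minimal sketch that assumes fold enrichment is simply the on-target read fraction divided by the fraction of the genome targeted. The function name is illustrative only. With the rounded percentages quoted in this post, the plain ratio comes out near 389-fold for the 0.09% design and near 308-fold for 0.12% to 37%, so the ~290 figure may reflect unrounded inputs or a slightly different definition.

```python
def fold_enrichment(on_target_fraction: float, target_fraction_of_genome: float) -> float:
    """Observed on-target read fraction divided by the fraction expected by chance.

    Assumption: reads from an unenriched library would land on target in
    proportion to the target's share of the genome.
    """
    if not 0 < target_fraction_of_genome <= 1:
        raise ValueError("target_fraction_of_genome must be in (0, 1]")
    return on_target_fraction / target_fraction_of_genome


# Rounded figures quoted in the post (percentages converted to fractions).
ours = fold_enrichment(0.35, 0.0009)    # 0.09% of genome targeted, 35% on target
tewhey = fold_enrichment(0.37, 0.0012)  # 0.12% of genome targeted, 37% on target

print(f"this run: ~{ours:.0f}-fold, Tewhey on-target only: ~{tewhey:.0f}-fold")
```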

          We have just completed a 76-bp paired end run with 4 samples multiplexed - I will let you know what we get with our alignment results
          Last edited by NGSfan; 03-17-2010, 02:45 AM.

          Comment

          • bioinfosm
            Senior Member
            • Jan 2008
            • 483

            #20
            Thanks NGSfan.

            Our on-or-near number, taking +/- 200bp, goes to 15%, which is still pretty low, I would guess.

            It would be interesting to check where the remaining 85% of reads went!
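For anyone wanting to reproduce an "on or near target" number, here is a rough sketch of the counting, assuming half-open (start, end) coordinates on a single chromosome and that a read counts as a hit if it overlaps a padded target by at least one base. The function names and the toy coordinates are made up for illustration; real pipelines would usually do this with an interval tool on BAM/BED files.

```python
from bisect import bisect_right


def merge_padded(targets, pad):
    """Pad each half-open (start, end) interval by `pad` bp and merge overlaps."""
    padded = sorted((max(0, s - pad), e + pad) for s, e in targets)
    merged = []
    for s, e in padded:
        if merged and s <= merged[-1][1]:
            # Overlapping or touching the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def fraction_on_or_near(reads, targets, pad=0):
    """Fraction of reads overlapping any target after padding by `pad` bp."""
    merged = merge_padded(targets, pad)
    starts = [s for s, _ in merged]
    hits = 0
    for r_start, r_end in reads:
        # Last merged interval that starts before the read ends.
        i = bisect_right(starts, r_end - 1) - 1
        if i >= 0 and merged[i][1] > r_start:
            hits += 1
    return hits / len(reads) if reads else 0.0


# Toy data: one 200 bp target and four 76 bp reads (coordinates are invented).
targets = [(1000, 1200)]
reads = [(900, 976), (1100, 1176), (1300, 1376), (5000, 5076)]

print(fraction_on_or_near(reads, targets, pad=0))    # strictly on target
print(fraction_on_or_near(reads, targets, pad=200))  # on or near (+/- 200 bp)
```

Comparing the two numbers for the same library shows how much of the "on or near" figure comes from padding rather than from reads on the baits themselves.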
            --
            bioinfosm

            Comment

            • Xi Wang
              Senior Member
              • Oct 2009
              • 317

              #21
              Originally posted by NGSfan View Post
              I'm not clear on why converting the read coverage to a log scale would help in understanding the distribution better. Visualizing coverage on a log scale just changes the scale you're looking at, no?

              Maybe I'm not understanding the advantage. Could you show me an example?
              I do not have such data, so I can't show you examples.
              But I noticed that there are fewer regions with high GC content than regions with average GC content, and likewise fewer low-GC regions. That is to say, there are more points in the figure around the average GC (x-axis), so high-coverage points are more likely to show up in this part. This is what you saw in the figure. If you take the log, the high-coverage points are pulled down more than the low-coverage points are. The log-scale figure is more likely to reflect the true relationship between read coverage and GC content.
              Xi Wang
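One way to read the suggestion above: a log transform compresses the high-coverage tail, so a few very deep windows near average GC do not dominate the plot. A tiny sketch follows, using made-up coverage values and a pseudocount of 1 (my assumption, so that zero-coverage windows stay defined).

```python
import math


def log2_coverage(coverage, pseudocount=1.0):
    """Return log2(coverage + pseudocount) for each window."""
    return [math.log2(c + pseudocount) for c in coverage]


# Made-up per-window coverage values, chosen only to show the compression.
coverage = [0, 3, 15, 63, 255]
print(log2_coverage(coverage))  # [0.0, 2.0, 4.0, 6.0, 8.0]
```

On the raw scale the deepest window is 85 times the second-lowest one; after the transform the values are evenly spaced, which makes any trend against GC content easier to see.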

              Comment

              • bryan haffer
                Junior Member
                • Jun 2010
                • 1

                #22
                If you want to be able to enrich repetitive regions without bias, look at RainDance Technologies. Using their RainStorm approach, you can design PCR primers to capture 99% or greater of your target regions. The technology will also provide better uniformity allowing for less sequencing than SureSelect.

                Comment

                • NGSfan
                  Senior Member
                  • Apr 2009
                  • 181

                  #23
                  Hi Bryan,

                  Thanks for the suggestion. The RainDance approach really does seem to be the better way to handle these biased regions, albeit on a smaller scale.

                  It doesn't scale as well as SureSelect to, say, 1000 genes, or to the whole exome.

                  Comment

                  • ECO
                    --Site Admin--
                    • Oct 2007
                    • 1360

                    #24
                    Originally posted by bryan haffer View Post
                    If you want to be able to enrich repetitive regions without bias, look at RainDance Technologies. Using their RainStorm approach, you can design PCR primers to capture 99% or greater of your target regions. The technology will also provide better uniformity allowing for less sequencing than SureSelect.

                    www.raindancetech.com
                    In the interest of full disclosure: in the future, please make your affiliations clearer.

                    Comment

                    • DoubleA
                      Member
                      • Jul 2010
                      • 16

                      #25
                      NGSfan,
                      Our lab has been using a custom SureSelect library to capture and sequence the extended HLA region (~8Mb). We have found that we get great coverage (>40X) over regions with <60% GC content while we get very poor coverage of regions with >60% GC content. I don't think that these results are out of the ordinary. I just came across the manuscript below...

                      Despite the ever-increasing output of Illumina sequencing data, loci with extreme base compositions are often under-represented or absent. To evaluate sources of base-composition bias, we traced genomic sequences ranging from 6% to 90% GC through the process by quantitative PCR. We identified PCR du …


                      They have shown that the PCR steps in the library construction process create a huge bias against regions of high GC content. They have also shown how to resolve this problem. Check it out...

                      DoubleA
                      Last edited by DoubleA; 03-23-2011, 10:08 AM.

                      Comment

                      • Sam64
                        Member
                        • Jun 2011
                        • 15

                        #26
                        DoubleA,
                        Thank you for your link. The manuscript is very interesting, but it just refers to the Illumina Sureselect protocol.
                        Does anybody have experience with the Agilent SureSelect protocol, in particular with the Herculase II enzyme? Does this enzyme recover the high-GC fragments (like the AccuPrime Taq HiFi used in the manuscript)?
                        Sam64

                        Comment

                        • DoubleA
                          Member
                          • Jul 2010
                          • 16

                          #27
                          Hello everyone,

                          I recently tried three different polymerases as well as different PCR conditions (increases in denaturation time) in an attempt to increase read coverage in regions with >60% GC content. I enriched 12 Illumina PE libraries with either AccuPrime, Phusion, or KAPA polymerase (4 with each polymerase). I should mention that I ligated the same adapter and incorporated a unique bar code for each library using the three-primer enrichment approach (PE 1.0, PE 2.0, and a primer containing the bar code).

                          Following library production, I pooled 4 libraries (all 4 created with the same polymerase) and performed a hybridization with a custom SureSelect bait library covering ~8Mb of the HLA region on human chromosome 6 (baits on ~3.8 Mb). Following hybridization and elution, I performed a final enrichment with primers covering the last 20bp of the 5' and 3' ends of the Illumina libraries. All 3 pools (12 libraries) were mixed and run on a HiSeq lane for a single 40bp run. Coverage of the target region was a bit low (10-50X), so we'll probably sequence these libraries in the future with a PE X 100bp run.

                          I have attached graphs of read coverage vs %GC content (20bp window). As you can see, the coverage of the GC-rich regions is pretty similar for each polymerase, and the increase in denaturation time per cycle did not help much either. Below is the % of reads mapping to regions with GC content >60%. I thought some of you might be interested in these results.

                          Regards,
                          Double A

                          AccuPrime 15 second denaturation/cycle: 12.4%
                          AccuPrime 30 second denaturation/cycle: 12.9%
                          AccuPrime 45 second denaturation/cycle: 13.2%
                          AccuPrime 60 second denaturation/cycle: 14.4%

                          Phusion 15 second denaturation/cycle: 13.5%
                          Phusion 30 second denaturation/cycle: 16.4%
                          Phusion 45 second denaturation/cycle: 13.1%
                          Phusion 60 second denaturation/cycle: 12.6%

                          KAPA 15 second denaturation/cycle: 13.9%
                          KAPA 15 second denaturation/cycle: 10.7%
                          KAPA 15 second denaturation/cycle: 13.5%
                          KAPA 15 second denaturation/cycle: 11.6%
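For readers who want to compute a comparable "% of reads in >60% GC regions" figure, here is a rough sketch. It assumes non-overlapping 20bp windows along a reference sequence and assigns each read to the window containing its start position; the post does not say exactly how the numbers above were derived, so treat the method, the function names, and the toy sequence as illustrative assumptions.

```python
def gc_fraction(seq):
    """GC fraction of a sequence, ignoring anything that is not A, C, G or T."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def windowed_gc(seq, window=20):
    """GC fraction of consecutive non-overlapping windows (partial last window dropped)."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


def fraction_reads_in_high_gc(read_starts, seq, window=20, threshold=0.60):
    """Fraction of reads whose start falls in a window with GC above `threshold`."""
    gc = windowed_gc(seq, window)
    counted = hits = 0
    for pos in read_starts:
        idx = pos // window
        if idx < len(gc):  # skip reads starting in the dropped partial window
            counted += 1
            if gc[idx] > threshold:
                hits += 1
    return hits / counted if counted else 0.0


# Toy reference: one 20 bp all-GC window followed by one 20 bp all-AT window.
reference = "GC" * 10 + "AT" * 10
read_starts = [0, 5, 25, 30]  # invented 0-based read start positions

print(windowed_gc(reference))
print(fraction_reads_in_high_gc(read_starts, reference))
```

Comparing this fraction with the share of the target that actually sits in >60% GC windows would show how strongly those regions are under-represented.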
                          Attached Files

                          Comment

                          • arvi8689
                            Member
                            • Sep 2011
                            • 10

                            #28
                            coverage calc

                            Hey

                            I am trying to sequence the exome and the capture kit is 100MB

                            The sequencing core promised 120 million reads per lane and we are using paired end 100bp reads and our fragment size is 250 basepairs.

                            My calculation was that I will get 120 million reads * 200 bp = 24 billion bases read,

                            so coverage = 24 billion bases / 100MB = 240x coverage (average).

                            But some people say I will get a coverage of only 120x. What could be the reason? Or is the coverage actually 240x?
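The 120x versus 240x gap may simply come down to what "120 million reads" counts. A small sketch of both readings follows, assuming a 100 Mb target, 100bp reads, and that the two mates of a 250bp fragment do not overlap. Which reading applies depends on how the sequencing core reports its numbers, so that part is an assumption to check with them.

```python
def mean_coverage(n_reads, read_length, target_size):
    """Average raw depth = total sequenced bases / target size."""
    return n_reads * read_length / target_size


TARGET_SIZE = 100_000_000  # 100 Mb capture target
READ_LENGTH = 100          # bp per read

# Reading A: 120 million clusters (pairs), each giving two 100 bp reads.
pairs = 120_000_000
coverage_if_pairs = mean_coverage(pairs * 2, READ_LENGTH, TARGET_SIZE)

# Reading B: 120 million individual reads, i.e. 60 million pairs.
coverage_if_single_reads = mean_coverage(120_000_000, READ_LENGTH, TARGET_SIZE)

print(coverage_if_pairs, coverage_if_single_reads)  # 240.0 120.0
```

Either figure is a raw upper bound that assumes every base lands on the target.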

                            Comment

                            • DoubleA
                              Member
                              • Jul 2010
                              • 16

                              #29
                              Hi Arvi8689,

                              There are two things that will reduce your fold coverage with an exome capture experiment. First, at least 10% of your reads will be PCR duplicates and should be removed before alignment. Second, ~60-80% of your unique reads will be "on target". Therefore, it's likely that only 50% of your initial reads will be unique and map to your target.

                              Double A
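To put rough numbers on the two losses described above, here is a sketch that scales a raw depth by the duplicate rate and the on-target rate. The 240x starting point is taken from the question earlier in the thread; the 10% and 60-80% figures are the ones quoted in this reply, and the function name is illustrative only.

```python
def usable_coverage(raw_coverage, duplicate_rate, on_target_rate):
    """Depth left after discarding duplicates and off-target reads."""
    if not (0 <= duplicate_rate <= 1 and 0 <= on_target_rate <= 1):
        raise ValueError("rates must be fractions between 0 and 1")
    return raw_coverage * (1 - duplicate_rate) * on_target_rate


RAW = 240.0  # raw average depth from the question, before any losses

low = usable_coverage(RAW, duplicate_rate=0.10, on_target_rate=0.60)
high = usable_coverage(RAW, duplicate_rate=0.10, on_target_rate=0.80)

print(f"usable depth: roughly {low:.0f}x to {high:.0f}x")
```

With a 10% duplicate rate the usable fraction is 54% to 72% of the raw reads; a higher duplicate rate pushes the low end toward or below the 50% rule of thumb.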

                              Comment
