
  • #16
    Originally posted by upenn_ngs View Post
    Another factor: many GC-rich regions are dropped from both whole genome sequencing and exome capture. This image is from the Broad.

    http://www.postimage.org/image.php?v=aV6cBnA
    Do you know what the WGS track is, and from where one can obtain it for the IGV?
    --
    bioinfosm

    Comment


    • #17
      Originally posted by Xi Wang View Post
      Oh. But if you have the data, you can try what I just mentioned.

      And for PE reads, I don't think it can improve a lot, because it is the DNA fragments that are amplified. So the coverage should have some relationship with the GC content of the DNA fragments. On the other hand, the read GC content and the DNA fragment GC content are highly correlated. As a result, the relationship between read GC content and coverage largely reflects reality.
      I'm not clear on why converting the read coverage to a log scale would help understand the distribution better. Visualizing coverage on a log scale will simply change the scale you're looking at, no?

      Maybe I'm not understanding the advantage. Could you show me an example?

      Comment


      • #18
        Originally posted by bioinfosm View Post
        ...
        We notice that only 20% of reads map on-target! Is that a common thing? (Illumina 75bp PE)
        Hi,
        we have also run a targeted enrichment (TE) using SureSelect with Illumina 76bp single-end reads and we obtained results similar to Tewhey et al (Genome Biol 2009): 50% of uniquely aligned reads were on target, with a uniformity of capture similar to what was reported in the paper.
        I am wondering if someone else has results on Illumina 76bp paired-end, as it seems from the Agilent website that the % on target should increase from 50% to 70% using the PE protocol.

        Thanks

        Comment


        • #19
          Originally posted by bioinfosm View Post
          Another point which I did not notice here is, # of reads actually sequenced to get 30x exome coverage for the agilent capture stuff.

          We notice that only 20% of reads map on-target! Is that a common thing? (Illumina 75bp PE)
          Be careful with Tewhey's enrichment calculation: they calculated 48% "on or near target", meaning +/- 150bp around the target! If you look in the text, they actually only got 37% "on target", so while your 20% is low, it's not terrible in comparison. How much of the genome are you targeting?

          In our case with 76-bp single end reads, we targeted 0.09% of the genome and enriched to 35% of the sequences being "on target", which is a ~390-fold enrichment. If you were to actually convert Tewhey's numbers to solely "on target" (from 0.12% to 37%), then their claim of "about 400 fold enrichment" is actually ~290-fold. Just a small criticism.
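The fold-enrichment arithmetic in this post can be checked with a short sketch. It assumes fold enrichment is simply the percentage of reads on target divided by the percentage of the genome targeted; the function name `fold_enrichment` is made up for illustration. With the rounded percentages quoted above, the Tewhey figure comes out at roughly 308-fold rather than ~290-fold, so the post's number presumably rests on unrounded inputs that are not given in the thread.

```python
def fold_enrichment(on_target_pct, target_pct_of_genome):
    """Percentage of reads on target divided by percentage of genome targeted."""
    if target_pct_of_genome <= 0:
        raise ValueError("target_pct_of_genome must be positive")
    return on_target_pct / target_pct_of_genome


# Percentages as quoted in the post (rounded values).
ours = fold_enrichment(35.0, 0.09)
tewhey_on_target = fold_enrichment(37.0, 0.12)

print(f"This experiment: {ours:.0f}-fold")
print(f"Tewhey, on target only: {tewhey_on_target:.0f}-fold")
```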

          We have just completed a 76-bp paired end run with 4 samples multiplexed - I will let you know what we get with our alignment results
          Last edited by NGSfan; 03-17-2010, 02:45 AM.

          Comment


          • #20
            Thanks NGSfan.

            The "on or near" number, taking +/- 200bp, goes to 15%, which is still pretty low I would guess.

            It would be interesting to check where the remaining 85% of reads went!
            --
            bioinfosm

            Comment


            • #21
              Originally posted by NGSfan View Post
              I'm not clear on why converting the read coverage to a log scale would help understand the distribution better. Visualizing coverage on a log scale will simply change the scale you're looking at, no?

              Maybe I'm not understanding the advantage. Could you show me an example?
              I do not have such data, so I can't show you examples.
              But I noticed that there are fewer regions with high GC content than regions with average GC content, and the same holds for low-GC regions. That is to say, there are more points in the figure around the average GC (x-axis), so high-coverage points are more likely to appear in this part. This is what you saw in the figure. If you take the log, the high-coverage points will decrease more than the low-coverage points do. Such a figure is more likely to reflect the true relationship between read coverage and GC content.
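A minimal sketch of the log transform being suggested, using made-up coverage values rather than data from this thread. The pseudocount of 1 is an assumption added so that windows with zero coverage stay defined; `log_coverage` is a hypothetical helper name.

```python
import math


def log_coverage(coverages, pseudocount=1.0):
    """Return log10(coverage + pseudocount) for each window."""
    return [math.log10(c + pseudocount) for c in coverages]


# Illustrative per-window coverage values (not real data).
coverages = [0, 9, 99, 999]
logged = log_coverage(coverages)

for raw, lg in zip(coverages, logged):
    print(f"coverage {raw:>4} -> log10 scale {lg:.1f}")
```

On the raw scale the highest window is 999 units above the lowest; on the log scale the gap is 3 units, so a handful of very deep windows near the average GC no longer dominate the plot.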
              Xi Wang

              Comment


              • #22
                If you want to be able to enrich repetitive regions without bias, look at RainDance Technologies. Using their RainStorm approach, you can design PCR primers to capture 99% or greater of your target regions. The technology will also provide better uniformity allowing for less sequencing than SureSelect.

                Comment


                • #23
                  Hi Bryan,

                  Thanks for the suggestion. The Rain Dance approach seems to really be the better approach to handle these biased regions, albeit on a smaller scale.

                  For example, it doesn't scale very well to, say, 1000 genes like SureSelect does, or to the whole exome.

                  Comment


                  • #24
                    Originally posted by bryan haffer View Post
                    If you want to be able to enrich repetitive regions without bias, look at RainDance Technologies. Using their RainStorm approach, you can design PCR primers to capture 99% or greater of your target regions. The technology will also provide better uniformity allowing for less sequencing than SureSelect.

                    www.raindancetech.com
                    In the interest of full disclosure, please make your affiliations clearer in the future.

                    Comment


                    • #25
                      NGSfan,
                      Our lab has been using a custom SureSelect library to capture and sequence the extended HLA region (~8Mb). We have found that we get great coverage (>40X) over regions with <60% GC content while we get very poor coverage of regions with >60% GC content. I don't think that these results are out of the ordinary. I just came across the manuscript below...

                      Despite the ever-increasing output of Illumina sequencing data, loci with extreme base compositions are often under-represented or absent. To evaluate sources of base-composition bias, we traced genomic sequences ranging from 6% to 90% GC through the process by quantitative PCR. We identified PCR du …


                      They have shown that the PCR steps in the library construction process create a huge bias against regions of high GC content. They have also shown how to resolve this problem. Check it out...

                      DoubleA
                      Last edited by DoubleA; 03-23-2011, 10:08 AM.

                      Comment


                      • #26
                        DoubleA,
                        Thank you for your link. The manuscript is very interesting, but it just refers to the Illumina Sureselect protocol.
                        Does anybody have experience with the Agilent SureSelect protocol, in particular with the enzyme Herculase II? Does this enzyme recover the fragments with high GC percentage (like the AccuPrime Taq HiFi used in the manuscript)?
                        Sam64

                        Comment


                        • #27
                          Hello everyone,

                          I recently tried three different polymerases as well as different PCR conditions (increases in denaturation time) in an attempt to increase read coverage in regions with >60% GC content. I enriched 12 Illumina PE libraries with either AccuPrime, Phusion, or KAPA polymerase (4 with each polymerase). I should mention that I ligated the same adapter and incorporated a unique bar code for each library using the three primer enrichment approach (PE 1.0, PE 2.0, and a primer containing the bar code).

                          Following library production, I pooled 4 libraries (all 4 created with the same polymerase) and performed a hybridization with a custom SureSelect bait library covering ~8Mb of the HLA region on human chromosome 6 (baits on ~3.8 Mb). Following hybridization and elution, I performed a final enrichment with primers covering the last 20bp of the 5' and 3' ends of the Illumina libraries. All 3 pools (12 libraries) were mixed and run on a HiSeq lane for a single 40bp run. The coverage of the target region was a bit low (10-50X), so we'll probably sequence these libraries in the future with a PE X 100bp run.

                          I have attached graphs of read coverage vs %GC content (20bp window). As you can see, the coverage of the GC-rich regions is pretty similar for each polymerase, and the increase in denaturation time per cycle did not help much either. Below is the % of reads mapping to regions with GC content >60%. I thought some of you might be interested in these results.

                          Regards,
                          Double A

                          AccuPrime 15 second denaturation/cycle: 12.4%
                          AccuPrime 30 second denaturation/cycle: 12.9%
                          AccuPrime 45 second denaturation/cycle: 13.2%
                          AccuPrime 60 second denaturation/cycle: 14.4%

                          Phusion 15 second denaturation/cycle: 13.5%
                          Phusion 30 second denaturation/cycle: 16.4%
                          Phusion 45 second denaturation/cycle: 13.1%
                          Phusion 60 second denaturation/cycle: 12.6%

                          KAPA 15 second denaturation/cycle: 13.9%
                          KAPA 15 second denaturation/cycle: 10.7%
                          KAPA 15 second denaturation/cycle: 13.5%
                          KAPA 15 second denaturation/cycle: 11.6%
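To make the comparison across polymerases easier to read, the twelve percentages listed above can be summarised per enzyme. This is only a descriptive sketch: the values are copied from the list as posted, the dictionary name is made up, and with four libraries per polymerase it says nothing about statistical significance.

```python
from statistics import mean

# % of reads mapping to regions with >60% GC, as listed in the post.
pct_reads_gc_over_60 = {
    "AccuPrime": [12.4, 12.9, 13.2, 14.4],
    "Phusion": [13.5, 16.4, 13.1, 12.6],
    "KAPA": [13.9, 10.7, 13.5, 11.6],
}

# For each polymerase: (mean, minimum, maximum) across its four libraries.
summary = {
    name: (mean(values), min(values), max(values))
    for name, values in pct_reads_gc_over_60.items()
}

for name, (avg, low, high) in summary.items():
    print(f"{name}: mean {avg:.1f}% (range {low}-{high}%)")
```

The three means fall within about 1.5 percentage points of each other, which matches the post's observation that the polymerases perform similarly on GC-rich regions.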
                          Attached Files

                          Comment


                          • #28
                            coverage calc

                            Hey

                            I am trying to sequence the exome and the capture kit is 100 Mb.

                            The sequencing core promised 120 million reads per lane and we are using paired end 100bp reads and our fragment size is 250 basepairs.

                            My calculation was I will get 120 million reads * 200 bp = 24 billion bases read

                            so coverage = 24 billion bases / 100 Mb = 240x coverage (average)

                            But some people say I will get a coverage of only 120x. What could be the reason? Or is the coverage actually 240x?
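One possible source of the 120x versus 240x discrepancy is whether "120 million reads" counts read pairs or individual reads; the thread does not say which the core means, so that is an assumption here. The sketch below computes raw average coverage both ways, using the numbers from this post and a hypothetical helper `mean_coverage`. It ignores duplicates and off-target reads.

```python
def mean_coverage(n_fragments, reads_per_fragment, read_length_bp, target_size_bp):
    """Raw average coverage = total sequenced bases / target size."""
    total_bases = n_fragments * reads_per_fragment * read_length_bp
    return total_bases / target_size_bp


TARGET_BP = 100_000_000  # 100 Mb capture target
READ_LEN = 100           # bp per read

# If 120 million means read pairs (two 100bp reads per fragment):
as_pairs = mean_coverage(120_000_000, 2, READ_LEN, TARGET_BP)

# If 120 million means individual reads:
as_single_reads = mean_coverage(120_000_000, 1, READ_LEN, TARGET_BP)

print(f"counted as pairs: {as_pairs:.0f}x")
print(f"counted as single reads: {as_single_reads:.0f}x")
```

With a 250bp fragment and 2 x 100bp reads the two mates should not overlap, so counting both reads of each pair does not double-count bases.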

                            Comment


                            • #29
                              Hi Arvi8689,

                              There are two things that will reduce your fold coverage with an exome capture experiment. First, at least 10% of your reads will be PCR duplicates and should be removed before alignment. Second, ~60-80% of your unique reads will be "on target". Therefore, it's likely that only 50% of your initial reads will be unique and map to your target.
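The two reductions described here multiply. A rough sketch, assuming a 10% duplicate rate and the 60-80% on-target range quoted above (`usable_fraction` is a made-up name):

```python
def usable_fraction(duplicate_rate, on_target_rate):
    """Fraction of initial reads that are both unique and on target."""
    return (1.0 - duplicate_rate) * on_target_rate


low = usable_fraction(0.10, 0.60)
high = usable_fraction(0.10, 0.80)

print(f"usable reads: {low:.0%} to {high:.0%} of the initial reads")
```

With exactly 10% duplicates this gives 54% to 72%; since the post says "at least 10%", a higher duplicate rate pulls the figure toward the quoted 50%. Multiplying a raw coverage estimate by this fraction gives a rough effective on-target coverage.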

                              Double A

                              Comment
