  • overlapping paired-end reads vs single end reads

    Dear all,

    I am doing 2x100 paired-end (PE) exome sequencing on an Illumina machine with the Nextera Rapid Capture Exome protocol. Since over 80% of the Nextera target exons are shorter than 200bp, I get many overlapping PE reads.

    I now wonder if doing 2x100 bp is a good approach in this case. I realize that it depends on how you treat the overlapping PE reads. I invite you to contribute to this discussion here:
    http://seqanswers.com/forums/showthread.php?t=61369

    An obvious solution might be to do shorter reads, i.e. 2x50 bp. However, the kit for 100bp (2x50) costs almost as much as the kit for 200bp (2x100), so there is little gain.

    Might it actually be better to do SE sequencing here, to avoid the overlap and thus avoid sequencing the same DNA fragment twice? A disadvantage of SE sequencing is less accurate duplicate recognition, though this problem only arises at very high coverage and my coverage is rather moderate (40-60X). The advantage is that I would have more independent reads, i.e. more data.

    I'd appreciate your thoughts. Thank you.

  • #2
    As I mentioned in your other thread, most variant callers are aware of the concept of paired-end reads and will deal with them in a coherent manner. You're unlikely to gain anything by merging overlapping regions for your use case.

    Regarding 200bp SE reads vs. 100bp PE sequencing, which gives the best results will probably vary by facility. For our internally produced data I think the 100bp PE reads might produce slightly better results, but I suppose one would need to do the comparison to really say for sure.

    • #3
      Originally posted by dpryan View Post
      most variant callers are aware of the concept of paired-end reads and will deal with them in a coherent manner.
      I see your point, thanks for sharing. Still, I wonder if the extra accuracy that I gain by the overlapping reads is worth the additional sequencing. But that is probably something that everybody has to decide for themselves.

      • #4
        Originally posted by evakoe View Post
        I see your point, thanks for sharing. Still, I wonder if the extra accuracy that I gain by the overlapping reads is worth the additional sequencing. But that is probably something that everybody has to decide for themselves.
        The price differential is small between SE and PE reads (based on a casual search for exome sequencing for a single sample on GenoHub.com). If you were doing thousands of samples then it may become a consideration but then you would be negotiating directly with the sequencing center.

        • #5
          Originally posted by GenoMax View Post
          The price differential is small between SE and PE reads
          Ok, let's assume that the price for SE and PE for a given read length and number of reads is identical. My point was that I might get more data out of SE, since I don't "lose" bases to the overlapping PE. As a consequence, PE is more expensive to get to the same final coverage.

          • #6
            While you lose a few bases with PE data you also probably have a slightly higher alignment rate, so the actual effective coverage is probably not that different.

            • #7
              Originally posted by dpryan View Post
              While you lose a few bases with PE data you also probably have a slightly higher alignment rate, so the actual effective coverage is probably not that different.
              That would be very easy to test. I can just remove the paired end information and treat my reads as SE and compare the alignment accuracy. I will do that and post the results.

              Unfortunately, the overlap is much more than a few bases. Imagine the PE reads with adapters already removed. When I now merge the overlapping reads and then count the number of bases, I have 30% fewer bases than without the merging.
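
The arithmetic behind this kind of loss can be sketched in a few lines of Python. This is only an illustration: the fragment lengths below are hypothetical, not data from this thread, and the helper names `overlap_bases` and `fraction_lost_by_merging` are made up for the example. It assumes adapters are already trimmed, so each mate covers at most the full fragment.

```python
def overlap_bases(fragment_len: int, read_len: int) -> int:
    """Bases covered by both mates of a pair on one adapter-trimmed fragment."""
    # After adapter trimming, a mate cannot be longer than the fragment itself.
    covered_per_mate = min(read_len, fragment_len)
    return max(0, 2 * covered_per_mate - fragment_len)


def fraction_lost_by_merging(fragment_lens, read_len: int) -> float:
    """Fraction of trimmed PE bases that disappear when overlapping mates are merged."""
    total = sum(2 * min(read_len, f) for f in fragment_lens)
    lost = sum(overlap_bases(f, read_len) for f in fragment_lens)
    return lost / total


# Hypothetical library in which every fragment is 140bp long, sequenced 2x100.
example_fragments = [140] * 10
lost = fraction_lost_by_merging(example_fragments, read_len=100)
print(f"{lost:.0%} of the trimmed bases fall inside overlaps")
```

Under these assumptions a 2x100 run on 140bp fragments already puts 30% of the bases in the overlap, and shorter fragments make it worse.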

              • #8
                Given that, your SE reads would be heavily contaminated with adapters and similar junk, so again I doubt you'd be gaining anything with SE reads.

                • #9
                  Originally posted by dpryan View Post
                  Given that, your SE reads would be heavily contaminated with adapters and similar junk, so again I doubt you'd be gaining anything with SE reads.
                  But would SE reads be more heavily contaminated with adapters than PE reads? With the PE data, 11% of the bases from the raw reads are adapters, though there is quite some variability per sample.

                  • #10
                    Yes, they'd be even more contaminated than PE samples, since you're likely starting with essentially the same fragment size pool.
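
A rough way to see why is to count read-through bases for the same fragment pool under both designs. The sketch below is an illustration only: the fragment lengths are hypothetical and `non_insert_fraction` is a made-up helper. It treats every base a read produces beyond the end of the fragment as adapter or junk.

```python
def non_insert_fraction(fragment_lens, read_len: int, reads_per_fragment: int) -> float:
    """Fraction of raw bases that lie beyond the insert (adapter and whatever follows)."""
    total = len(fragment_lens) * read_len * reads_per_fragment
    # A read runs past the insert only when it is longer than the fragment.
    beyond = sum(reads_per_fragment * max(0, read_len - f) for f in fragment_lens)
    return beyond / total


# Hypothetical fragment pool (bp); both designs yield 200 raw bases per fragment.
fragments = [80, 120, 150, 180, 250]
pe_fraction = non_insert_fraction(fragments, read_len=100, reads_per_fragment=2)
se_fraction = non_insert_fraction(fragments, read_len=200, reads_per_fragment=1)
print(f"2x100 PE: {pe_fraction:.0%} non-insert bases")
print(f"1x200 SE: {se_fraction:.0%} non-insert bases")
```

With this toy pool only the 80bp fragment is read through by 100bp mates, whereas a 200bp single read runs past the insert on four of the five fragments.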

                    • #11
                      Originally posted by dpryan View Post
                      Yes, they'd be even more contaminated than PE samples, since you're likely starting with essentially the same fragment size pool.
                      I didn't know this. Thank you for your input.

                      • #12
                        Originally posted by evakoe View Post
                        That would be very easy to test. I can just remove the paired end information and treat my reads as SE and compare the alignment accuracy. I will do that and post the results.

                        Unfortunately, the overlap is much more than a few bases. Imagine the PE reads with adapters already removed. When I now merge the overlapping reads and then count the number of bases, I have 30% fewer bases than without the merging.
                        PE reads are better than SE in every way, if you want to align and call variations. Not only will they have a marginally higher alignment rate (and potentially noticeably higher in areas with more significant mutations), they will be substantially more accurate. So you increase true-positives and decrease false-positives... at the same time that you gain a substantially improved ability to detect structural variations, and to trim adapters (which works much better with PE reads). Furthermore, if you decide to do duplicate removal (which is a good idea for an amplified library), PE libraries will have a much lower duplicate-removal rate because it's possible to measure the insert size of a pair, and thus determine more accurately whether it is a duplicate or just a fragment that happens to start at the same location.
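
The duplicate-removal point can be illustrated with a toy comparison of the keys available in each case. This is a simplified sketch, not how any particular tool works: real duplicate markers also take strand and other details into account, and the fragment coordinates here are invented.

```python
from collections import Counter

# Hypothetical aligned fragments: (chromosome, fragment_start, fragment_end).
fragments = [
    ("chr1", 1000, 1180),
    ("chr1", 1000, 1180),  # true PCR duplicate of the first fragment
    ("chr1", 1000, 1240),  # independent fragment that happens to start at the same base
    ("chr1", 1000, 1310),  # another independent fragment with the same start
    ("chr1", 1050, 1200),
]


def count_flagged(keys) -> int:
    """Reads flagged as duplicates: all but one read per identical key."""
    return sum(n - 1 for n in Counter(keys).values())


# SE reads only reveal where the fragment starts.
se_flagged = count_flagged([(chrom, start) for chrom, start, _ in fragments])
# PE reads reveal both ends, i.e. the insert size as well.
pe_flagged = count_flagged([(chrom, start, end) for chrom, start, end in fragments])

print("flagged using start only (SE):", se_flagged)
print("flagged using both ends (PE):", pe_flagged)
```

In this toy set the start-only key discards two independent fragments along with the one true duplicate, while the two-ended key removes only the true duplicate.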

                        If you worry about wasting bases in the overlap (which is not really a waste), then just aim for longer fragments. Exon capture does not capture fragments matching the bounds of exons; it enriches for any fragments that hybridize to baits that are designed to contain sequence of or around the exon targets. That means if you have one or more baits designed to cover a 200bp exon, and a 500bp fragment that contains the exon, they can still hybridize. I don't know about that specific kit, but all the exon-capture data I've seen - from several different kits - had coverage extending well out into the introns.

                        • #13
                          Originally posted by Brian Bushnell View Post
                          PE reads are better than SE in every way, if you want to align and call variations. Not only will they have a marginally higher alignment rate (and potentially noticeably higher in areas with more significant mutations), they will be substantially more accurate. So you increase true-positives and decrease false-positives... at the same time that you gain a substantially improved ability to detect structural variations, and to trim adapters (which works much better with PE reads). Furthermore, if you decide to do duplicate removal (which is a good idea for an amplified library), PE libraries will have a much lower duplicate-removal rate because it's possible to measure the insert size of a pair, and thus determine more accurately whether it is a duplicate or just a fragment that happens to start at the same location.
                          A quick test of treating my PE data as SE did not show a decrease in alignment efficiency, but it did show a slight increase in the general error rate (the number of mismatches and indels). But I agree that PE is more accurate in general; I think enough publications have shown this.

                          Originally posted by Brian Bushnell View Post
                          If you worry about wasting bases in the overlap (which is not really a waste), then just aim for longer fragments.
                          I also arrived at this conclusion and I am already discussing with the wet lab people how we can implement this. But I am glad to hear that you don't consider the overlap a waste. Maybe I did not appreciate the increase in accuracy enough. Thank you.

                          • #14
                            Originally posted by evakoe View Post
                            A quick test of treating my PE data as SE did not show a decrease in alignment efficiency, but it did show a slight increase in the general error rate (the number of mismatches and indels). But I agree that PE is more accurate in general; I think enough publications have shown this.
                            I should have mentioned that this is aligner-specific. PE reads will only map at a higher rate than SE reads if the aligner does one of two things: it has a "rescue" operation, which uses a mapped read as an anchor to look for a mapping location for its unaligned mate (one that did not initially align due to a high error rate or major mutations), or it allows lower-scoring mappings for properly paired reads. Aligners that do neither internally will generally have identical mapping rates for PE reads and for the same reads treated as SE.
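
A toy version of the "rescue" idea is sketched below. It is deliberately simplified and does not reproduce any real aligner: it scans a window downstream of the anchored read and scores candidate positions by mismatch count only, whereas real aligners use proper local alignment and handle mate orientation. The function name `rescue_mate` and its parameters `max_insert` and `max_mismatches` are assumptions made for the example.

```python
import random


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def rescue_mate(reference, anchor_pos, mate_seq, max_insert=500, max_mismatches=5):
    """Search a window downstream of an anchored read for the best placement of its mate.

    Returns (position, mismatches) for the best hit, or None if nothing fits.
    Assumes the mate is already given in reference orientation.
    """
    window_end = min(len(reference), anchor_pos + max_insert)
    best = None
    for pos in range(anchor_pos, window_end - len(mate_seq) + 1):
        mm = hamming(reference[pos:pos + len(mate_seq)], mate_seq)
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (pos, mm)
    return best


# Hypothetical example: random reference, mate taken from position 300 with 3 changes.
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(1000))
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
mate_bases = list(reference[300:350])
for i in (5, 20, 40):
    mate_bases[i] = swap[mate_bases[i]]
mate = "".join(mate_bases)

hit = rescue_mate(reference, anchor_pos=100, mate_seq=mate)
print("rescued mate placement:", hit)
```

Restricting the search to a window near the anchor is what lets a more tolerant mismatch threshold be used without producing spurious genome-wide hits.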

                            • #15
                              Originally posted by Brian Bushnell View Post
                              If you worry about wasting bases in the overlap (which is not really a waste), then just aim for longer fragments.
                              Aiming for longer fragments increases the number of reads/bases that are off-target. Likely there is an optimal fragment length, but I don't think that our lab could produce these reliably anyway.
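
The trade-off between overlap and off-target bases can be made concrete with an idealized model. Everything in this sketch is an assumption for illustration: a single 150bp exon sits in the middle of the fragment, reads are 2x100, adapters are ignored, and `pair_stats` is a made-up helper that simply enumerates covered positions.

```python
def pair_stats(fragment_len: int, exon_len: int, read_len: int) -> dict:
    """Overlap and unique on/off-target bases for one read pair.

    Assumes fragment_len >= exon_len and that the exon is centred in the fragment.
    """
    exon_start = (fragment_len - exon_len) // 2
    exon = set(range(exon_start, exon_start + exon_len))
    read1 = set(range(0, min(read_len, fragment_len)))
    read2 = set(range(max(0, fragment_len - read_len), fragment_len))
    covered = read1 | read2
    return {
        "overlap": len(read1 & read2),
        "unique_on_target": len(covered & exon),
        "unique_off_target": len(covered - exon),
    }


# Hypothetical fragment lengths around a 150bp exon, sequenced 2x100.
for fragment_len in (150, 200, 300):
    print(fragment_len, pair_stats(fragment_len, exon_len=150, read_len=100))
```

In this model the shortest fragment spends bases on overlap, the longest spends them in the flanking introns, and the fragment of roughly twice the read length covers the exon with the least overlap.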
