  • Very high depth of coverage

    I ran Illumina GAII sequencing on tiled long-range PCR products covering a 200 kb region of genomic DNA.
    Even with multiplexing within lanes, the sequencing output gives me an average of 1500X coverage of the region per individual (up to 3000X in some regions).
    What would be the best tool for aligning the reads and accurately calling variants at this depth of coverage?
    I have used CLC Genomics Workbench, and the alignment is OK, but SNP calling detects many apparent false-positive variants (for example, 950 A calls and 50 C calls in a 1000X coverage region). 50 calls seems like a lot to be error, but independent data (SNP genotyping and Sanger sequencing) call the region homozygous.
    Are there programs better equipped for this kind of very deep coverage? Thanks.

  • #2
    It is, of course, a statistical problem. What if you adjust your read coverage (not your proportions) down to a lower, perhaps even consistent, level, essentially taking read coverage out of the equation? Just speculating here.
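To put rough numbers on the "statistical problem", here is a minimal sketch that compares binomial log-likelihoods for the 950 A / 50 C example under three diploid genotypes. The function names and the 1% per-base error rate are assumptions made for illustration, not values from this thread, and the model ignores errors introduced during PCR.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)

def genotype_log_likelihoods(alt_count, depth, error_rate=0.01):
    """Log-likelihood of seeing alt_count non-reference calls at a site.

    error_rate is an assumed per-base miscall rate (hypothetical value).
    Expected non-reference fraction: error_rate for homozygous reference,
    0.5 for heterozygous, 1 - error_rate for homozygous alternate.
    """
    expected = {
        "hom_ref": error_rate,
        "het": 0.5,
        "hom_alt": 1.0 - error_rate,
    }
    return {g: log_binom_pmf(alt_count, depth, p) for g, p in expected.items()}

# The example from the first post: 50 C calls out of 1000 reads.
deep = genotype_log_likelihoods(50, 1000)
best_deep = max(deep, key=deep.get)

# Same proportions after downsampling to 40X: 2 C calls out of 40 reads.
shallow = genotype_log_likelihoods(2, 40)
best_shallow = max(shallow, key=shallow.get)

print(best_deep, deep)
print(best_shallow, shallow)
```

Under these assumptions a 5% minor-allele fraction is a poor fit to a heterozygous site at either depth, so a caller that weighs the allele fraction rather than the raw count of 50 reads should not report it as a heterozygous SNP.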


    • #3
      Have you tried filtering your sequences for duplicates? We find this essential when dealing with long-range PCR libraries. We've used BWA/Picard/SAMtools and the FASTX-Toolkit/CLC bio with success.


      • #4
        Originally posted by dwmohr
        Have you tried filtering your sequences for duplicates? We find this essential when dealing with long-range PCR libraries. We've used BWA/Picard/SAMtools and the FASTX-Toolkit/CLC bio with success.
        How do you identify duplicates if you already expect at least two reads to share the same starting position? Even if you require both ends of a pair to match, at >1500X coverage you would still expect some pairs to share both starting positions purely by chance.

        Does anybody have other ideas for identifying PCR duplicates in high-coverage data? I don't think it is possible.
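As a rough illustration of why coordinate-based duplicate removal breaks down at this depth, the sketch below estimates how many fragments would share coordinates with another fragment purely by chance if fragments were placed uniformly at random. The 75 bp read length and the 100 distinct insert sizes are hypothetical values chosen for the example; only the 200 kb region and 1500X coverage come from the thread.

```python
import math

def chance_duplicate_fraction(n_fragments, n_slots):
    """Expected fraction of fragments that share coordinates with at least
    one other fragment, assuming uniform random placement over n_slots
    distinct coordinate combinations."""
    # 1 - (1 - 1/n_slots)^(n_fragments - 1), computed stably.
    return -math.expm1((n_fragments - 1) * math.log1p(-1.0 / n_slots))

REGION_BP = 200_000      # from the thread
COVERAGE = 1500          # from the thread
READ_LEN = 75            # assumption for illustration
INSERT_SIZES = 100       # assumed number of distinct insert lengths

# Single-end: a read is identified by (start, strand).
n_reads = COVERAGE * REGION_BP // READ_LEN
single_end = chance_duplicate_fraction(n_reads, REGION_BP * 2)

# Paired-end: a pair is identified by (start, insert length).
n_pairs = COVERAGE * REGION_BP // (2 * READ_LEN)
paired_end = chance_duplicate_fraction(n_pairs, REGION_BP * INSERT_SIZES)

print(f"single-end chance duplicates: {single_end:.4f}")
print(f"paired-end chance duplicates: {paired_end:.4f}")
```

With these assumed numbers nearly every single-end read has a coordinate twin, and roughly one pair in ten does even when both ends must match, so a standard duplicate filter would discard a large amount of legitimate data.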


        • #5
          Originally posted by nilshomer
          Does anybody have other ideas for identifying PCR duplicates in high-coverage data? I don't think it is possible.
          I suppose you'd have to take an observed/expected approach. If you know the number and size distribution of your sequences, you can work out the likelihood of exact overlaps of different depths (assuming reads are randomly distributed). Anything falling too far from the expected range would be suspicious.
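One way to sketch the observed/expected idea is to treat the number of reads starting at each position as Poisson-distributed and flag positions whose count is very unlikely under that model. The function names, the uniform-placement assumption, and the `alpha` cutoff are all illustrative choices, not an established method from this thread.

```python
import math
from collections import Counter

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)   # P(X = 0)
    cdf = term
    for i in range(1, k):   # accumulate P(X = 1) .. P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def suspicious_starts(start_positions, n_slots, alpha=1e-6):
    """Return {position: count} for start positions whose read count is
    improbably high if reads were spread uniformly over n_slots positions.
    alpha is an assumed significance cutoff."""
    counts = Counter(start_positions)
    lam = len(start_positions) / n_slots   # expected reads per position
    return {pos: c for pos, c in counts.items() if poisson_sf(c, lam) < alpha}

# Toy data: one read at each of 1000 positions, plus 20 extra copies at 500.
starts = list(range(1000)) + [500] * 20
flagged = suspicious_starts(starts, n_slots=1000)
print(flagged)
```

Real libraries are not uniform, so in practice the expected rate would need to be estimated locally (for example per amplicon) rather than over the whole region.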

          You could also maybe look at the ratio of exact overlaps to non-exact overlaps. If you have a region composed mostly of exact overlaps then that's not right for a randomly fragmented library. This should work even with unevenly distributed reads.

          Neither of these is going to detect small PCR effects, but normally we'd expect that when the PCR goes wrong it often goes very wrong, and those are the problems we're more interested in sorting out.
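The ratio of exact to non-exact overlaps can be sketched per window as below. Reads are assumed to be `(start, end)` half-open intervals already extracted from the alignment; the function name is hypothetical, and the pairwise loop is quadratic, so this is only meant for small windows.

```python
from itertools import combinations

def exact_overlap_fraction(reads):
    """Fraction of overlapping read pairs that have identical coordinates.

    reads is a list of (start, end) half-open intervals.
    Returns None if no two reads overlap.
    """
    exact = 0
    inexact = 0
    for a, b in combinations(reads, 2):
        if a == b:
            exact += 1
        elif a[0] < b[1] and b[0] < a[1]:
            inexact += 1
    total = exact + inexact
    return exact / total if total else None

# A window that looks randomly fragmented: staggered reads, no exact copies.
staggered = [(0, 50), (10, 60), (20, 70), (30, 80)]

# A window dominated by one clonal fragment.
clonal = [(0, 50)] * 4 + [(10, 60)]

print(exact_overlap_fraction(staggered))
print(exact_overlap_fraction(clonal))
```

Because the fraction is computed within each window, it does not depend on coverage being even across the region, which is the property the post above relies on.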


          • #6
            Originally posted by simonandrews
            I suppose you'd have to take an observed/expected approach. If you know the number and size distribution of your sequences, you can work out the likelihood of exact overlaps of different depths (assuming reads are randomly distributed). Anything falling too far from the expected range would be suspicious.

            You could also maybe look at the ratio of exact overlaps to non-exact overlaps. If you have a region composed mostly of exact overlaps then that's not right for a randomly fragmented library. This should work even with unevenly distributed reads.

            Neither of these is going to detect small PCR effects, but normally we'd expect that when the PCR goes wrong it often goes very wrong, and those are the problems we're more interested in sorting out.
            That should work. I am also thinking about clonal reads in SOLiD data. In that case, it won't be as bad as when things go wrong with PCR in prep.
