  • Coverage "standards" for SNP detection in tumor samples

    Dear all,

    I was wondering if there is a standard "coverage" for exome SNP calling in tumor vs. healthy samples (from the same patient). As we know, tumor samples have an intrinsically higher mutability (Parsons et al., 1993). I was thinking of applying a threshold of at least 20X for the healthy sample and 50X for the tumor sample. Do these look sufficient to you?

    Also, there appears to be no standard definition of coverage, so by "50X" I mean exome-wide coverage with 100 bp uniquely mapping Illumina paired-end reads, after duplicate removal.
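To make a definition like "50X" concrete, here is a minimal sketch that summarizes per-base depth over the targeted exome bases. It assumes you already have one depth value per target position, counted from uniquely mapping reads after duplicate removal (for example, per-base output from a depth tool restricted to the capture regions). The function name `coverage_summary` is hypothetical. Reporting the fraction of target bases at or above 20X and 50X next to the mean helps because exome capture is uneven, so a 50X mean still leaves some bases well below 50X.

```python
from typing import Dict, Iterable, Sequence


def coverage_summary(depths: Iterable[int], thresholds: Sequence[int] = (20, 50)) -> Dict[str, float]:
    """Summarize per-base depth over exome target positions (hypothetical helper).

    `depths` is assumed to hold one value per targeted base, computed from
    uniquely mapping reads after duplicate removal.
    """
    depths = list(depths)
    if not depths:
        raise ValueError("no target positions supplied")
    summary = {"mean": sum(depths) / len(depths)}
    for t in thresholds:
        # Fraction of targeted bases covered by at least t reads.
        summary[f"frac_ge_{t}x"] = sum(d >= t for d in depths) / len(depths)
    return summary


# Toy depths for four target bases, purely illustrative.
print(coverage_summary([10, 40, 60, 90]))
```

Here the mean is exactly 50X, yet only half of the toy positions actually reach 50X.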

    Thanks!

    Federico

  • #2
    Originally posted by giorgifm
    Dear all,

    I was wondering if there is a standard "coverage" for exome SNP calling in tumor vs. healthy samples (from the same patient). As we know, tumor samples have an intrinsically higher mutability (Parsons et al., 1993). I was thinking of applying a threshold of at least 20X for the healthy sample and 50X for the tumor sample. Do these look sufficient to you?

    Also, there appears to be no standard definition of coverage, so by "50X" I mean exome-wide coverage with 100 bp uniquely mapping Illumina paired-end reads, after duplicate removal.

    Thanks!

    Federico
    No, there is no standard. It depends on how many calls you want to make accurately. Something like SomaticSniper will happily call variants in low-coverage areas, but you will have little confidence in the genotypes, even with 40x coverage for an exome sample.

    I'm doing some development work on cancer panels, and we've been advised (this is not exome sequencing, but targeted resequencing) to aim for 500x to 1000x coverage. I was a little iffy about these figures until I actually started doing the analysis on exomes myself, just to test things out.

    This is prohibitively expensive for exomes, I imagine, so I think of depth in terms of "as much as you can afford". Remember that you will also want to be confident about the genotype calls in your normal samples.
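A rough way to see why targeted panels aim for 500x to 1000x is a simple binomial sampling model: at depth d, the number of reads carrying a somatic variant with allele fraction f is roughly Binomial(d, f). The sketch below, using a hypothetical helper `detection_power`, gives the probability of seeing at least a chosen number of supporting reads. It ignores sequencing error, mapping bias and tumor purity (beyond whatever is folded into f), and it is not how SomaticSniper or any other particular caller scores variants; the 5% allele fraction and the 5-read minimum are illustrative assumptions.

```python
from math import comb


def detection_power(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant reads) under Binomial(depth, vaf).

    Simplified model: no sequencing error, no mapping bias, and tumor
    purity/clonality are assumed to be folded into `vaf`.
    """
    # Probability of observing fewer than the required number of variant reads.
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Illustrative: a subclonal variant at 5% allele fraction, requiring 5 supporting reads.
for depth in (40, 100, 500, 1000):
    print(depth, round(detection_power(depth, vaf=0.05, min_alt_reads=5), 3))
```

Under this model, low-fraction subclonal variants are where extra depth pays off most; a clonal heterozygous variant in a pure tumor (f near 0.5) is already detected with high probability at modest depth.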



    • #3
      Thank you for your answer, Bukowski. So far we are aiming for around 40x coverage, which seems to be the minimum needed to stabilize the significance of the somatic mutations found.



      • #4
        Finding the depth of coverage with more confidence

        This is an extension of the original question in this thread. I was wondering if anybody knows how I can calculate sequencing accuracy at various depths of coverage. Based on this, I want to choose the coverage level I can be most confident in. Thanks in advance to all.
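One theoretical answer, under strong simplifying assumptions, is to compute how often a true heterozygous site would be called correctly at a given depth. The sketch assumes each read samples either allele with probability 0.5, ignores sequencing and mapping errors, and treats a site as correctly called when each allele is seen at least k times; `het_sensitivity` and the choice of k are hypothetical, not the rule of any particular caller. This matters for the normal sample too, since a germline heterozygous site missed in the normal but called in the tumor can look like a somatic variant.

```python
from math import comb


def het_sensitivity(depth: int, min_reads_per_allele: int) -> float:
    """P(both alleles of a true heterozygote are each seen >= k times).

    Assumes Binomial(depth, 0.5) sampling of the two alleles and no
    sequencing or mapping error. Returns 0.0 when depth < 2k.
    """
    k = min_reads_per_allele
    # Sum over alternate-allele counts a that leave at least k reads for each allele.
    favourable = sum(comb(depth, a) for a in range(k, depth - k + 1))
    return favourable / 2**depth


# Illustrative: require at least 3 reads of each allele.
for depth in (10, 20, 40):
    print(depth, round(het_sensitivity(depth, 3), 4))
```

Real accuracy will be lower than this idealized curve, so it is best read as an upper bound when picking a threshold.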



        • #5
          Originally posted by rama
          This is an extension of the original question in this thread. I was wondering if anybody knows how I can calculate sequencing accuracy at various depths of coverage. Based on this, I want to choose the coverage level I can be most confident in. Thanks in advance to all.
          A couple of ideas: http://genome.sph.umich.edu/wiki/SNP...Set_Properties

          Also, for any metric, you can tentatively assume your higher-coverage/higher-quality-score calls will be more "correct" than the lower-coverage/lower-quality-score calls. Thus, for any metric, compare different coverage thresholds to your highest-quality sets. One caveat is that mapping artifacts or other problems can lead to super-high coverage, so make sure your "high quality set" looks real.
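As one concrete instance of comparing coverage thresholds against your highest-quality calls, the sketch below computes the transition/transversion (Ti/Tv) ratio of the SNV calls passing each depth threshold, so you can see at which threshold the ratio drifts away from that of the highest-coverage calls. Ti/Tv is just one example metric; the `Call` record, the helper names and the toy calls are assumptions for illustration, and only single-base substitutions are handled.

```python
from typing import Dict, List, NamedTuple, Tuple

# Transitions are purine<->purine or pyrimidine<->pyrimidine substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


class Call(NamedTuple):
    ref: str    # reference base (single-base SNVs only)
    alt: str    # alternate base
    depth: int  # read depth at the site


def ti_tv(calls: List[Call]) -> float:
    """Transition/transversion ratio; NaN for an empty set, inf if there are no transversions."""
    if not calls:
        return float("nan")
    ti = sum((c.ref, c.alt) in TRANSITIONS for c in calls)
    tv = len(calls) - ti
    return ti / tv if tv else float("inf")


def ti_tv_by_threshold(calls: List[Call], thresholds: List[int]) -> Dict[int, Tuple[int, float]]:
    """For each depth threshold, return (number of calls kept, Ti/Tv of those calls)."""
    result = {}
    for t in thresholds:
        kept = [c for c in calls if c.depth >= t]
        result[t] = (len(kept), ti_tv(kept))
    return result


# Toy calls, purely illustrative.
calls = [Call("A", "G", 100), Call("C", "T", 60), Call("A", "C", 15), Call("G", "T", 80)]
for t, (n, ratio) in ti_tv_by_threshold(calls, [10, 50, 90]).items():
    print(f">= {t}x: {n} calls, Ti/Tv = {ratio:.2f}")
```

With real data you would also check the call count per threshold, since a ratio computed from a handful of calls is noisy.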



          • #6
            Thanks a bunch for the pointer.
            Once we identify the "high quality set", is there a way to compute metrics at different coverage thresholds? I am not sure how to do it: do I have to randomly subset the sequence reads and check the variant calls, or just compare with the consensus?



            • #7
              Global Alliance White Paper on Clinical Data

              There is a consortium on clinical data as described in the White Paper linked here:



              Page 30 lists the organizers and their institutions; you may be able to obtain follow-up information from them on current "standards" questions about clinical data.

              Please post any standards statements you obtain from them here on this forum and/or in the Wiki, so that others are kept informed and consensus parameters can be disseminated more rapidly.



              • #8
                Originally posted by rama
                Thanks a bunch for the pointer.
                Once we identify the "high quality set", is there a way to compute metrics at different coverage thresholds? I am not sure how to do it: do I have to randomly subset the sequence reads and check the variant calls, or just compare with the consensus?
                I was thinking of just separating calls by coverage, i.e., make a set of calls at >100x coverage, a set at 90-100x, a set at 80-90x, etc., and compare them. Or use quality score instead of coverage if you prefer that metric. Your idea is interesting, though: you could take a set of high-quality calls, then randomly take smaller and smaller sets of reads for the same positions, redo the calling, and see how low the coverage can go before your "subset calls" deviate too much from the legitimate set. The problem is that if your high-quality calls are at "easy" sites, this strategy won't necessarily apply to the rest of the genome.
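The subsampling idea can be prototyped on allele counts before touching any BAM files. The sketch below thins the reference and alternate read counts at each high-quality site by keeping each read independently with probability `keep`, re-genotypes with a deliberately naive caller, and reports how often the original genotype is reproduced. The allele-fraction cutoffs and minimum depth in `naive_genotype`, and all function names, are hypothetical; a real analysis would subsample the reads themselves (for example with the subsampling option of `samtools view`) and rerun your actual caller. The "easy sites" caveat still applies: the resulting concordance curve may be optimistic for the rest of the genome.

```python
import random
from typing import List, Optional, Tuple


def naive_genotype(ref: int, alt: int, min_depth: int = 8) -> Optional[str]:
    """Toy caller: no call below min_depth, otherwise genotype by alt allele fraction."""
    depth = ref + alt
    if depth < min_depth:
        return None
    frac = alt / depth
    if frac < 0.2:
        return "0/0"
    if frac > 0.8:
        return "1/1"
    return "0/1"


def thin(count: int, keep: float, rng: random.Random) -> int:
    """Binomial thinning: keep each of `count` reads with probability `keep`."""
    return sum(rng.random() < keep for _ in range(count))


def downsample_concordance(sites: List[Tuple[int, int, str]], keep: float, seed: int = 0) -> float:
    """Fraction of high-quality genotypes reproduced after thinning the reads.

    `sites` holds (ref_reads, alt_reads, truth_genotype); no-calls count as discordant.
    """
    rng = random.Random(seed)
    hits = 0
    for ref, alt, truth in sites:
        call = naive_genotype(thin(ref, keep, rng), thin(alt, keep, rng))
        hits += (call == truth)
    return hits / len(sites)


# Toy high-quality sites, purely illustrative.
sites = [(50, 50, "0/1"), (0, 100, "1/1"), (100, 0, "0/0"), (45, 55, "0/1")]
for keep in (1.0, 0.5, 0.2, 0.1, 0.05):
    print(keep, downsample_concordance(sites, keep))
```

Fixing the random seed keeps runs reproducible; averaging over several seeds gives a smoother curve.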
