  • Comparing read depths per gene/exon between samples

    Hi all, this is my first post, so apologies if it's inappropriate. It's a pretty simple question, but I need a little help. I promise I've googled far and wide to try to figure it out myself.

    I use short reads from a MiSeq to make clinical variant calls using GATK. I use various panels (TruSight Cancer etc.). Some exons/genes always have low coverage (due to GC content etc.) and others just fail in one sample, which is often clinically relevant.

    I would like to compare the mean coverage of each exon/gene in each sample to the same from a 'gold standard' derived from what my lab scientists tell me is a 'good run'. Currently I am doing a t-test of the gold mean against the read depths at each base in the exon/gene that I am doing variant calling on. Basically, I only want to flag the mean read depth as low if it is significantly different from the mean of the gold.

    It made sense at first because I am comparing two means. Is that right? It seems wrong because I'm really only comparing two samples. So I thought I should do a Z-test...

    Has anyone done anything similar? How did you implement it?
    LM

  • #2
    For human data, I suggest calling mutations against the standard human reference genome, then comparing them against known databases, such as the 1000 Genomes Project or other databases.

    There are gold standards, but gold is a relative and dynamic term in any advancing industry. Particularly, exon-capture is not at all replicable between different platforms.



    • #3
      I think you are asking about coverage, where the first reply seemed to be talking about something a bit different.

      I wonder if it's not so much a statistical comparison you are after here, but rather a cutoff level. In this case, the challenge becomes what regions to measure and how to set the cutoffs for those regions.

      From your past experience, does the mean depth tell you what you need to know? If you are working with panels, then perhaps it would be relevant to choose a few regions where you know the coverage range you would consider normal or good and check whether the coverage from a given run is at that level?

      Setting the cutoffs, which would act as warnings that a sample may not be of the quality you need, could start with basic exploratory data analysis: tables and plots of the coverage of your gold sample, looking at the distribution of coverage over the whole mapping or over the regions you work with. From this, determine values that would be meaningful to check for in your samples. I would then validate whatever test you come up with by running it against other samples you know were considered good or bad in the past, to see whether it flags the samples you hope it will.
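As a rough illustration of the cutoff idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the input layout (a dict mapping region name to a list of per-base depths), the function names, and especially the rule itself (cutoff = a fraction of the lowest mean depth seen in any known-good run). The right rule would come out of the exploratory analysis, not from this snippet.

```python
from statistics import mean

def per_region_mean(depths):
    """Map region name -> mean of its per-base depths."""
    return {region: mean(values) for region, values in depths.items()}

def cutoffs_from_good_runs(good_runs, fraction=0.5):
    """Hypothetical rule: cutoff = fraction * lowest mean depth seen in any good run."""
    means = [per_region_mean(run) for run in good_runs]
    regions = means[0].keys()
    return {r: fraction * min(m[r] for m in means) for r in regions}

def flag_regions(sample, cutoffs):
    """Return the regions whose mean depth in the sample falls below the cutoff."""
    sample_means = per_region_mean(sample)
    # A region missing from the sample is treated as zero coverage.
    return sorted(r for r, c in cutoffs.items() if sample_means.get(r, 0.0) < c)
```

Running `flag_regions` over past samples that were already judged good or bad is then a quick check of whether the chosen `fraction` flags the ones you would expect.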

      Having said all that, my suspicion is that this question may be a solved problem and that others in the forum will have more mature ideas about processes and tools to use for this purpose.

      Guess we'll find out, right? :-)



      • #4
        Thanks Brian.

        I do all that. What I am trying to do is QC on the negative variant calls, which means every base in every gene of interest (GOI).

        What I am interested in is the mean read depth of every GOI that comes off my machine and whether it is significantly different to the mean depth I have defined as 'gold'. So the question is about statistical analysis only.
        LM



        • #5
          Thanks bt27uk,

          Much more on point.

          I have implemented an approach similar to what you suggest, i.e. if the sample mean is < 20x but the gold mean isn't, we want to know. My boss wants a p-value, though.

          Cheers,
          Liam
          LM
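For readers following along, the rule described in #5 (sample mean below 20x while the gold mean is not) could be sketched roughly as below. This is not the poster's actual implementation, which is not shown in the thread; the dict-of-per-base-depths input and the function name are assumptions made for the example.

```python
from statistics import mean

MIN_DEPTH = 20  # the 20x threshold mentioned in the post

def low_only_in_sample(sample, gold, min_depth=MIN_DEPTH):
    """Return regions whose mean depth is below min_depth in the sample but not in gold."""
    flagged = []
    for region, gold_depths in gold.items():
        sample_depths = sample.get(region)
        if not sample_depths:
            # No data for this region in the sample: treat as zero coverage.
            sample_mean = 0.0
        else:
            sample_mean = mean(sample_depths)
        # Flag only when the sample is low AND gold is adequately covered.
        if sample_mean < min_depth <= mean(gold_depths):
            flagged.append(region)
    return sorted(flagged)
```

Regions that are low in both the sample and gold (the "always low because of GC content" cases) are deliberately not reported here.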



          • #6
            If your supervisor wants a p-value, then I have likely missed the point.

            I originally assumed the aim was to ask a question like "does this sample have adequate coverage for my purposes?". For the purpose of noting samples that might not have adequate coverage for downstream analysis, I think a set of coverage cutoffs for the various genes of interest, each set from a lower limit you determine from your knowledge of a "good" sample, would be a reasonable way forward.

            To me, a p-value suggests questions more along the lines of "does this sample have (any, some, all?) genes whose coverage falls outside the range that constitutes the population of what are considered good samples?" That is a rather more complex question to approach.
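One possible way to frame that "population of good samples" question, as a sketch only: collect the mean depth of a gene from several runs judged good, and ask how unusual the new sample's mean is relative to that spread. The code below assumes several good runs are available (a single gold run gives no estimate of run-to-run variation), that those means are roughly normally distributed, and that differences in total sequencing yield between runs have already been dealt with. None of that is established in the thread, and the function name is made up for the example.

```python
from statistics import NormalDist, mean, stdev

def low_coverage_p_value(sample_mean, good_means):
    """Return (z, p): z-score of sample_mean against the good runs, and the
    one-sided probability of a good run showing a mean this low or lower."""
    if len(good_means) < 2:
        raise ValueError("need at least two good runs to estimate spread")
    mu = mean(good_means)
    sigma = stdev(good_means)  # sample standard deviation across good runs
    if sigma == 0:
        raise ValueError("good runs show no variation; cannot form a z-score")
    z = (sample_mean - mu) / sigma
    # Normal approximation; with only a handful of good runs this will be
    # overconfident, and a t-based interval would be the more cautious choice.
    return z, NormalDist().cdf(z)
```

For example, `low_coverage_p_value(60, [100, 110, 90, 105, 95])` gives a strongly negative z and a very small p. Because this would be run once per gene, per sample, some multiple-testing correction would be needed before calling any single gene "significant".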
