SEQanswers

Old 09-09-2014, 08:37 PM   #1
shimbalama
bioinformatics-help.com
 
Location: Adelaide

Join Date: Jul 2014
Posts: 9
Comparing read depths per gene/exon between samples

Hi all, this is my first post, so apologies if it's inappropriate. It's a pretty simple question, but I need a little help. I promise I've googled far and wide to try to figure it out myself.

I use short reads from a MiSeq to make clinical variant calls using GATK. I use various panels (TruSight Cancer etc.). Some exons/genes always have low coverage (due to GC content etc.), and others just fail in one sample, which is often clinically relevant.

I would like to compare the mean coverage of each exon/gene in each sample to the same from a 'gold standard' derived from what my lab scientists tell me is a 'good run'. Currently I am doing a t-test comparing the mean of the gold standard with the read depths at each base in the exon/gene that I am doing variant calling on. Basically, I only want to flag the mean read depth as low if it is significantly different from the mean of the gold standard.

It made sense at first because I am comparing two means. Is that right? It seems wrong because I'm really only comparing two samples. So I thought I should do a z-test...
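
A minimal sketch of the test described above, assuming the gold-standard mean is treated as a fixed constant and the per-base depths of one exon are the observations. The function name, the depth values, and the gold mean of 30 are hypothetical. The p-value uses a normal approximation from the standard library, so it behaves like a z-test; a t distribution (for example via SciPy's `ttest_1samp`) would be the closer match to a t-test on short exons.

```python
import math
import statistics


def one_sample_test(depths, gold_mean):
    """Compare per-base depths in one exon against a fixed gold mean.

    Returns the test statistic and a one-sided p-value for
    "mean depth is lower than gold". The p-value relies on a normal
    approximation, which is only reasonable for exons with many bases.
    """
    n = len(depths)
    mean = statistics.fmean(depths)
    sd = statistics.stdev(depths)      # sample standard deviation
    se = sd / math.sqrt(n)             # standard error of the mean
    stat = (mean - gold_mean) / se
    p_lower = statistics.NormalDist().cdf(stat)
    return stat, p_lower


# Hypothetical per-base depths for one exon, and a hypothetical gold mean.
exon_depths = [18, 20, 22, 19, 21, 20, 18, 22, 20, 20]
stat, p = one_sample_test(exon_depths, gold_mean=30.0)
print(f"statistic = {stat:.2f}, one-sided p = {p:.3g}")
```

One caveat with this setup: neighbouring bases are covered by largely the same reads, so per-base depths are not independent observations, and a test like this will tend to report very small p-values even for modest differences. It also ignores any run-to-run variation in the gold standard itself.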

Has anyone done anything similar? How did you implement it?
Old 09-09-2014, 09:17 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695

For human data, I suggest calling mutations against the standard human reference genome, then comparing them against known databases, such as the 1000 Genomes Project or other databases.

There are gold standards, but gold is a relative and dynamic term in any advancing industry. In particular, exon capture is not at all replicable between different platforms.
Old 09-09-2014, 10:37 PM   #3
bt27uk
Junior Member
 
Location: Denmark

Join Date: Aug 2011
Posts: 7

I think you are asking about coverage, where the first reply seemed to be talking about something a bit different.

I wonder if it's not so much a statistical comparison you are after here, but rather a cutoff level. In this case, the challenge becomes what regions to measure and how to set the cutoffs for those regions.

From your past experience, does the mean depth tell you what you need to know? If you are working with panels, then perhaps it would be relevant to choose a few regions where you know the coverage range you would consider normal or good and check whether the coverage from a given run is at that level?

Setting the cutoffs, which would act as warnings that a sample may not be of the quality you need, could start with basic exploratory data analysis: tables and plots of the coverage of your gold sample, looking at the distribution of coverage over the whole mapping or over the regions you work with. From this, determine values that would be meaningful to check for in your samples. I would then validate the resulting test by running it against other samples that were considered good or bad in the past, to see whether it flags the samples you would expect it to.
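
As a rough sketch of that workflow: derive one cutoff per region from the gold sample's depth distribution, then check a new sample against those cutoffs. The region names, the depth values, and the choice of "half the gold median" as the cutoff are all assumptions for illustration, not recommended values.

```python
import statistics


def region_cutoffs(gold_depths, fraction=0.5):
    """Per-region cutoff: a fraction of the gold sample's median depth."""
    return {
        region: fraction * statistics.median(depths)
        for region, depths in gold_depths.items()
    }


def flag_regions(sample_depths, cutoffs):
    """Return the regions whose mean depth falls below their cutoff."""
    return sorted(
        region
        for region, depths in sample_depths.items()
        if statistics.fmean(depths) < cutoffs[region]
    )


# Hypothetical per-base depths keyed by region.
gold = {
    "GENE1_exon2": [100, 110, 90, 105],
    "GENE2_exon5": [40, 42, 38, 41],
}
sample = {
    "GENE1_exon2": [95, 100, 98, 102],
    "GENE2_exon5": [10, 12, 9, 11],
}

cutoffs = region_cutoffs(gold)
flagged = flag_regions(sample, cutoffs)
print(cutoffs)
print(flagged)
```

Running the same check over samples already judged good or bad is then a matter of calling `flag_regions` on each one and comparing the flags with what the lab concluded at the time.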

Having said all that, my suspicion is that this question may be a solved problem and that others in the forum will have more mature ideas about processes and tools to use for this purpose.

Guess we'll find out, right? :-)
Old 09-09-2014, 10:41 PM   #4
shimbalama
bioinformatics-help.com
 
Location: Adelaide

Join Date: Jul 2014
Posts: 9

Thanks Brian.

I do all that. What I am trying to do is QC on the negative variant calls, so on every base in every gene of interest (GOI).

What I am interested in is the mean read depth of every GOI that comes off my machine and whether it is significantly different to the mean depth I have defined as 'gold'. So the question is about statistical analysis only.
Old 09-09-2014, 10:46 PM   #5
shimbalama
bioinformatics-help.com
 
Location: Adelaide

Join Date: Jul 2014
Posts: 9

Thanks bt27uk,

Much more on point.

I have implemented an approach similar to what you suggest, i.e. if the sample mean is < 20x but the gold mean isn't, we want to know. My boss wants a p-value though.

Cheers,
Liam
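
The rule described in this post (sample mean below 20x while the gold mean is not) could look roughly like the following. It also separates regions that are low in both, which matches the exons that always have low coverage. The function name, labels, and example values are hypothetical; only the 20x threshold comes from the post.

```python
def classify_regions(sample_means, gold_means, min_depth=20.0):
    """Label each region by comparing sample and gold mean depth to a threshold."""
    labels = {}
    for region, gold in gold_means.items():
        sample = sample_means[region]
        if sample >= min_depth:
            labels[region] = "ok"
        elif gold < min_depth:
            # Low in the gold run too, e.g. a region that always captures poorly.
            labels[region] = "low in both"
        else:
            # Low here but fine in the gold run: the case worth reporting.
            labels[region] = "low in sample only"
    return labels


# Hypothetical mean depths per region.
gold_means = {"exon_1": 85.0, "exon_2": 12.0, "exon_3": 60.0}
sample_means = {"exon_1": 78.0, "exon_2": 9.0, "exon_3": 7.5}

labels = classify_regions(sample_means, gold_means)
for region, label in labels.items():
    print(region, label)
```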
Old 09-10-2014, 08:17 AM   #6
bt27uk
Junior Member
 
Location: Denmark

Join Date: Aug 2011
Posts: 7

If your supervisor wants a p-value, then I have likely missed the point.

I originally assumed the aim was to ask a question like "does this sample have adequate coverage for my purposes?". For noting samples that might not have adequate coverage for downstream analysis, I think a set of coverage cutoffs for the various genes of interest, with lower limits set from your knowledge of a "good" sample, would be a reasonable way forward.

To me, a p-value suggests questions more along the lines of "does this sample have (any, some, all?) genes whose coverage falls outside the range seen in the population of samples considered good?" That is a rather more complex question to approach.
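
One way to frame that population question, sketched under several assumptions: mean depths are available for a number of runs considered good (not just one gold run), depths have already been normalised for each run's overall yield, and per-gene mean depth across good runs is roughly normal. The gene names and values are hypothetical, and the Bonferroni adjustment is just one simple option for handling many genes at once.

```python
import statistics


def gene_p_values(good_runs, sample):
    """One-sided p-values for "this gene's depth is unusually low".

    good_runs: list of {gene: mean depth} dicts, one per good run.
    sample:    {gene: mean depth} for the run being checked.
    Returns {gene: (raw p, Bonferroni-adjusted p)}.
    """
    n_genes = len(sample)
    results = {}
    for gene, depth in sample.items():
        reference = [run[gene] for run in good_runs]
        dist = statistics.NormalDist(
            statistics.fmean(reference), statistics.stdev(reference)
        )
        p = dist.cdf(depth)  # probability of a depth this low or lower
        results[gene] = (p, min(1.0, p * n_genes))
    return results


# Hypothetical mean depths from five good runs and one new sample.
good_runs = [
    {"gene_A": 100, "gene_B": 50},
    {"gene_A": 110, "gene_B": 55},
    {"gene_A": 90, "gene_B": 45},
    {"gene_A": 105, "gene_B": 52},
    {"gene_A": 95, "gene_B": 48},
]
sample = {"gene_A": 98, "gene_B": 30}

results = gene_p_values(good_runs, sample)
for gene, (p, p_adj) in results.items():
    print(f"{gene}: p = {p:.3g}, adjusted p = {p_adj:.3g}")
```

With only a handful of good runs the estimated spread is itself uncertain, so p-values from a sketch like this are best read as a ranking aid rather than an exact probability.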
All times are GMT -8.