SEQanswers

Old 10-18-2015, 06:26 PM   #1
hanco
Junior Member
 
Location: Japan

Join Date: Jun 2015
Posts: 8
Difference between coverage, depth of coverage and targeted coverage

Hi everybody,

I have a question about coverage, depth of coverage and targeted coverage.
What is the difference between those three?

If the targeted coverage is, for example, 25000X, does that mean the number of reads is 25,000?
And which one is better or more specific, 1000X coverage or 25000X?

I am sorry if this is a stupid question; please note that I am very new at this and my background is not bioinformatics.

Regards,
Hanco
Old 10-19-2015, 12:00 AM   #2
Bukowski
Senior Member
 
Location: UK

Join Date: Jan 2010
Posts: 390

Targeted coverage would normally imply the coverage you were aiming for with an experiment - i.e. in a perfect experiment with perfect pooling you would achieve this coverage. However you don't get perfect experiments, so you may not achieve it.

Coverage and depth of coverage mean the same thing.

Your coverage depends not only on the number of reads, but also on the read length and the size of the region you are trying to sequence.

If you have a 100bp region covered by one 100-base read, then the coverage is 1x.
If you have a 200bp region covered by one 100-base read, then the coverage is 0.5x.
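
The two examples above come down to one piece of arithmetic: total sequenced bases divided by the size of the region. Here is a minimal Python sketch of it. The function name `mean_coverage` is only illustrative, and it assumes every read falls entirely inside the region.

```python
def mean_coverage(n_reads: int, read_length: int, region_length: int) -> float:
    """Average depth = total sequenced bases / size of the region.

    Assumes every read lies fully inside the region (a simplification).
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return n_reads * read_length / region_length


# The two cases from the post: one 100-base read over 100bp and over 200bp.
print(mean_coverage(1, 100, 100))
print(mean_coverage(1, 100, 200))
```

This is an average over the region; individual positions can still be covered more or less deeply than this number suggests.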

More coverage is better for most applications, with the possible exception of genome assembly.

Last edited by Bukowski; 10-19-2015 at 12:25 AM.
Old 10-19-2015, 01:14 AM   #3
hanco
Junior Member
 
Location: Japan

Join Date: Jun 2015
Posts: 8

Thank you so much for your clear explanation.

I have another question.
So can I decide whether I have good data based on coverage?
For example, if I have a 200bp region covered by 30 reads of 200 bases each, that means I have 30X coverage. How can I know whether this 30X coverage is good or not?

Again, thank you for your response, I really appreciate it.
Old 10-19-2015, 01:24 AM   #4
Bukowski
Senior Member
 
Location: UK

Join Date: Jan 2010
Posts: 390

I assume this relates to variant calling?

Firstly each base of each read will have a base quality score.
Each read will have a mapping score.

Whether something is 'good or not' is dependent on the application. Calling variants in a region where your read mapping qualities are low, or your base quality scores are low, is more likely to result in false positives.
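
To make "low" quality scores concrete: base qualities, and mapping qualities in SAM/BAM files, are normally Phred-scaled, so a score Q corresponds to an error probability of 10^(-Q/10). A small Python sketch of that conversion follows; the function name is just for illustration.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


# Print the error probability for a few commonly quoted quality scores.
for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))
```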

However if you're using the data for genotyping, each variant will have a quality score which will take all this information into account.

Required coverage is driven by application. 30x is fine for calling germline SNPs in diploid model organisms. >1000x is fine for calling somatic SNPs with a variant allele frequency of 5% - however you can't call somatic variants of that nature with 30x coverage (not enough data).
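
A rough way to see why 30x is not enough for a 5% variant is to count the reads you would expect to carry it: depth times allele frequency. The Python sketch below also uses a simple binomial model to estimate the chance of seeing at least `k` variant reads. It ignores sequencing error and amplification bias, and the cut-off of 5 supporting reads is an arbitrary illustrative choice, not a threshold used by any particular caller.

```python
from math import comb


def expected_variant_reads(depth: int, vaf: float) -> float:
    """Expected number of reads carrying the variant at a given depth."""
    return depth * vaf


def prob_at_least_k_variant_reads(depth: int, vaf: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(depth, vaf).

    Sums the first k terms of the distribution, so keep k small.
    """
    p_fewer = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_fewer


# Compare 30x and 1000x for a variant at 5% allele frequency.
for depth in (30, 1000):
    print(
        depth,
        expected_variant_reads(depth, 0.05),
        prob_at_least_k_variant_reads(depth, 0.05, k=5),
    )
```

At 30x only one or two reads are expected to carry the variant, which is hard to tell apart from sequencing errors; at 1000x about 50 reads are expected.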

Coverage is only *one* metric that you need to use when assessing the quality of your dataset.
Old 10-19-2015, 02:22 AM   #5
hanco
Junior Member
 
Location: Japan

Join Date: Jun 2015
Posts: 8

Yes, actually my question relates to variant calling.

I am using Ion PGM Ampliseq Cancer Panel Hotspot v2 and Torrent Variant Caller.
Usually I have only low coverage, as I mentioned before: about 30 reads of 200 bases. Sometimes this kind of data does not appear in TVC. I was wondering whether the coverage of my data is too low or whether I have set the stringency too high.

I always set the variant frequency to "somatic", but I am a little confused because the user guide says that for somatic workflows the threshold is set to 4% frequency for SNPs and 20% for indels. But as you can see in the picture, the minimum variant allele is 0.02 (is it equal to 20%?).



I am sorry if my questions are off the original topic.
Your help is much appreciated, thank you.
Old 10-19-2015, 04:18 AM   #6
Bukowski
Senior Member
 
Location: UK

Join Date: Jan 2010
Posts: 390

I think 0.02 refers to 2% in this context, not 20%

With the min_coverage stated, variants at positions with less than 100x coverage will not be shown (I assume, I've never used TVC)
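
To illustrate how the two thresholds would interact, here is a hypothetical Python sketch. It is not TVC's actual logic: the function, its parameter names and its default values are assumptions built only from the numbers mentioned in this thread (a minimum coverage of 100 and a minimum allele fraction of 0.02, i.e. 0.02 * 100 = 2%).

```python
def passes_filters(
    depth: int,
    alt_reads: int,
    min_coverage: int = 100,            # assumed from the thread, not TVC's API
    min_allele_fraction: float = 0.02,  # a fraction: 0.02 means 2%, not 20%
) -> bool:
    """Return True if a site would be reported under these two thresholds."""
    if depth <= 0 or depth < min_coverage:
        return False  # too few reads at this position
    return alt_reads / depth >= min_allele_fraction


# A site with 30 reads fails on coverage alone, whatever the allele fraction.
print(passes_filters(depth=30, alt_reads=15))
# A site with 1000 reads and 50 variant reads (5%) passes both checks.
print(passes_filters(depth=1000, alt_reads=50))
```

Under these assumptions, a site covered by only 30 reads would never be reported, which would fit the data not appearing in the output.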
Old 10-19-2015, 04:36 PM   #7
hanco
Junior Member
 
Location: Japan

Join Date: Jun 2015
Posts: 8

Yes, I thought so too. That is why I got so confused. These numbers are all set automatically when I choose "somatic" mode.
I suppose I should start another thread to discuss this.

Anyway, thank you so much for all your kind help and clear explanations. Really appreciate it.
Tags
coverage, depth of coverage, targeted coverage

All times are GMT -8.