SEQanswers

06-18-2015, 01:00 AM   #1
LeonDK
Member
 
Location: Denmark

Join Date: Sep 2014
Posts: 69
How would you calculate the coverage for RNAseq?

I am thinking that the most important quantification parameter must be how many reads are mapped within annotated regions?

Hence it is not mapped reads per whole genome, but rather mapped reads per annotated region, like so:
coverage = n_mapped_reads * read_length / (n_annotated_regions * average_length_of_annotated_regions)

This way we get the average number of times each annotated nucleotide was sequenced?
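
As a rough sketch of the formula above in Python: the function name and the numbers are made up for illustration, and it assumes that every mapped read lies entirely inside an annotated region and that the regions do not overlap. Note that n_annotated_regions * average_length_of_annotated_regions is simply the total number of annotated bases.

```python
def mean_annotated_coverage(n_mapped_reads, read_length, region_lengths):
    """Average number of times each annotated base was sequenced.

    Simplifying assumptions: all mapped reads fall entirely within
    annotated regions, and the regions do not overlap.
    """
    total_annotated_bases = sum(region_lengths)
    if total_annotated_bases == 0:
        raise ValueError("no annotated bases")
    return n_mapped_reads * read_length / total_annotated_bases


# Hypothetical numbers, for illustration only.
region_lengths = [1_000, 2_000, 3_000]  # three annotated regions, 6,000 bases
coverage = mean_annotated_coverage(
    n_mapped_reads=600, read_length=100, region_lengths=region_lengths
)
print(coverage)  # 10.0
```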

L

Last edited by LeonDK; 06-18-2015 at 01:18 AM.
06-18-2015, 01:14 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,054

So if a region is not annotated you are going to consider reads mapping there not important/real?
06-18-2015, 01:17 AM   #3
LeonDK

Quote:
Originally Posted by GenoMax View Post
So if a region is not annotated you are going to consider reads mapping there not important/real?
Hmm... No and yes... Of course, if something maps, it (in theory) means that a transcript originated from there. But if I am to use the annotations for GRCh38 for analysis of differentially expressed genes (DEG), then the reads mapping in non-annotated regions are not considered?
06-18-2015, 01:27 AM   #4
GenoMax

If you are explicitly interested in the "annotated" regions then that is a limitation you are imposing. Reads mapping elsewhere will still be real.

Take a look at these: http://bedtools.readthedocs.org/en/l.../coverage.html http://bedtools.readthedocs.org/en/l...genomecov.html
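
To give an idea of what a per-region coverage calculation involves, here is a minimal pure-Python sketch. It is not how bedtools is implemented; the function name and the toy intervals are hypothetical, intervals are 0-based half-open as in BED files, and spliced reads are ignored.

```python
def region_coverage(region, reads):
    """Return (read_count, mean_depth) for one region.

    region and each read are (start, end) half-open intervals on the
    same chromosome, 0-based as in BED files.
    """
    r_start, r_end = region
    read_count = 0
    covered_bases = 0
    for start, end in reads:
        # Number of bases this read shares with the region (<= 0 means none).
        overlap = min(end, r_end) - max(start, r_start)
        if overlap > 0:
            read_count += 1
            covered_bases += overlap
    return read_count, covered_bases / (r_end - r_start)


# Hypothetical toy data: one annotated region and four mapped reads.
region = (100, 200)
reads = [(90, 140), (120, 170), (180, 230), (300, 350)]
count, depth = region_coverage(region, reads)
print(count, depth)  # 3 1.1
```

The last read maps outside the region, so it contributes to neither number, which is exactly the limitation of restricting the calculation to annotated regions.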
06-18-2015, 08:11 AM   #5
jwfoley
Senior Member
 
Location: Stanford

Join Date: Jun 2009
Posts: 181

Why do you want to calculate coverage in the first place? If you're using it for gene-expression profiling, RNA-seq is a read-counting application, not a base-counting application, so for most purposes the read count is a more appropriate measure of sequencing depth.
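
A small arithmetic sketch of the difference, using two made-up libraries: they contain the same number of sequenced bases, so a base-level coverage figure would be the same for both, yet one has twice as many reads to count.

```python
# Hypothetical libraries: (number of mapped reads, read length in bases).
libraries = {
    "short_reads": (20_000_000, 50),
    "long_reads": (10_000_000, 100),
}

# Total sequenced bases per library: the quantity a coverage figure is built on.
bases = {name: n_reads * length for name, (n_reads, length) in libraries.items()}

for name, (n_reads, _) in libraries.items():
    print(f"{name}: {n_reads:,} reads, {bases[name]:,} bases")
```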
06-18-2015, 08:55 AM   #6
LeonDK

Quote:
Originally Posted by jwfoley View Post
Why do you want to calculate coverage in the first place? If you're using it for gene-expression profiling, RNA-seq is a read-counting application, not a base-counting application, so for most purposes the read count is a more appropriate measure of sequencing depth.
Good point!

So in your opinion, it is sufficient to report the number of mapped reads? E.g. "we included samples with at least 10 million mapped reads" or something similar?
06-18-2015, 08:59 AM   #7
jwfoley

Quote:
Originally Posted by LeonDK View Post
Good point!

So in your opinion, it is sufficient to report the number of mapped reads? E.g. "we included samples with at least 10 million mapped reads" or something similar?
Yes, that is the single number that wraps it up. For reference, ENCODE's old standard for gene-expression profiling was 30 million paired-end reads (for human samples), but that's not the greatest recommendation anyway.
06-18-2015, 09:12 AM   #8
GenoMax

Quote:
Originally Posted by LeonDK View Post
So in your opinion, it is sufficient to report the number of mapped reads? E.g. "we included samples with at least 10 million mapped reads" or something similar?
What kind of "report" are we referring to here? How do you decide on a number for cut-off? Is the distribution of mapped reads assured to be equally representative for all samples, if you are only going to report a total number for mapped reads? And so on ...

Last edited by GenoMax; 06-18-2015 at 09:15 AM.
06-18-2015, 09:25 AM   #9
LeonDK

OK, of course I know very well that there is no magic number. In fact, pretty much every cut-off we use is arbitrary, especially the infamous 0.05. Anyhoo...

For publication purposes, it is necessary to account for how 'bad' samples were excluded.

Hart et al. (10.1089/cmb.2012.0283) state that: "In other words, a sequencing depth of 10 million reads will ensure that approximately 90% of all genes will be covered by at least 10 reads."

That's why I suggested 10 million as a 'nice' arbitrary number.
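
In practice, applying such a cut-off could look something like the sketch below. The sample names and read counts are invented, and the threshold is just the arbitrary 10 million from this thread, not a recommendation.

```python
MIN_MAPPED_READS = 10_000_000  # arbitrary cut-off discussed in this thread

# Hypothetical sample names and mapped-read counts.
mapped_reads = {
    "sample_A": 24_310_552,
    "sample_B": 8_904_117,
    "sample_C": 10_000_000,
}

# Keep samples at or above the threshold; record the rest for reporting.
kept = {s: n for s, n in mapped_reads.items() if n >= MIN_MAPPED_READS}
excluded = sorted(set(mapped_reads) - set(kept))

print("kept:", sorted(kept))
print("excluded:", excluded)
```

Listing the excluded samples alongside the threshold makes it easy to state in a methods section exactly which samples were dropped and why.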
All times are GMT -8.