![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
basic question about read groups | efoss | Bioinformatics | 2 | 10-19-2011 04:32 PM |
A basic base pair (dna) question | ardmore | General | 12 | 08-24-2011 09:10 AM |
depth of coverage basic question | madsaan | Bioinformatics | 0 | 03-24-2011 06:40 AM |
tophat - basic question - mapped reads | chrisbala | Bioinformatics | 0 | 04-06-2010 09:36 AM |
tophat - basic question - mapped reads | chrisbala | Illumina/Solexa | 3 | 02-26-2010 08:38 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: Germany Join Date: Jan 2012
Posts: 17
|
![]()
Hi everybody,
I have a basic question in NGS area how can we calculate sequencing coverage (5X, 20X ...) at selected regions of interest? and what does it exactly mean? It is calculated after sequencing and based on fastq file or after mapping to the genome? |
![]() |
![]() |
![]() |
#2 |
Member
Location: Stockholm Join Date: Nov 2011
Posts: 10
|
![]()
Hello,
you need to map the reads first to know from what region they (hopefully) comes from. One easy way to look for coverage in regions is to design a .bed file (http://genome.ucsc.edu/FAQ/FAQformat.html#format1) with your regions of interest and compare them to the mapped result with bedtools coverageBed (http://code.google.com/p/bedtools/). |
![]() |
![]() |
![]() |
#3 |
Senior Member
Location: Kansas City Join Date: Mar 2008
Posts: 197
|
![]()
Once you have coverage in terms of read count (from coverageBed), to get coverage like 5x, you'll have to do
( read count * read length ) / length of area in question So if you have 5 reads that are 50 bases long in a region that's 100 bases long, your coverage will be (5 * 50) / 100 = 2.5x. You could calculate a number for the whole genome by adding all the chromosome lengths, or you could do individual chromosomes or genes or windows throughout the genome or whatever is interesting. |
![]() |
![]() |
![]() |
#4 |
Member
Location: Wisconsin Join Date: Jun 2011
Posts: 87
|
![]()
You could also use "samtools depth" to find the coverage at each position in your target region which like mamons said, you could create a bed file for that.
Then you can just take all the coverage counts and get a summary using R to get the mean, median etc. depth of coverage in the region of interest |
![]() |
![]() |
![]() |
#5 |
Junior Member
Location: Saarbruecken, Germany Join Date: Jan 2011
Posts: 9
|
![]()
Hi,
has someone of you ever tried to calculate the coverage (using coverageBed) of a whole genome sequencing experiment (about 40x average coverage) on a relatively large amount of genomic features (like all refSeq genes). I tried to perform this task on a multicore processor and 16 GB Ram memory. After three days of calculation and a constant memory consumption of about 14 GB i stopped the process. I used the following command: ./coverageBed -abam <bam_file> -b <refSeq_exons.bed6> -hist >> histogram.txt Is that normal? Some ideas? @aggp11: Which version of samtools are you using? The command depth seems not to be present in my version |
![]() |
![]() |
![]() |
#6 |
Member
Location: Germany Join Date: Jan 2012
Posts: 17
|
![]()
Hi
http://sourceforge.net/mailarchive/f...samtools-devel * Added the `depth' command to samtools to compute the per-base depth with a simpler interface. File `bam2depth.c', which implements this command, is the recommended example on how to use the mpileup APIs. |
![]() |
![]() |
![]() |
#7 |
Member
Location: Wisconsin Join Date: Jun 2011
Posts: 87
|
![]()
@Mbender: I am using samtools version 0.1.18 .
@maria_maria & Mbender: with this latest version of samtools, you don't even have to worry about bam2depth. The following command is an example of how samtools depth works: samtools depth -q 30 -b exons.bed exome.bam > test_q_20.coverage Output: chr1 14468 39 chr1 14469 39 chr1 14470 37 chr1 14471 39 chr1 14472 35 chr1 14473 34 Where the third column is the # of q30 or more reads at the given position. Thanks, Praful |
![]() |
![]() |
![]() |
#8 |
Junior Member
Location: Saarbruecken, Germany Join Date: Jan 2011
Posts: 9
|
![]()
Many thanks.
Using samtools depth seems to calculate the coverage in given genomic regions in a feasible amount of time. By the way, the low performance of coverageBed when working on a large amount of genomic intervals is a known issue. http://groups.google.com/group/bedto...c7a5f40b28b5ef Best, Matthias |
![]() |
![]() |
![]() |
Thread Tools | |
|
|