For our paired-end exome sequence data, I have aligned the reads to HG18 reference genome with BWA, and called variants with SAMtools mpileup command. Could anyone teach me how to obtain the mapping coverage %, error rate, etc.? Thanks a lot.
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
As with Heisman I prefer DepthOfCoverage especially for exomes because of the percent reads at X depth feature.
For you interval list it can tell you which percentage of your bases are higher than 5x, 10x, 30x,50x, etc (configurable). This helps greatly to see if you have sufficient overall base coverage >20x to make confident calls.
Comment
-
-
Originally posted by cedance View PostIsn't depth of coverage just the ratio of total number of reads sequenced to the number of bases in your genome?
Cov = Total reads sequenced/ no: of bases in genome (in fasta)?
When a tool compute average or median of an interval_list it gives you the "true", usable coverage.
Comment
-
When the alignment is written in Goby format (which you can produce with this version of BWA, or with GobyWeb), you can estimate coverage statistics for exome as described in this tutorial:
estimating-coverage-statistics-for-targeted-resequencing-experiments
Comment
-
Originally posted by lletourn View PostYes but it's a gross approximation. You lose the reads that are trimmed/clipped. the ones that don't align, the ones not on target the bases with too low quality, the reads that have a too-low mapping quality...you get the picture.
When a tool compute average or median of an interval_list it gives you the "true", usable coverage.
I tried DepthOfCoverage on my data but I got an error related to "RG" (read group). I work on alternative splicing and I merged the data set (by modifying the header so that I can get back the files). I guess I should just add a tag with RG as optional field. Do you know about this?
Also, could you explain a bit more detailed as to what sort of coverage can we get and the command you normally use (on 1 or multiple bam files)? I would assume: 1) % of bases covered with 10x, 20x... etc... (helps to visualize the quality of your data I guess?) 2) coverage of gene / desired location?
If 2) is possible, then I would also like to know how I can get coverage for a desired location.
And how different is this from mpileup?
Comment
-
Yes, GATK is pretty Finicky about the RG tag and karyotypic order of chromosomes (GATK doesn't like chr1, chr10, chr11, it likes chr1, chr2, chr3, etc).
You could use Picards AddOrReplaceReadGroup to add your missing RG tag.
I run DepthOfCoverage twice:
Once with CCDS to check exon coverage
A second time for the genome coverage
For Exomes, I just do it the one time.
Commands are like this
Code:#Exome java -Xmx4G -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R human_hg19.fasta -I my.sorted.dup.bam --omitDepthOutputAtEachBase --logging_level ERROR -geneList refGene.sorted.txt --summaryCoverageThreshold 10 --summaryCoverageThreshold 20 --summaryCoverageThreshold 30 --summaryCoverageThreshold 40 --summaryCoverageThreshold 50 --summaryCoverageThreshold 80 --summaryCoverageThreshold 90 --summaryCoverageThreshold 100 --summaryCoverageThreshold 150 --minBaseQuality 20 --minMappingQuality 30 --start 1 --stop 500 --nBins 499 -dt NONE -L CCDS.ccdsGenes.bed -o my.sorted.dup.CCDS.coverage #Genome java -Xmx4G -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R human_hg19.fasta -I my.sorted.dup.bam --omitDepthOutputAtEachBase --logging_level ERROR --summaryCoverageThreshold 10 --summaryCoverageThreshold 20 --summaryCoverageThreshold 30 --summaryCoverageThreshold 40 --summaryCoverageThreshold 50 --summaryCoverageThreshold 80 --summaryCoverageThreshold 90 --summaryCoverageThreshold 100 --summaryCoverageThreshold 150 --minBaseQuality 20 --minMappingQuality 30 --start 1 --stop 1000 --nBins 999 -dt NONE -o my.sorted.dup.coverage
Last edited by lletourn; 10-05-2011, 06:25 PM.
Comment
-
Hi lletourn,
I ran for genome coverage. Looking at the output I realized it doesn't make much sense because I am working on transcriptome data from RNA-Seq. I should just look for exons with the -L option and exon coordinates as .bed file, I suppose. If I am doing something wrong, please let me know. I'll get back to you once I get some results with that. I am also crosschecking with samtools mpileup.
Thanks again.
Comment
Latest Articles
Collapse
-
by seqadmin
Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...-
Channel: Articles
12-16-2024, 07:57 AM -
-
by seqadmin
Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.
Long-Read Sequencing
Long-read sequencing has seen remarkable advancements,...-
Channel: Articles
12-02-2024, 01:49 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 12-17-2024, 10:28 AM
|
0 responses
33 views
0 likes
|
Last Post
by seqadmin
12-17-2024, 10:28 AM
|
||
Started by seqadmin, 12-13-2024, 08:24 AM
|
0 responses
49 views
0 likes
|
Last Post
by seqadmin
12-13-2024, 08:24 AM
|
||
Started by seqadmin, 12-12-2024, 07:41 AM
|
0 responses
34 views
0 likes
|
Last Post
by seqadmin
12-12-2024, 07:41 AM
|
||
Started by seqadmin, 12-11-2024, 07:45 AM
|
0 responses
46 views
0 likes
|
Last Post
by seqadmin
12-11-2024, 07:45 AM
|
Comment