For our paired-end exome sequence data, I have aligned the reads to HG18 reference genome with BWA, and called variants with SAMtools mpileup command. Could anyone teach me how to obtain the mapping coverage %, error rate, etc.? Thanks a lot.
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
As with Heisman I prefer DepthOfCoverage especially for exomes because of the percent reads at X depth feature.
For you interval list it can tell you which percentage of your bases are higher than 5x, 10x, 30x,50x, etc (configurable). This helps greatly to see if you have sufficient overall base coverage >20x to make confident calls.
Comment
-
-
Originally posted by cedance View PostIsn't depth of coverage just the ratio of total number of reads sequenced to the number of bases in your genome?
Cov = Total reads sequenced/ no: of bases in genome (in fasta)?
When a tool compute average or median of an interval_list it gives you the "true", usable coverage.
Comment
-
When the alignment is written in Goby format (which you can produce with this version of BWA, or with GobyWeb), you can estimate coverage statistics for exome as described in this tutorial:
estimating-coverage-statistics-for-targeted-resequencing-experiments
Comment
-
Originally posted by lletourn View PostYes but it's a gross approximation. You lose the reads that are trimmed/clipped. the ones that don't align, the ones not on target the bases with too low quality, the reads that have a too-low mapping quality...you get the picture.
When a tool compute average or median of an interval_list it gives you the "true", usable coverage.
I tried DepthOfCoverage on my data but I got an error related to "RG" (read group). I work on alternative splicing and I merged the data set (by modifying the header so that I can get back the files). I guess I should just add a tag with RG as optional field. Do you know about this?
Also, could you explain a bit more detailed as to what sort of coverage can we get and the command you normally use (on 1 or multiple bam files)? I would assume: 1) % of bases covered with 10x, 20x... etc... (helps to visualize the quality of your data I guess?) 2) coverage of gene / desired location?
If 2) is possible, then I would also like to know how I can get coverage for a desired location.
And how different is this from mpileup?
Comment
-
Yes, GATK is pretty Finicky about the RG tag and karyotypic order of chromosomes (GATK doesn't like chr1, chr10, chr11, it likes chr1, chr2, chr3, etc).
You could use Picards AddOrReplaceReadGroup to add your missing RG tag.
I run DepthOfCoverage twice:
Once with CCDS to check exon coverage
A second time for the genome coverage
For Exomes, I just do it the one time.
Commands are like this
Code:#Exome java -Xmx4G -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R human_hg19.fasta -I my.sorted.dup.bam --omitDepthOutputAtEachBase --logging_level ERROR -geneList refGene.sorted.txt --summaryCoverageThreshold 10 --summaryCoverageThreshold 20 --summaryCoverageThreshold 30 --summaryCoverageThreshold 40 --summaryCoverageThreshold 50 --summaryCoverageThreshold 80 --summaryCoverageThreshold 90 --summaryCoverageThreshold 100 --summaryCoverageThreshold 150 --minBaseQuality 20 --minMappingQuality 30 --start 1 --stop 500 --nBins 499 -dt NONE -L CCDS.ccdsGenes.bed -o my.sorted.dup.CCDS.coverage #Genome java -Xmx4G -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R human_hg19.fasta -I my.sorted.dup.bam --omitDepthOutputAtEachBase --logging_level ERROR --summaryCoverageThreshold 10 --summaryCoverageThreshold 20 --summaryCoverageThreshold 30 --summaryCoverageThreshold 40 --summaryCoverageThreshold 50 --summaryCoverageThreshold 80 --summaryCoverageThreshold 90 --summaryCoverageThreshold 100 --summaryCoverageThreshold 150 --minBaseQuality 20 --minMappingQuality 30 --start 1 --stop 1000 --nBins 999 -dt NONE -o my.sorted.dup.coverage
Last edited by lletourn; 10-05-2011, 06:25 PM.
Comment
-
Hi lletourn,
I ran for genome coverage. Looking at the output I realized it doesn't make much sense because I am working on transcriptome data from RNA-Seq. I should just look for exons with the -L option and exon coordinates as .bed file, I suppose. If I am doing something wrong, please let me know. I'll get back to you once I get some results with that. I am also crosschecking with samtools mpileup.
Thanks again.
Comment
Latest Articles
Collapse
-
by seqadmin
The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...-
Channel: Articles
11-06-2024, 07:24 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Yesterday, 07:36 AM
|
0 responses
13 views
0 likes
|
Last Post
by seqadmin
Yesterday, 07:36 AM
|
||
Started by seqadmin, Yesterday, 07:04 AM
|
0 responses
11 views
0 likes
|
Last Post
by seqadmin
Yesterday, 07:04 AM
|
||
Started by seqadmin, 11-21-2024, 09:19 AM
|
0 responses
19 views
0 likes
|
Last Post
by seqadmin
11-21-2024, 09:19 AM
|
||
Started by seqadmin, 11-08-2024, 11:09 AM
|
0 responses
292 views
0 likes
|
Last Post
by seqadmin
11-08-2024, 11:09 AM
|
Comment