**swbarnes2** · 09-22-2011, 08:21 AM

I'd use BEDTools to get coverage stats. The normal Illumina pipeline will work out error stats, so if you have Illumina data, ask about that summary.

**liu_xt005** · 09-22-2011, 08:34 AM

Dear swbarnes2, thank you very much. I will try BEDTools.

**Heisman** · 09-22-2011, 09:06 AM

You should also look into using GATK DepthOfCoverage.

**lletourn** · 09-27-2011, 07:34 AM

Like Heisman, I prefer DepthOfCoverage, especially for exomes, because of its "percent of bases at X depth" feature.

For your interval list, it can tell you what percentage of your bases are covered at more than 5x, 10x, 30x, 50x, etc. (configurable). This helps greatly to see whether enough of your bases are covered at >20x to make confident calls.
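
To make the "percent of bases at X depth" idea concrete, here is a minimal Python sketch. It assumes you already have one depth value per base of your intervals (for example, parsed from a per-base depth table). The function name and the sample depths are made up for illustration, and whether a threshold is inclusive is also an assumption here; this is not a description of how GATK computes its summary.

```python
def percent_at_or_above(depths, thresholds):
    """Return {threshold: percent of bases with depth >= threshold}."""
    total = len(depths)
    if total == 0:
        raise ValueError("no bases given")
    return {
        t: 100.0 * sum(1 for d in depths if d >= t) / total
        for t in thresholds
    }

# Hypothetical per-base depths for a tiny 10-base interval.
depths = [0, 3, 8, 12, 15, 22, 25, 31, 40, 55]
summary = percent_at_or_above(depths, [5, 10, 20, 30])
for threshold, pct in summary.items():
    print(f"{pct:.0f}% of bases at >= {threshold}x")
```

A table like this is what tells you at a glance whether most of the target clears the depth you need for calling.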

**liu_xt005** · 10-04-2011, 12:48 PM

Thanks to Heisman and lletourn, DepthOfCoverage is great.

**cedance** · 10-04-2011, 12:51 PM

Isn't depth of coverage just the ratio of total number of reads sequenced to the number of bases in your genome?

Cov = Total reads sequenced/ no: of bases in genome (in fasta)?

Or is that X fold coverage? I'm confused now!

**lletourn** · 10-04-2011, 03:03 PM

Originally posted by cedance:

Isn't depth of coverage just the ratio of total number of reads sequenced to the number of bases in your genome?

Cov = Total reads sequenced/ no: of bases in genome (in fasta)?

Yes, but it's a gross approximation. You lose the reads that are trimmed/clipped, the ones that don't align, the ones not on target, the bases with too low a quality, the reads with too low a mapping quality... you get the picture.

When a tool computes the average or median over an interval_list, it gives you the "true", usable coverage.
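
The gap between the back-of-the-envelope estimate and the usable coverage can be sketched in Python. Note that the estimate is normally computed from total bases sequenced (number of reads times read length), not from the raw read count alone. All numbers and function names below are invented for illustration.

```python
from statistics import mean, median

def naive_coverage(n_reads, read_length, genome_size):
    """Rough expected depth: total sequenced bases / genome (or target) size."""
    return n_reads * read_length / genome_size

def usable_coverage(depths):
    """Mean and median of per-base depths that survived filtering."""
    return mean(depths), median(depths)

# Invented example: 1,000 reads of 100 bp over a 10 kb target.
estimate = naive_coverage(1_000, 100, 10_000)

# Invented per-base depths after dropping unaligned, off-target,
# low-base-quality and low-mapping-quality data.
filtered_depths = [4, 6, 7, 7, 8, 9, 9, 10]
avg, med = usable_coverage(filtered_depths)
print(f"naive: {estimate:.1f}x, mean: {avg:.1f}x, median: {med:.1f}x")
```

In this toy case the naive figure is 10x while the filtered mean and median are 7.5x, which is the kind of shortfall the filters described above produce.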

**Fabien Campagne** · 10-05-2011, 05:25 AM

When the alignment is written in Goby format (which you can produce with this version of BWA, or with GobyWeb), you can estimate coverage statistics for exomes as described in this tutorial:
estimating-coverage-statistics-for-targeted-resequencing-experiments

**cedance** · 10-05-2011, 03:04 PM

Originally posted by lletourn:

Yes, but it's a gross approximation. You lose the reads that are trimmed/clipped, the ones that don't align, the ones not on target, the bases with too low a quality, the reads with too low a mapping quality... you get the picture.

When a tool computes the average or median over an interval_list, it gives you the "true", usable coverage.

Hi lletourn, thanks for your reply. Got the picture. (Sorry for so many questions; I have only just thought about looking at the quality of my "mapped" data. I had overlooked it until now.)

I tried DepthOfCoverage on my data but got an error related to "RG" (read group). I work on alternative splicing and I merged the data sets (modifying the header so that I can get the individual files back). I guess I should just add an RG tag as an optional field. Do you know about this?

Also, could you explain in a bit more detail what sorts of coverage we can get, and the command you normally use (on one or multiple BAM files)? I would assume: 1) % of bases covered at 10x, 20x, etc. (helps to visualize the quality of your data, I guess?) 2) coverage of a gene / desired location?
If 2) is possible, I would also like to know how to get coverage for a desired location.

And how different is this from mpileup?

**lletourn** · 10-05-2011, 06:20 PM

Yes, GATK is pretty finicky about the RG tag and about the karyotypic order of chromosomes (GATK doesn't like chr1, chr10, chr11; it likes chr1, chr2, chr3, etc.).
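
The difference between the two orderings can be shown with a small Python sketch. The sort key below is hypothetical, and where X, Y and M are placed is an assumption; in practice the order to match is whatever your reference uses.

```python
import re

def karyotypic_key(name):
    """Sort key: numbered chromosomes in numeric order, then X, Y, M."""
    label = re.sub(r"^chr", "", name)
    if label.isdigit():
        return (0, int(label), "")
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}  # assumed placement
    return (1, special.get(label, 4), label)

names = ["chr1", "chr10", "chr11", "chr2", "chr3", "chrX", "chrM", "chrY"]
lexicographic = sorted(names)                    # chr1, chr10, chr11, chr2, ...
karyotypic = sorted(names, key=karyotypic_key)   # chr1, chr2, chr3, chr10, ...
print(lexicographic)
print(karyotypic)
```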

You could use Picard's AddOrReplaceReadGroups to add your missing RG tag.

http://picard.sourceforge.net/command-line-overview.shtml#AddOrReplaceReadGroups

I run DepthOfCoverage twice:
Once with CCDS to check exon coverage
A second time for the genome coverage

For Exomes, I just do it the one time.

Commands are like this

Code:

#Exome
java -Xmx4G -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R human_hg19.fasta -I my.sorted.dup.bam \
  --omitDepthOutputAtEachBase --logging_level ERROR -geneList refGene.sorted.txt \
  --summaryCoverageThreshold 10 --summaryCoverageThreshold 20 --summaryCoverageThreshold 30 --summaryCoverageThreshold 40 --summaryCoverageThreshold 50 \
  --summaryCoverageThreshold 80 --summaryCoverageThreshold 90 --summaryCoverageThreshold 100 --summaryCoverageThreshold 150 \
  --minBaseQuality 20 --minMappingQuality 30 --start 1 --stop 500 --nBins 499 -dt NONE \
  -L CCDS.ccdsGenes.bed -o my.sorted.dup.CCDS.coverage

#Genome
java -Xmx4G -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R human_hg19.fasta -I my.sorted.dup.bam \
  --omitDepthOutputAtEachBase --logging_level ERROR \
  --summaryCoverageThreshold 10 --summaryCoverageThreshold 20 --summaryCoverageThreshold 30 --summaryCoverageThreshold 40 --summaryCoverageThreshold 50 \
  --summaryCoverageThreshold 80 --summaryCoverageThreshold 90 --summaryCoverageThreshold 100 --summaryCoverageThreshold 150 \
  --minBaseQuality 20 --minMappingQuality 30 --start 1 --stop 1000 --nBins 999 -dt NONE \
  -o my.sorted.dup.coverage

The --start, --stop and --nBins values are just there as examples for when your coverage is > 500x.
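
Since the commands above repeat --summaryCoverageThreshold once per value, a small helper can assemble the argument list from a list of thresholds. This is only a sketch: the function name and the placeholder file names in the example call are hypothetical, and it uses only flags that appear in the commands above.

```python
import shlex

def depth_of_coverage_args(bam, reference, out_prefix, thresholds, intervals=None):
    """Assemble a DepthOfCoverage argument list from the flags shown above."""
    args = [
        "java", "-Xmx4G", "-jar", "GenomeAnalysisTK.jar",
        "-T", "DepthOfCoverage",
        "-R", reference,
        "-I", bam,
        "--omitDepthOutputAtEachBase",
    ]
    # One flag per threshold, as in the original commands.
    for t in thresholds:
        args += ["--summaryCoverageThreshold", str(t)]
    args += ["--minBaseQuality", "20", "--minMappingQuality", "30"]
    if intervals is not None:
        args += ["-L", intervals]
    args += ["-o", out_prefix]
    return args

# Placeholder file names, for illustration only.
args = depth_of_coverage_args(
    "my.sorted.dup.bam", "human_hg19.fasta", "my.sorted.dup.CCDS.coverage",
    thresholds=[10, 20, 30], intervals="CCDS.ccdsGenes.bed",
)
print(" ".join(shlex.quote(a) for a in args))
```

The resulting list can be passed to `subprocess.run(args, check=True)` if you prefer to launch the job from Python rather than pasting the printed command into a shell.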

**cedance** · 10-06-2011, 06:05 AM

Hi lletourn,
I ran it for genome coverage. Looking at the output, I realized it doesn't make much sense because I am working on transcriptome data from RNA-Seq. I should just look at exons, using the -L option with exon coordinates as a .bed file, I suppose. If I am doing something wrong, please let me know. I'll get back to you once I get some results with that. I am also cross-checking with samtools mpileup.

Thanks again.

**lletourn** · 10-06-2011, 08:14 AM

You're right, look only at the exons + UTRs.

Or give it any interval and also give it a refGene-formatted file (the geneList option only works if you also pass an interval; I think that's a bug).
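
For the "coverage of a desired location" question, the underlying arithmetic can be sketched in Python. This assumes a per-base depth table keyed by chromosome and 1-based position, and intervals in BED coordinates (0-based start, end exclusive); the function name and the data are invented for illustration and are not part of any tool mentioned in this thread.

```python
def mean_depth_per_interval(depth_by_pos, intervals):
    """Mean depth for each (chrom, start, end, name) BED-style interval.

    depth_by_pos maps (chrom, 1-based position) -> depth; positions
    missing from the map are counted as zero depth.
    """
    result = {}
    for chrom, start, end, name in intervals:
        length = end - start
        if length <= 0:
            raise ValueError(f"empty interval: {name}")
        # BED [start, end) corresponds to 1-based positions start+1 .. end.
        total = sum(
            depth_by_pos.get((chrom, pos), 0)
            for pos in range(start + 1, end + 1)
        )
        result[name] = total / length
    return result

# Invented depths and exon coordinates.
depth_by_pos = {
    ("chr1", 101): 10, ("chr1", 102): 20, ("chr1", 103): 30,
    ("chr1", 201): 8,
}
exons = [("chr1", 100, 104, "exonA"), ("chr1", 200, 202, "exonB")]
per_exon = mean_depth_per_interval(depth_by_pos, exons)
print(per_exon)
```

Counting uncovered positions as zero matters for RNA-Seq and exome data, where leaving them out would inflate the per-interval mean.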

How to obtain coverage % of the genome?