 11-02-2014, 07:07 PM #1 pradeepbe Junior Member   Location: Prasanthi Nilayam, India Join Date: Nov 2014 Posts: 2 Calculation of genome sequencing coverage Dear all, I am new to NGS and need some help analyzing genome sequencing data from an Ion PGM 318 chip. I performed whole-genome shotgun sequencing on the Ion PGM with a 318 chip. We generated 331,012,462 bases in 1,307,232 reads, with a mean read length of about 253 bp. The genome size is 4,019,665 bp. How do I calculate the genome coverage (how many X)? Thanks a lot in advance. Dr. Pradeep
 11-02-2014, 07:51 PM #2 Brian Bushnell Super Moderator   Location: Walnut Creek, CA Join Date: Jan 2014 Posts: 2,707 Your X coverage is (331,012,462 bases)/(4,019,665 bp), or 82X average.
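As a minimal sketch, the calculation above can be reproduced in Python. The function name `average_coverage` is made up for this illustration; the numbers are the ones reported in post #1.

```python
def average_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth of coverage: sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Numbers reported in post #1 (Ion PGM, 318 chip).
total_bases = 331_012_462
genome_size = 4_019_665

print(f"{average_coverage(total_bases, genome_size):.1f}X")  # about 82X

# Cross-check: reads x mean read length should land near the base total.
approx_bases = 1_307_232 * 253
print(f"{average_coverage(approx_bases, genome_size):.1f}X")
```

The second figure uses the rounded mean read length, so it only approximates the first.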
 11-02-2014, 11:53 PM #3 pradeepbe Junior Member   Location: Prasanthi Nilayam, India Join Date: Nov 2014 Posts: 2 Dear Brian, Thanks a lot. That has cleared up a lot of doubts.
 11-03-2014, 01:00 AM #4 bioman1 Member   Location: US Join Date: May 2012 Posts: 80 We did WGS using Illumina paired-end sequencing (HiSeq). After pre-processing, we have 42300018262 bases, a mean read length of 101 bp, and a genome size of 2032558240 bp. That gives a coverage of 42300018262/2032558240 = 20.8X (average). But when I map the Illumina paired-end reads to the de novo assembled genome and assess the alignment with Qualimap, I get only 5X mean coverage. Why the difference? My guess is that the coverage computed before assembly is the physical coverage (20.8X), while the coverage obtained by aligning reads to the de novo assembly is the actual coverage (5X). Is my assumption right?
 11-03-2014, 08:28 AM #5 Brian Bushnell Super Moderator   Location: Walnut Creek, CA Join Date: Jan 2014 Posts: 2,707 Well... there are different ways to calculate coverage. For one thing, what is the ploidy of your genome? Also, what fraction of the reads mapped? And what was the insert size distribution? If your reads were mostly overlapping, then the coverage by unique molecules would be reduced... and if they were mostly adapter sequence, it would be reduced even more... so the meaning of "5X coverage by mapping" depends on precisely how the number was calculated.
 11-03-2014, 12:06 PM #6 bioman1 Member   Location: US Join Date: May 2012 Posts: 80 Thanks Brian. Our plant has a tetraploid genome. Mapping statistics from Qualimap (http://qualimap.bioinfo.cipf.es/): Reference size (bp): 1491942955; No. of mapped reads (bp): 102,654,077 (99.73%); Mean mapping quality: 37.18; Insert size: 241; No. of A's: 31.76%; No. of C's: 19.71%; No. of T's: 30.04%; No. of G's: 18.48%
 11-03-2014, 12:16 PM #7 Brian Bushnell Super Moderator   Location: Walnut Creek, CA Join Date: Jan 2014 Posts: 2,707 Those numbers seem to contradict each other - in the first post you state the genome size is 2032558240, and in the second, 1491942955. Or are they different references? Also, 102,654,077 mapped reads at 101bp is only ~10Gbp, not 42300018262 as in your first post. So I'm a little confused. That said - if the genome is ~2Gbp and tetraploid, then 42Gbp would give you ~5x coverage per ploidy...
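The two back-of-the-envelope checks in this post can be written out as a short Python sketch. The helper names are hypothetical, and dividing by ploidy assumes, as the post does, that coverage is spread evenly over four copies of a ~2 Gbp genome.

```python
def bases_from_reads(n_reads: int, read_length: int) -> int:
    """Total bases if every read has the same length."""
    return n_reads * read_length


def coverage_per_copy(total_bases: int, genome_size: int, ploidy: int) -> float:
    """Average coverage divided evenly across the chromosome copies."""
    return total_bases / genome_size / ploidy


# 102,654,077 mapped reads at 101 bp each (numbers from posts #4 and #6).
mapped_bases = bases_from_reads(102_654_077, 101)
print(f"{mapped_bases / 1e9:.1f} Gbp")  # roughly 10 Gbp, not 42 Gbp

# ~42 Gbp of sequence on a ~2 Gbp tetraploid genome.
per_copy = coverage_per_copy(42_300_018_262, 2_032_558_240, ploidy=4)
print(f"{per_copy:.1f}X per copy")  # roughly 5X
```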
 11-03-2014, 01:44 PM #8 bioman1 Member   Location: US Join Date: May 2012 Posts: 80 Sorry for the confusion. In my first post, the genome size was estimated by a k-mer method (kmergenie), which gives about 2032558240 bp. The filtered reads, as assessed by FastQC, total 42300018262 bases. That gives 42300018262/2032558240 = 20.8X (average). In my second post, the reference is the de novo assembly, which is smaller than the estimated genome size: about 1491942955 bp. I used it as the reference genome and mapped the filtered reads (42300018262 bases) to it. But somehow BWA aligned only 102,654,077 bp of the filtered reads (instead of 42300018262 bp) to the de novo assembled reference. By the mapping method, the coverage comes to about 5X (mean). Why the difference?
 11-03-2014, 02:21 PM #9 Brian Bushnell Super Moderator   Location: Walnut Creek, CA Join Date: Jan 2014 Posts: 2,707 The numbers are still strange - 102,654,077 bp of mapped reads on a 1491942955 bp assembly is only 0.069X.
 11-04-2014, 01:48 AM #10 bioman1 Member   Location: US Join Date: May 2012 Posts: 80 I should have given the number of mapped bases instead of reads. My mapped bases total 9,055,285,852 bp, which on a 1491942955 bp assembly gives ~6X. But my question stands: before assembly I get around 20X coverage, and after assembly I get 6X. Does this mean we sequenced at 20X and the actual assembly coverage is 6X? What do physical coverage and actual coverage mean?
 11-05-2014, 03:58 PM #11 Brian Bushnell Super Moderator   Location: Walnut Creek, CA Join Date: Jan 2014 Posts: 2,707 It sounds like something is wrong with the mapping, or you posted an incorrect number somewhere. If 99%+ of the reads mapped, then you should be getting about 42,185,808,213 bp of mapped bases (99.73% of 42300018262), not 9055285852. So either the mapping rate is incorrect or the mapping program was not fed all of the initial reads.
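A rough Python sketch of this sanity check, using the numbers posted in the thread. The function name is invented for illustration, and the sketch assumes the 99.73% figure is the fraction of all filtered bases that mapped.

```python
def expected_mapped_bases(total_bases: int, mapped_fraction: float) -> float:
    """Bases that should be aligned if mapped_fraction of the input mapped."""
    return total_bases * mapped_fraction


total_bases = 42_300_018_262     # filtered bases (post #8)
mapped_fraction = 0.9973         # mapping rate reported by Qualimap (post #6)
observed_mapped = 9_055_285_852  # mapped bases (post #10)
assembly_size = 1_491_942_955    # de novo assembly length (post #8)

expected = expected_mapped_bases(total_bases, mapped_fraction)
print(f"expected mapped bases: {expected:,.0f}")
print(f"expected coverage:     {expected / assembly_size:.1f}X")
print(f"observed coverage:     {observed_mapped / assembly_size:.1f}X")
print(f"observed / expected:   {observed_mapped / expected:.2f}")
```

Under these assumptions the observed mapped bases are only about a fifth of what the reported mapping rate implies, which fits the suggestion that the mapper did not see all of the input reads or that one of the posted numbers is off.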