SEQanswers

Old 11-02-2014, 07:07 PM   #1
pradeepbe
Junior Member
 
Location: Prasanthi Nilayam, India

Join Date: Nov 2014
Posts: 2
Calculation of genome sequencing coverage

Dear all,
I am new to NGS and need some help analyzing genome sequencing data from an Ion PGM 318 chip.

I performed whole-genome shotgun sequencing on an Ion PGM with a 318 chip. We generated 331,012,462 bases in 1,307,232 reads, with a mean read length of about 253 bp. The genome size is 4,019,665 bp.

I need help calculating the genome coverage (how many X?).

Thanks a lot in advance.

Dr.Pradeep
Old 11-02-2014, 07:51 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Your X coverage is (331,012,462 bases)/(4,019,665 bp), or 82X average.
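
As a quick check of that arithmetic, here is a minimal Python sketch. The function name `mean_coverage` is just an illustrative choice, and the numbers are the ones given in the first post. It also cross-checks the total against reads times mean read length, which should land close to the same value.

```python
def mean_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth: sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Values reported in the first post of this thread.
total_bases = 331_012_462
genome_size = 4_019_665
reads = 1_307_232
mean_read_length = 253  # approximate, as stated in the post

coverage = mean_coverage(total_bases, genome_size)
# Cross-check: reads x mean read length approximates the total base count.
approx = mean_coverage(reads * mean_read_length, genome_size)

print(f"{coverage:.1f}X from total bases")
print(f"{approx:.1f}X from reads x mean read length")
```

Both estimates agree to within a fraction of 1X, so the reported base count and read statistics are consistent with each other.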
Old 11-02-2014, 11:53 PM   #3
pradeepbe
Junior Member
 
Location: Prasanthi Nilayam, India

Join Date: Nov 2014
Posts: 2

Dear Brian, thanks a lot. That has cleared up a lot of doubts.
Old 11-03-2014, 01:00 AM   #4
bioman1
Member
 
Location: US

Join Date: May 2012
Posts: 80

We have done WGS using Illumina paired-end sequencing (HiSeq). After pre-processing, we have 42300018262 bases with a mean read length of 101 bp, and the genome size is 2032558240 bp. That gives a coverage of 42300018262/2032558240 = 20.8X (average).

But when I map my Illumina paired-end reads to the de novo assembled genome and assess the alignment with Qualimap, I get only 5X coverage (mean).

Why this difference? I think the coverage we get before genome assembly is the physical coverage (20.8X), and the coverage we get after de novo assembly and read alignment is the actual coverage (5X). Is my assumption right?
Old 11-03-2014, 08:28 AM   #5
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Well... there are different ways to calculate coverage. For one thing, what is the ploidy of your genome? Also, what fraction of the reads mapped? And what was the insert size distribution? If your reads were mostly overlapping, then the coverage by unique molecules would be reduced... and if they were mostly adapter sequence, it would be reduced even more... so the meaning of "5X coverage by mapping" depends on precisely how the number was calculated.
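
To illustrate the point about overlapping pairs and adapter read-through, here is a rough sketch under simplifying assumptions: both reads of a pair have the same length, the insert is the whole sequenced fragment, and any bases past the end of the insert are adapter. The function name and the insert sizes are hypothetical; only the 101 bp read length comes from this thread.

```python
def unique_bases_per_pair(read_length: int, insert_size: int) -> int:
    """Distinct genomic bases covered by one read pair.

    Simplified model: if the insert is at least 2 x read_length the reads
    do not overlap; otherwise the pair can cover at most the insert itself
    (the remainder is overlap or adapter sequence).
    """
    return min(2 * read_length, insert_size)


read_length = 101  # from the thread; insert sizes below are illustrative only
for insert_size in (300, 150, 60):
    unique = unique_bases_per_pair(read_length, insert_size)
    fraction = unique / (2 * read_length)
    print(f"insert {insert_size} bp: {unique} unique bases "
          f"({fraction:.0%} of the raw bases in the pair)")
```

Under this model, coverage by unique molecules is the raw coverage multiplied by that fraction, so short inserts can reduce the effective depth substantially.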
Old 11-03-2014, 12:06 PM   #6
bioman1
Member
 
Location: US

Join Date: May 2012
Posts: 80

Thanks Brian. Our plant has a tetraploid genome.

Mapping statistics from Qualimap (http://qualimap.bioinfo.cipf.es/):
Reference size (bp): 1491942955
No. of mapped reads (bp): 102,654,077 (99.73%)
Mean mapping quality: 37.18
Insert size: 241
No. of A's: 31.76%
No. of C's: 19.71%
No. of T's: 30.04%
No. of G's: 18.48%
Old 11-03-2014, 12:16 PM   #7
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Those numbers seem to contradict each other - in the first post you state the genome size is 2032558240, and in the second, 1491942955. Or are they different references? Also, 102,654,077 mapped reads at 101bp is only ~10Gbp, not 42300018262 as in your first post. So I'm a little confused.

That said - if the genome is ~2Gbp and tetraploid, then 42Gbp would give you ~5x coverage per ploidy...
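
The two back-of-the-envelope numbers above can be reproduced with a short sketch. It assumes, as the reply does, that the ~2 Gbp estimate is the size of one haploid copy and that every mapped read is a full 101 bp; the function name is an illustrative choice.

```python
def coverage_per_haplotype(total_bases: int, genome_size: int, ploidy: int) -> float:
    """Average depth per chromosome copy, assuming genome_size is one haploid copy."""
    return total_bases / genome_size / ploidy


# Values quoted in the thread.
total_bases = 42_300_018_262
estimated_genome_size = 2_032_558_240
ploidy = 4  # tetraploid
mapped_reads = 102_654_077
read_length = 101

overall = total_bases / estimated_genome_size
per_copy = coverage_per_haplotype(total_bases, estimated_genome_size, ploidy)
# Assumes every mapped read contributes its full length (no trimming or clipping).
mapped_bases = mapped_reads * read_length

print(f"overall: {overall:.1f}X, per haplotype: {per_copy:.1f}X")
print(f"mapped bases if the figure is a read count: {mapped_bases / 1e9:.1f} Gbp")
```

So 20.8X overall corresponds to about 5.2X per copy for a tetraploid, and 102,654,077 reads of 101 bp account for only about 10.4 Gbp of the 42.3 Gbp.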
Old 11-03-2014, 01:44 PM   #8
bioman1
Member
 
Location: US

Join Date: May 2012
Posts: 80

Sorry for the confusion.

In my first post, the genome size was estimated with a k-mer method (KmerGenie), giving an estimated genome size of about 2032558240 bp. The filtered reads, as assessed by FastQC, total 42300018262 bases. By this method, the coverage is 42300018262/2032558240 = 20.8X (average).

In my second post, the reference genome is the de novo assembly, which is smaller than the estimated genome size: about 1491942955 bp. I used this as the reference and mapped the filtered reads (42300018262 bases) to it. But somehow BWA aligned only 102,654,077 bp of the filtered reads (instead of 42300018262 bp) to the de novo assembled reference. By the mapping method, the coverage comes to about 5X (mean). Why this difference?
Old 11-03-2014, 02:21 PM   #9
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

The numbers are still strange - 102,654,077 bp of mapped reads on a 1491942955 bp assembly is only 0.069X.
Old 11-04-2014, 01:48 AM   #10
bioman1
Member
 
Location: US

Join Date: May 2012
Posts: 80

I should have given the number of mapped bases instead of reads. My mapped bases total 9,055,285,852 bp, which on a 1491942955 bp assembly gives ~6X. But my question is: before assembly I get a coverage of around 20X, and after assembly I get 6X. Does this mean we sequenced at 20X but the actual assembly coverage is 6X? What do physical coverage and actual coverage mean?
Old 11-05-2014, 03:58 PM   #11
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

It sounds like something is wrong with the mapping, or you posted an incorrect number somewhere. If 99%+ of the reads mapped, then you should be getting roughly 42,185,808,000 bp of mapped bases (99.73% of 42300018262), not 9055285852. So either the mapping rate is incorrect or the mapping program was not fed all of the initial reads.
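
A small sketch of that sanity check, using the figures posted in this thread. It assumes the 99.73% mapping rate applies to bases as well as reads (that is, no heavy clipping), which is an approximation.

```python
# Values quoted in the thread.
filtered_bases = 42_300_018_262
mapping_rate = 0.9973  # reported as the fraction of mapped reads
assembly_size = 1_491_942_955
reported_mapped_bases = 9_055_285_852

# Assumption: mapped bases scale with mapped reads.
expected_mapped_bases = filtered_bases * mapping_rate
expected_coverage = expected_mapped_bases / assembly_size
reported_coverage = reported_mapped_bases / assembly_size
fraction_accounted = reported_mapped_bases / filtered_bases

print(f"expected mapped bases: {expected_mapped_bases / 1e9:.1f} Gbp "
      f"({expected_coverage:.1f}X on the assembly)")
print(f"reported mapped bases: {reported_mapped_bases / 1e9:.1f} Gbp "
      f"({reported_coverage:.1f}X on the assembly)")
print(f"reported / filtered: {fraction_accounted:.0%}")
```

The reported mapped bases account for only about a fifth of the filtered input, which is the discrepancy being pointed out: either the mapping rate or the input to the mapper does not match the other numbers.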
All times are GMT -8.