SEQanswers

Old 02-22-2012, 07:45 AM   #1
tristanstoeber
Junior Member
 
Location: Germany

Join Date: Feb 2012
Posts: 2
Newbie Question: Calculating physical coverage from genome coverage

Hey guys,

I could not find precise information on how to calculate the physical coverage from the genome coverage, assuming that I have a paired-end library where both reads have an average length of 70 bp and the insert size is around 200 bp.

Does the definition of physical coverage include both reads?

I was guessing that it does and calculated it the following way:

Physical_Coverage=Genome_Coverage*(2*Read_Length+Insert_Size)/(2*Read_Length)

As an example:
If I know that my genome coverage is 4,
then the physical coverage is 4*(2*70+200)/(2*70)=9.7

Is that correct?

Thanks in advance!
Tristan

Last edited by tristanstoeber; 02-22-2012 at 07:47 AM.
Old 02-22-2012, 11:16 AM   #2
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 836

What's the difference between physical coverage and genome coverage? Is that coverage including unsequenced insert bases vs coverage only for sequenced bases?

The coverage calculation I am used to is fairly simple:

Code:
total number of base pairs sequenced (S)
----------------------------------------
total number of base pairs in genome (G)
For a single-end run of 100M reads, each with 100 bp sequenced, that's 10 Gbp of sequenced bases, which would be a coverage of 5x for a 2 Gbp genome.
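As a rough illustration of the S/G calculation above, here is a minimal Python sketch. The function name `sequence_coverage` and its parameters are made up for this example and do not come from any particular tool.

```python
def sequence_coverage(num_reads, read_length, genome_size):
    """Coverage = total sequenced bases (S) / genome size (G)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    sequenced_bases = num_reads * read_length  # S
    return sequenced_bases / genome_size       # S / G


# Numbers from the post: 100M single-end reads of 100 bp, 2 Gbp genome.
example = sequence_coverage(100_000_000, 100, 2_000_000_000)
print(f"{example:.1f}x")
```

For a paired-end run, `num_reads` would be twice the number of pairs.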

Code:
both reads have an average length of 70bps and the insert size is around 200bp.
"Insert size" is a confusing term. I much prefer "fragment length", because the size selection is most commonly for fragments of a particular size.

For paired-end runs, you could either consider S to include the non-sequenced bases present in the fragment (Sf = fragment length * number of pairs), or only include the sequenced bases (Ss = read length * number of reads, or Ss = 2 * read length * number of pairs).

The ratio of these two S sizes is the ratio of total fragment length to sequenced bases, which will be the same as the ratio for any coverage value calculated based on these sizes:
Code:
    Sf/Ss = (fragment length) / (read length * 2) 
<=> Sf = (fragment length) * Ss / (read length * 2)
<=> Ss = (read length * 2) * Sf / (fragment length)

Last edited by gringer; 02-22-2012 at 11:30 AM. Reason: correction -- reads / pairs, relationship equation
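
The Sf/Ss relationship in the post above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `physical_from_sequence` is hypothetical, and the result depends on whether the "200 bp insert size" from the first post is read as the whole fragment length or as the unsequenced gap between the two reads. Both readings are shown.

```python
def physical_from_sequence(sequence_coverage, read_length, fragment_length):
    """Scale sequence coverage by Sf/Ss = fragment length / (2 * read length)."""
    if read_length <= 0 or fragment_length <= 0:
        raise ValueError("lengths must be positive")
    return sequence_coverage * fragment_length / (2 * read_length)


# Assumption A: 200 bp is the full fragment length (reads included).
as_fragment = physical_from_sequence(4, 70, 200)

# Assumption B: 200 bp is the gap between the reads, so the
# fragment is 2 * 70 + 200 = 340 bp (the first post's formula).
as_gap = physical_from_sequence(4, 70, 2 * 70 + 200)

print(round(as_fragment, 1), round(as_gap, 1))
```

Under assumption B the result matches the 9.7 from the first post; under assumption A it is lower, which is why the definition of "insert size" matters here.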
Old 06-24-2013, 09:43 AM   #3
pari_89
Member
 
Location: Newcastle upon Tyne

Join Date: Apr 2013
Posts: 55

Can anyone please explain to me the difference between physical and genome coverage?

Thank you

Kind Regards

Parinita
Old 06-24-2013, 10:43 AM   #4
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 501

For paired-end reads, a typical Illumina library would consist of two 100 bp reads from a 500 bp genomic fragment. So if you get 10X coverage of the genome with the sequenced reads, you will have a higher coverage of the genomic fragments used to generate those reads. See here:

Code:
          
                  ssssss---------------ssssss
ssssss-------------sssssss        ssssss---------ssssss
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
This example has less than 1X sequence coverage "s" of the genome "G". But there is >1X coverage by the genomic fragments in the library. If you are mapping inversions, for example, the physical coverage of the fragments is more important than the sequencing coverage. If you are using mate pairs (two 100 bp reads from 5-10kb fragments), the physical coverage will be much higher than the sequencing coverage.
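
To make the distinction concrete, here is a small Python sketch that computes both coverages for a toy layout loosely modelled on the diagram above. The coordinates (three fragments on a 55-base genome, 6-base reads) and the function name `mean_coverages` are assumptions chosen for illustration, not values taken from real data.

```python
def mean_coverages(fragments, read_length, genome_length):
    """Return (sequence_coverage, physical_coverage) averaged over the genome.

    fragments: list of (start, fragment_length) tuples with 0-based starts.
    """
    seq_depth = [0] * genome_length
    phys_depth = [0] * genome_length
    for start, frag_len in fragments:
        end = start + frag_len
        if start < 0 or end > genome_length:
            raise ValueError("fragment lies outside the genome")
        for pos in range(start, end):
            # Every base spanned by the fragment counts towards physical coverage.
            phys_depth[pos] += 1
            # Only bases within a read at either end of the fragment are sequenced.
            if pos < start + read_length or pos >= end - read_length:
                seq_depth[pos] += 1
    return sum(seq_depth) / genome_length, sum(phys_depth) / genome_length


# Hypothetical layout resembling the diagram: (start, fragment length).
toy_fragments = [(0, 26), (18, 27), (34, 21)]
seq_cov, phys_cov = mean_coverages(toy_fragments, read_length=6, genome_length=55)
print(f"sequence: {seq_cov:.2f}X, physical: {phys_cov:.2f}X")
```

With this layout the sequence coverage comes out below 1X and the physical coverage above 1X, as described in the post. Note that if the two reads of a pair overlap, this sketch counts each overlapping base once.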
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
Old 06-24-2013, 10:53 AM   #5
pari_89
Member
 
Location: Newcastle upon Tyne

Join Date: Apr 2013
Posts: 55

Thank you. Now I understand the difference.

Kind Regards

Parinita.

Tags
genome coverage, paired end reads, physical coverage
