  • Newbie Question: Calculating physical coverage from genome coverage

    Hey guys,

    I could not find precise information on how to calculate the physical coverage from the genome coverage. Assume that I have a paired-end library where both reads have an average length of 70bps and the insert size is around 200bp.

    Does the definition of physical coverage include both reads?

    I was guessing that it does and calculated it the following way:

    Physical_Coverage=Genome_Coverage*(2*Read_Length+Insert_Size)/(2*Read_Length)

    As an example:
    If I know that my genome coverage is 4,
    then the physical coverage is 4*(2*70+200)/(2*70)≈9.7
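As a quick check of the arithmetic, the formula above can be sketched in Python. This assumes, as the formula itself does, that "insert size" means the unsequenced gap between the two reads, so the whole fragment spans 2 * read length + insert size. The function name `physical_coverage` is only illustrative.

```python
def physical_coverage(genome_coverage, read_length, insert_size):
    """Scale sequence coverage up to fragment (physical) coverage.

    Assumption: insert_size is the unsequenced gap between the two reads,
    so the whole fragment spans 2 * read_length + insert_size bases.
    """
    sequenced = 2 * read_length          # bases read per pair
    fragment = sequenced + insert_size   # bases spanned per pair
    return genome_coverage * fragment / sequenced


# Values from the post: coverage 4, 70 bp reads, 200 bp gap.
print(round(physical_coverage(4, 70, 200), 1))  # 9.7
```

If the 200 bp were instead the full fragment length (reads included), the numerator would simply be 200 rather than 340.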

    Is that correct?

    Thanks in advance!
    Tristan
    Last edited by tristanstoeber; 02-22-2012, 08:47 AM.

  • #2
    What's the difference between physical coverage and genome coverage? Is it coverage that includes the unsequenced insert bases, as opposed to coverage of the sequenced bases only?

    The coverage calculation I am used to is fairly simple:

    Code:
    total number of base pairs sequenced (S)
    ----------------------------------------
    total number of base pairs in genome (G)
    For a single end run of 100M reads, each with 100bp sequenced, that's 10Gbp sequenced bases, which would be a coverage of 5x for a 2Gbp genome.
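The S/G calculation can be written as a minimal Python sketch. The function name `sequence_coverage` is hypothetical; the numbers are the ones from the single-end example above.

```python
def sequence_coverage(num_reads, read_length, genome_size):
    """Coverage = total sequenced bases (S) / genome size (G)."""
    total_bases = num_reads * read_length  # S
    return total_bases / genome_size       # S / G


# 100M single-end reads of 100 bp on a 2 Gbp genome.
print(sequence_coverage(100_000_000, 100, 2_000_000_000))  # 5.0
```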

    Code:
    both reads have an average length of 70bps and the insert size is around 200bp.
    "Insert size" is a confusing term. I much prefer "fragment length", because size selection is most commonly done for fragments of a particular length.

    For paired-end runs, you could either consider S to include the non-sequenced bases present in the fragment (Sf = fragment length * number of pairs), or only include the sequenced bases (Ss = read length * number of reads, or Ss = 2 * read length * number of pairs).

    The ratio of these two S sizes is the ratio of total fragment length to sequenced bases, which will be the same as the ratio for any coverage value calculated based on these sizes:
    Code:
        Sf/Ss = (fragment length) / (read length * 2) 
    <=> Sf = (fragment length) * Ss / (read length * 2)
    <=> Ss = (read length * 2) * Sf / (fragment length)
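A small Python sketch of the Sf/Ss relationship, under stated assumptions: the pair count is made up, and the 340 bp fragment length assumes the original poster's 200 bp is the unsequenced gap between two 70 bp reads. The function names are illustrative only.

```python
import math


def sequenced_bases(num_pairs, read_length):
    """Ss: only the bases actually read (two reads per pair)."""
    return 2 * read_length * num_pairs


def fragment_bases(num_pairs, fragment_length):
    """Sf: every base spanned by the fragments, sequenced or not."""
    return fragment_length * num_pairs


num_pairs = 1_000_000    # hypothetical number of read pairs
read_length = 70
fragment_length = 340    # assumption: 2 * 70 sequenced + 200 unsequenced

ss = sequenced_bases(num_pairs, read_length)
sf = fragment_bases(num_pairs, fragment_length)

# The ratio does not depend on the number of pairs.
ratio = sf / ss
assert math.isclose(ratio, fragment_length / (2 * read_length))

# Converting in either direction recovers the other quantity.
assert math.isclose(sf, fragment_length * ss / (2 * read_length))
assert math.isclose(ss, (2 * read_length) * sf / fragment_length)
```

Because the genome size G divides both Sf and Ss, the same ratio converts a sequence coverage into a fragment coverage.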
    Last edited by gringer; 02-22-2012, 12:30 PM. Reason: correction -- reads / pairs, relationship equation

    • #3
      Can anyone please explain to me the difference between physical and genome coverage?

      Thank you

      Kind Regards

      Parinita

      • #4
        For paired end reads, a typical Illumina library would consist of two 100-bp reads from a 500 bp genomic fragment. So if you get 10X coverage of the genome with the sequenced reads, you will have a higher coverage of the genomic fragments used to generate those reads. See here:

        Code:
                          ssssss---------------ssssss
        ssssss-------------sssssss        ssssss---------ssssss
        GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
        This example has less than 1X sequence coverage "s" of the genome "G". But there is >1X coverage by the genomic fragments in the library. If you are mapping inversions, for example, the physical coverage of the fragments is more important than the sequencing coverage. If you are using mate pairs (two 100 bp reads from 5-10kb fragments), the physical coverage will be much higher than the sequencing coverage.
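The difference can be made concrete with a toy per-base count in Python. This is only a sketch: the genome length, fragment coordinates and 6-base read length are hypothetical values loosely modelled on the diagram, and `mean_coverages` is an illustrative name.

```python
def mean_coverages(genome_length, fragments, read_length):
    """Return (sequence coverage, physical coverage) averaged over the genome.

    fragments: list of (start, end) half-open intervals on the genome.
    Assumption: each fragment is sequenced read_length bases inward
    from both of its ends (one read per end).
    """
    seq_depth = [0] * genome_length
    phys_depth = [0] * genome_length
    for start, end in fragments:
        for pos in range(start, end):
            # Every base spanned by the fragment counts as physical coverage.
            phys_depth[pos] += 1
            # Only bases inside one of the two end reads are sequenced.
            if pos < start + read_length or pos >= end - read_length:
                seq_depth[pos] += 1
    return sum(seq_depth) / genome_length, sum(phys_depth) / genome_length


# Hypothetical layout: three fragments on a 55-base genome, 6-base reads.
fragments = [(0, 26), (18, 45), (34, 55)]
seq, phys = mean_coverages(55, fragments, read_length=6)
print(f"sequence coverage: {seq:.2f}X")   # 0.65X
print(f"physical coverage: {phys:.2f}X")  # 1.35X
```

With longer fragments and the same read length (as with mate pairs), the physical number grows while the sequence number stays the same.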
        Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

        • #5
          Thank you. Now I understand the difference.

          Kind Regards

          Parinita.
