  • Calculation of genome sequencing coverage

    Dear all,
    I am new to NGS and need some help analysing genome sequencing data from an Ion PGM 318 chip.

    I performed whole-genome shotgun sequencing on an Ion PGM with a 318 chip. We generated 331,012,462 bases in 1,307,232 reads, with a mean read length of about 253 bp. The genome size is 4,019,665 bp.

    I need help calculating the genome coverage (how many X?).

    Thanks a lot in advance.

    Dr. Pradeep

  • #2
    Your X coverage is (331,012,462 bases)/(4,019,665 bp), or 82X average.
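As a minimal sketch of the arithmetic in #2, assuming the usual definition of average depth (total sequenced bases divided by genome size). The function name `mean_coverage` is made up for illustration:

```python
def mean_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth of coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Numbers from the first post (Ion PGM, 318 chip).
ion_pgm_depth = mean_coverage(331_012_462, 4_019_665)
print(f"{ion_pgm_depth:.1f}X")  # 82.3X
```

Multiplying the read count by the mean read length (1,307,232 × 253 = 330,729,696) lands close to the reported base count, so the inputs are at least consistent with each other.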


    • #3
      Dear Brian, thanks a lot. That has cleared up a lot of doubts.


      • #4
        We have done WGS using Illumina paired-end sequencing (HiSeq). After pre-processing we have 42300018262 bases, the mean read length is 101 bp,
        and the genome size is 2032558240 bp. That gives a coverage of 42300018262/2032558240 = 20.8X (average).

        But when I map the Illumina paired-end reads to the de novo assembled genome and assess the alignment with Qualimap,
        I get only 5X coverage (mean).

        Why the difference? I think the coverage we get before genome assembly is the physical coverage (20.8X), while the coverage we get after de novo assembly
        and read alignment is the actual coverage (5X). Is my assumption right?


        • #5
          Well... there are different ways to calculate coverage. For one thing, what is the ploidy of your genome? Also, what fraction of the reads mapped? And what was the insert size distribution? If your reads were mostly overlapping, then the coverage by unique molecules would be reduced... and if they were mostly adapter sequence, it would be reduced even more... so the meaning of "5X coverage by mapping" depends on precisely how the number was calculated.


          • #6
            Thanks, Brian. Our plant has a tetraploid genome.

            Mapped sequence evaluated by Qualimap (http://qualimap.bioinfo.cipf.es/):
            Reference size (bp): 1491942955
            No. of mapped reads (bp): 102,654,077 (99.73%)
            Mean mapping quality: 37.18
            Insert size: 241
            No. of A's: 31.76%
            No. of C's: 19.71%
            No. of T's: 30.04%
            No. of G's: 18.48%


            • #7
              Those numbers seem to contradict each other - in the first post you state the genome size is 2032558240, and in the second, 1491942955. Or are they different references? Also, 102,654,077 mapped reads at 101bp is only ~10Gbp, not 42300018262 as in your first post. So I'm a little confused.

              That said - if the genome is ~2Gbp and tetraploid, then 42Gbp would give you ~5x coverage per ploidy...
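The two checks in #7 can be sketched as follows. This assumes every mapped read is exactly 101 bp and that the average depth is split evenly across the four copies of a tetraploid genome. Both are simplifications for illustration, and the function names are hypothetical:

```python
def bases_from_reads(n_reads: int, read_length: int) -> int:
    """Total bases if every read has the same length (an assumption)."""
    return n_reads * read_length


def coverage_per_copy(total_bases: int, genome_size: int, ploidy: int) -> float:
    """Average depth divided evenly across the copies of a polyploid genome."""
    if genome_size <= 0 or ploidy <= 0:
        raise ValueError("genome_size and ploidy must be positive")
    return total_bases / genome_size / ploidy


# 102,654,077 mapped reads at 101 bp each.
mapped_bases = bases_from_reads(102_654_077, 101)
print(f"{mapped_bases / 1e9:.1f} Gbp")  # 10.4 Gbp

# 42300018262 bases on a ~2 Gbp tetraploid genome.
per_copy = coverage_per_copy(42_300_018_262, 2_032_558_240, 4)
print(f"{per_copy:.1f}X per copy")  # 5.2X per copy
```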


              • #8
                Sorry for the confusion.

                In my first post, the genome size was estimated by a k-mer method (KmerGenie), so the estimated genome size is around 2032558240 bp. The filtered reads, as assessed by FastQC, total 42300018262 bases. By this method the coverage is 42300018262/2032558240 = 20.8X (average).

                In my second post, the reference genome is the de novo assembly, which is smaller than the estimated genome size: about 1491942955 bp. I used this as the reference and mapped the filtered reads (42300018262 bases) to it with BWA. But somehow only 102,654,077 bp of the filtered reads aligned to the de novo assembled reference, instead of 42300018262 bp. By the mapping method the coverage comes out at around 5X (mean). Why the difference?


                • #9
                  The numbers are still strange - 102,654,077 bp of mapped reads on a 1491942955 bp assembly is only 0.069X.


                  • #10
                    I should have given the number of mapped bases instead of reads. My mapped reads total 9,055,285,852 bp, which on a 1491942955 bp assembly gives ~6X. But my question remains: before assembly I get around 20X coverage, and after assembly I get 6X. Does this mean we sequenced at 20X but the actual assembly coverage is 6X? What do physical coverage and actual coverage mean?


                    • #11
                      It sounds like something is wrong with the mapping, or you posted an incorrect number somewhere. If 99%+ of the reads mapped, then you should be getting roughly 42185808213 bp of coverage (42300018262 × 0.9973), not 9055285852. So either the mapping rate is incorrect or the mapping program was not fed all of the initial reads.
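A rough sketch of the consistency check in #11, assuming the 99.73% mapping rate from #6 applies to all 42300018262 filtered bases. `expected_mapped_bases` is an illustrative name, not part of BWA or Qualimap:

```python
def expected_mapped_bases(total_bases: int, mapped_fraction: float) -> float:
    """Bases expected to map if `mapped_fraction` of the input aligns."""
    if not 0.0 <= mapped_fraction <= 1.0:
        raise ValueError("mapped_fraction must be between 0 and 1")
    return total_bases * mapped_fraction


expected = expected_mapped_bases(42_300_018_262, 0.9973)
reported = 9_055_285_852          # mapped bases given in #10
assembly_size = 1_491_942_955     # de novo assembly size from #6

print(f"expected depth: {expected / assembly_size:.1f}X")
print(f"reported depth: {reported / assembly_size:.1f}X")
print(f"reported / expected: {reported / expected:.2f}")
```

With these numbers the reported mapped bases are only about a fifth of what a 99.73% mapping rate would imply, which is the mismatch #11 points to.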
