  • gringer
    David Eccles (gringer)
    • May 2011
    • 845

    #46
    If duplications / deletions are rare enough, then median coverage should be fine. A median statistic will typically deal with the spikes and troughs that are an issue for using mean as a descriptive statistic.
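
A rough illustration of the point above, using made-up per-base depths (the numbers are hypothetical, not taken from any dataset in this thread). One spike and one trough pull the mean far from the typical depth, while the median barely moves:

```python
from statistics import mean, median

# Hypothetical per-base depths: mostly ~30x, with a two-base spike
# (e.g. a collapsed duplication) and a two-base trough (e.g. a deletion).
depths = [30, 31, 29, 30, 32, 28, 30, 300, 310, 0, 0, 31, 29, 30, 30]


def summarize_depth(per_base_depths):
    """Return (mean, median) for a list of per-base depths."""
    return mean(per_base_depths), median(per_base_depths)


avg, med = summarize_depth(depths)
print(f"mean depth:   {avg:.1f}")
print(f"median depth: {med}")
```

With these values the mean is roughly twice the median, which is the kind of distortion the median avoids as long as the unusual regions stay a small fraction of the total.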


    • swbarnes2
      Senior Member
      • May 2008
      • 910

      #47
      Originally posted by mrood
      It seems to me that if you are mapping to a reference genome and there are regions that have more than twice the average coverage that it is probably the result of a duplication or something in the genome of the sequenced organism.
      Natural variation in sequencing coverage could easily produce 2-fold differences in coverage, or more.

      Likewise, if it has very poor coverage the organism likely does not have that region in its genome and it is likely the result of improper mapping.
      Or, the region is there, but so divergent from your reference that reads are mapping poorly, or the region could be GC rich or something, causing few reads to be generated there.


      • recombinationhotspot
        Junior Member
        • Jun 2013
        • 2

        #48
        I am trying to calculate the average coverage for a given region, e.g. 200 bp, where my reads are aligned. Is there any software that can do that without actually having to write any commands? Please note that I have no bioinformatics background and don't have access to Linux or a similar operating system. The best solution I have so far is to use the Savant genome browser to convert the .bam files into .bam.cov.tdf files, which show me the maximum coverage.


        • gringer
          David Eccles (gringer)
          • May 2011
          • 845

          #49
          Is there any software that can do that without actually having to write any commands?
          Er, you want a program to run that means you don't have to run a program? That's a difficult request.

          I suppose you could try using Galaxy, which hides all that pesky "running commands" stuff from you. That has a feature coverage tool, but requires input files to be in BED format, and presumably there are other tools closer to what you desire. From this email:

          To calculate coverage, please see the tool "Regional Variation ->
          Feature coverage". Query and target must both be in Interval/BED format.
          Query data in Interval/BED format is possible in most of the dataflow
          paths through the tools and from external sources. The reference genome
          file will likely need to be imported and formatted.
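
For anyone who does end up running a short script, the difference between average and maximum coverage over a region can be sketched as below. This is only a sketch: the read intervals are invented, `region_depths` is a hypothetical helper rather than part of Galaxy or Savant, and coordinates are assumed to be 0-based half-open intervals in the style of BED files.

```python
def region_depths(alignments, region_start, region_end):
    """Per-base depth over the half-open region [region_start, region_end).

    `alignments` is a list of (start, end) half-open intervals, all assumed
    to lie on the same chromosome as the region.
    """
    depths = [0] * (region_end - region_start)
    for start, end in alignments:
        # Clip each read to the region before counting it.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            depths[pos - region_start] += 1
    return depths


# Hypothetical 100-base reads overlapping a 200 bp region.
reads = [(0, 100), (50, 150), (120, 220), (180, 280)]
depths = region_depths(reads, 0, 200)

mean_depth = sum(depths) / len(depths)
max_depth = max(depths)
print(f"mean coverage: {mean_depth:.2f}x, max coverage: {max_depth}x")
```

The maximum only reports the deepest single position, so it can sit well above the average for the same region.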


          • rathankar
            Member
            • Oct 2011
            • 10

            #50
            calculating coverage depth

            Originally posted by westerman
            From my understanding, yes, they are different, and what you are calculating is the 'X' coverage, i.e., given the number of raw bases sequenced, how many times (or X) the sequencing potentially covers the genome.

            % coverage is how well the genome is actually covered after all mapping and assembly is done.

            As an example let's say we have 30M reads of 50 bases, or 1.5 Gbases total. Our genome is 150M bases. After mapping (or assembly) we have a bunch of non-overlapping contigs that have 100M bases total.

            So our 'X coverage' is 10X (1.5 Gbases / 150 Mbases)
            Our '% coverage' is 66.7% (100 Mbases / 150 Mbases)
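
The same two numbers as a small sketch, using the totals from the example above (1.5 Gbases sequenced, a 150 Mbase genome, 100 Mbases of contigs). The function names are made up for illustration:

```python
def fold_coverage(total_bases_sequenced, genome_size):
    """'X coverage': how many times the raw bases could cover the genome."""
    return total_bases_sequenced / genome_size


def percent_coverage(bases_covered, genome_size):
    """'% coverage': share of the genome actually covered after mapping/assembly."""
    return 100.0 * bases_covered / genome_size


total_bases = 1.5e9    # raw bases sequenced
genome_size = 150e6    # haploid genome size in bases
contig_bases = 100e6   # bases in non-overlapping contigs

x_cov = fold_coverage(total_bases, genome_size)
pct_cov = percent_coverage(contig_bases, genome_size)
print(f"X coverage: {x_cov:.0f}X")
print(f"% coverage: {pct_cov:.1f}%")
```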


            One way to think about this is that percentages generally range from 0% to 100%, so a percentage greater than 100 can be confusing.


            I use the haploid genome size or more specifically the C-value times 965Mbases/pg.
            Hi

            I went through this post and understood how coverage depth is expressed, but I need a small clarification.

            1. Does this coverage depth account for mutations in the reads (I mean positions that do not match the reference sequence), given that it only takes the number of bases in the sample and the number of bases in the reference sequence?

            2. If a read matches at more than one location, won't the coverage depth be inflated? Is there a way to reduce that error?
            Sr. Application Scientist, Apsara Innovations, Bangalore


            • westerman
              Rick Westerman
              • Jun 2008
              • 1104

              #51
              1. SNPs and indels are usually not a big part of the genome; I doubt they would throw off the calculations by even a percent.

              2. Count the read only once; i.e., choose the best match, or if there are multiple best matches, just choose one at random.
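
A minimal sketch of the "count each read once" rule in point 2. It assumes alignments arrive as `(read_id, position, score)` tuples where a higher score is a better match. That layout and the function name are assumptions for illustration, not the output format of any particular aligner:

```python
import random


def pick_one_alignment(alignments, rng=None):
    """Keep one position per read: the best-scoring hit, ties broken at random.

    `alignments` is an iterable of (read_id, position, score) tuples,
    where a higher score means a better match (an assumption of this sketch).
    """
    if rng is None:
        rng = random.Random()
    hits_by_read = {}
    for read_id, position, score in alignments:
        hits_by_read.setdefault(read_id, []).append((position, score))

    chosen = {}
    for read_id, hits in hits_by_read.items():
        best_score = max(score for _, score in hits)
        best_positions = [pos for pos, score in hits if score == best_score]
        chosen[read_id] = rng.choice(best_positions)
    return chosen


# Hypothetical hits: r1 has a clear best match, r2 has two equally good ones.
hits = [
    ("r1", 100, 60), ("r1", 5000, 40),
    ("r2", 200, 50), ("r2", 9000, 50),
    ("r3", 300, 30),
]
chosen = pick_one_alignment(hits, random.Random(42))
print(chosen)
```

Each read then contributes to the depth at exactly one location, so repeats do not inflate the total.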

              Really, unless you are working with a well characterized organism (e.g., human) then the numbers are going to be 'squishy' in any case. They are mainly there to give you an idea of how good your sequencing is. In other words if you calculate that you had 50x coverage (which is a nice de-novo assembly target) but only get 10% coverage to a closely related organism then that tells you something.


              • criscruz
                Junior Member
                • Aug 2015
                • 7

                #52
                Hi everyone;

                I have just read your posts, but I still have a question.

                I'm working with an Ion PGM to generate whole-genome sequences of some RNA viruses. I then want to build a phylogenetic tree from the consensus sequence of each virus I can identify. So the question is: how much coverage (reads per base) do I need to make a good consensus sequence for my phylogenetic analysis?
                I don't want to analyze the variants or quasispecies, so I just need the minimum necessary.

                thanks for your time
                my best

                Cris


                • westerman
                  Rick Westerman
                  • Jun 2008
                  • 1104

                  #53
                  50-100x coverage per genome is good. Much more than that and you will start getting misassemblies.

                  For viruses I suggest using Mira. It is a good small-genome assembler that can handle a lot of potential misassemblies.
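
To turn a coverage target like 50-100x into a read count, the usual back-of-the-envelope relation is reads = coverage x genome size / mean read length. The sketch below assumes a 10 kb viral genome and a 200-base mean read length; both values are placeholders chosen for illustration, not figures from this thread, and the result counts only reads that actually map to the virus:

```python
import math


def reads_needed(target_coverage, genome_size, mean_read_length):
    """Number of on-target reads needed to reach an average fold coverage."""
    return math.ceil(target_coverage * genome_size / mean_read_length)


genome_size = 10_000      # hypothetical viral genome length in bases
mean_read_length = 200    # hypothetical mean read length in bases

low = reads_needed(50, genome_size, mean_read_length)
high = reads_needed(100, genome_size, mean_read_length)
print(f"50x needs about {low} reads; 100x needs about {high} reads")
```

Coverage is rarely uniform along a genome, so the real requirement for every position to reach the target will be higher than this average-based estimate.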


                  • criscruz
                    Junior Member
                    • Aug 2015
                    • 7

                    #54
                    thanks westerman

