
  • Length of genome covered by reads by mapping

    Hello,

    I have generated SAM and BAM files after mapping my Illumina reads to a reference genome. Now I want to know how much of the reference genome is covered by aligned reads (e.g. 50% of the reference genome is covered by reads). I have seen many ways to get read depth, but I haven't found a way to get the fraction of the genome length that is covered (breadth of coverage). Could anyone offer advice on this? Thanks.
    Last edited by morning latte; 01-29-2015, 05:46 PM.

  • #2
    Originally posted by morning latte
    Hello,

    I have seen many ways to get read depth, but I haven't found a way to get the fraction of the genome length that is covered (breadth of coverage). Could anyone offer advice on this? Thanks.
    Hi,
    If you have access to CLC Genomics Workbench, you can generate what they call a "detailed mapping report" for your reads mapped against the reference genome. It shows the fraction of the genome covered by your reads.
    Hope it helps
    Cheers

    • #3
      Thanks Sergioo. Unfortunately, I don't have access to CLC Genomics Workbench. Could you point me to an alternative if you know of one? Thanks!

      • #4
        Qualimap generates detailed stats for BAM files.

        • #5
          And there's also...

          The BBMap suite's pileup program! It takes sam or bam, sorted or unsorted.

          pileup.sh in=mapped.sam out=stats.txt hist=histogram.txt

          stats.txt will contain the average depth and percent covered of each reference sequence; the histogram will contain the exact number of bases at each coverage level. You can also get per-base coverage or binned coverage if you want to plot the coverage. It also reports the median, standard deviation, and so forth.
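
For illustration, here is how the histogram relates to the two summary numbers. This is only a minimal Python sketch, not BBMap's code: the dict `example` is a hypothetical in-memory stand-in for a depth histogram with made-up counts, and the function does not parse the actual histogram.txt format.

```python
def coverage_summary(hist):
    """Summarise a depth histogram.

    hist maps depth -> number of reference bases observed at that depth
    (a hypothetical in-memory form, not a parsed BBMap file).
    """
    total_bases = sum(hist.values())
    if total_bases == 0:
        raise ValueError("histogram is empty")
    # Breadth: every base that is not at depth 0 counts as covered.
    covered_bases = total_bases - hist.get(0, 0)
    # Average depth: depth weighted by the number of bases at that depth.
    average_depth = sum(depth * n for depth, n in hist.items()) / total_bases
    return {
        "percent_covered": 100.0 * covered_bases / total_bases,
        "average_depth": average_depth,
    }


# Made-up toy histogram: 1000 reference bases in total.
example = {0: 600, 1: 250, 2: 100, 5: 50}
summary = coverage_summary(example)
print(summary)
```

Breadth depends only on how many bases sit at depth 0, while average depth weights every level, so the two numbers can diverge a lot when coverage is uneven.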

          It's also possible to generate coverage directly from BBMap, without an intermediate sam file, like this:

          bbmap.sh in=reads.fq ref=reference.fasta nodisk covstats=stats.txt covhist=histogram.txt

          We use this a lot in situations where all you care about is the coverage distribution, which is fairly common in metagenome assemblies. It also supports most of the flags that pileup.sh supports, though the syntax is slightly different to prevent collisions. In each case you can see all the possible flags by running the shell script with no arguments.

          P.S. I put some work into it last week and it is now over 3x as fast as it used to be, and it used to be pretty fast!
          Last edited by Brian Bushnell; 01-29-2015, 06:52 PM.

          • #6
            Dear Brian Bushnell,

            Thanks a lot for the suggestion. I just ran BBMap on one of my sam files and the summary output looks like below.

            Average coverage: 9.75
            Percent scaffolds with any coverage: 100.00
            Percent of reference bases covered: 0.32

            I guess this means that only 0.32% of the reference genome was covered by reads at any depth. Then what does "Percent scaffolds with any coverage" mean? Thanks in advance for your help.

            • #7
              Wow - you had very uneven coverage.

              "Percent scaffolds with any coverage" means that - well... let's assume you had a human reference genome, which has 25 sequences: chromosomes 1-22, X, Y and M.

              In that case, if each of those 25 sequences had at least one read hit, then the percentage of scaffolds with coverage would be 100%. You can get more details in the per-scaffold coverage file to see what percent of each scaffold was covered... in general, for a complete genome, "scaffold" means "chromosome".

              0.32% refers to the percent of bases across the entire genome that had any coverage, and you can consult the histogram for more details. But essentially, 99.68% (100% - 0.32%) of the genome had zero coverage. I assume this was a ChIP-seq experiment or similar, where the expectation is that something like 99.9% of the coverage falls on 0.1% of the genome.
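
To make the difference between the two percentages concrete, here is a small Python sketch. The scaffold names, lengths, and covered-base counts are invented toy values (not the poster's data), chosen so that every scaffold has some coverage while only a small fraction of the bases is covered.

```python
def breadth_metrics(scaffolds):
    """Return (percent of scaffolds with any coverage, percent of bases covered).

    scaffolds is a list of (name, length, covered_bases) tuples; the values
    used below are hypothetical.
    """
    total_length = sum(length for _, length, _ in scaffolds)
    total_covered = sum(covered for _, _, covered in scaffolds)
    # A scaffold counts as "hit" as soon as a single base is covered.
    scaffolds_hit = sum(1 for _, _, covered in scaffolds if covered > 0)
    pct_scaffolds = 100.0 * scaffolds_hit / len(scaffolds)
    pct_bases = 100.0 * total_covered / total_length
    return pct_scaffolds, pct_bases


# Toy reference: three scaffolds, each with only a sliver covered.
toy = [
    ("chrA", 1_000_000, 3_000),
    ("chrB", 500_000, 1_500),
    ("chrC", 100_000, 620),
]
pct_scaffolds, pct_bases = breadth_metrics(toy)
print(f"Percent scaffolds with any coverage: {pct_scaffolds:.2f}")
print(f"Percent of reference bases covered: {pct_bases:.2f}")
```

With these toy numbers all three scaffolds are hit (100.00) even though only 5,120 of 1,600,000 bases are covered (0.32), which is the same pattern as in the summary above.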

              • #8
                Dear Brian Bushnell,

                Thanks a lot for the detailed explanation on this. Everything is now very clear.
