  • Length of genome covered by reads by mapping

    Hello,

    I have generated SAM and BAM files after mapping my Illumina reads to a reference genome. Now I want to know how much of the reference genome is covered by mapped reads (e.g. 50% of the reference genome is covered). I have seen many ways to get read depth but haven't found a way to get the breadth of coverage, i.e. the fraction of the genome length that is covered. Could anyone advise on this? Thanks.
    Last edited by morning latte; 01-29-2015, 05:46 PM.

  • #2
    Originally posted by morning latte
    Hello,

    I have seen many ways to get read depth but haven't found a way to get the breadth of coverage, i.e. the fraction of the genome length that is covered. Could anyone advise on this? Thanks.
    Hi,
    If you have access to CLC Genomics Workbench, you can generate what they call a "detailed mapping report" for your reads mapped to the reference genome. It shows the fraction of the genome covered by your reads.
    Hope it helps
    Cheers

    • #3
      Thanks Sergioo. Unfortunately, I don't have access to CLC Genomics Workbench. Could you point me to an alternative, if you know of one? Thanks!

      • #4
        Qualimap generates detailed stats for BAM files.

        • #5
          And there's also...

          The BBMap suite's pileup program! It takes sam or bam, sorted or unsorted.

          pileup.sh in=mapped.sam out=stats.txt hist=histogram.txt

          stats.txt will contain the average depth and percent covered for each reference sequence; the histogram will contain the exact number of bases at each coverage level. You can also get per-base or binned coverage if you want to plot it. It also reports the median, standard deviation, and so forth.
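To make the relationship between those numbers concrete, here is a minimal Python sketch of how a coverage histogram, average depth, median depth, and percent covered all follow from per-base depths. It is not BBMap's implementation and does not read pileup.sh's output files; the function name `coverage_summary` and the toy depth values are assumptions made up for illustration.

```python
from collections import Counter
from statistics import median


def coverage_summary(depths):
    """Summarize per-base read depths for one reference sequence.

    `depths` is a hypothetical input: one integer per reference base,
    giving the number of reads overlapping that base.
    """
    depths = list(depths)
    if not depths:
        raise ValueError("need at least one base")
    hist = Counter(depths)               # depth -> number of bases at that depth
    total = len(depths)
    covered = total - hist.get(0, 0)     # bases with depth >= 1
    return {
        "histogram": dict(sorted(hist.items())),
        "avg_depth": sum(depths) / total,
        "median_depth": median(depths),
        "percent_covered": 100.0 * covered / total,
    }


# Toy 10-base reference (made-up values)
summary = coverage_summary([0, 0, 3, 3, 5, 0, 1, 1, 0, 2])
print(summary)
```

In this toy case 4 of the 10 bases have depth 0, so the breadth of coverage is 60% even though the average depth is 1.5.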

          It's also possible to generate coverage directly from BBMap, without an intermediate sam file, like this:

          bbmap.sh in=reads.fq ref=reference.fasta nodisk covstats=stats.txt covhist=histogram.txt

          We use this a lot in situations where all you care about is the coverage distribution, which is fairly common with metagenome assemblies. It also supports most of the flags that pileup.sh supports, though the syntax is slightly different to prevent collisions. In each case you can see all the possible flags by running the shell script with no arguments.

          P.S. I put some work into it last week and it is now over 3x as fast as it used to be, and it used to be pretty fast!
          Last edited by Brian Bushnell; 01-29-2015, 06:52 PM.

          • #6
            Dear Brian Bushnell,

            Thanks a lot for the suggestion. I just ran BBMap on one of my SAM files, and the summary output looks like this:

            Average coverage: 9.75
            Percent scaffolds with any coverage: 100.00
            Percent of reference bases covered: 0.32

            I take it that only 0.32% of the reference genome was covered by reads at any depth. In that case, what does "Percent scaffolds with any coverage" mean? Thanks in advance for your help.

            • #7
              Wow - you had very uneven coverage.

              "Percent scaffolds with any coverage" means that - well... let's assume you had a human reference genome, which has 25 sequences: chromosomes 1-22, X, Y and M.

              In that case, if each of those 25 sequences had at least one read hit, then the percentage of scaffolds with coverage would be 100%. You can get more details in the per-scaffold coverage file to see what percent of each scaffold was covered... in general, for a complete genome, "scaffold" means "chromosome".

              0.32% refers to the percent of bases across the entire genome that had any coverage, and you can consult the histogram for more details. But essentially, (100% - 0.32%) = 99.68% of the genome had zero coverage. I assume this was a ChIP-seq experiment or similar, where the expectation is that 99.9% of the coverage falls upon 0.1% of the genome.
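The difference between the two percentages can be sketched in a few lines of Python. This is only an illustration under assumed inputs, not how BBMap computes its report: the function name `summarize`, the scaffold names, and the lengths and covered-base counts are all made up.

```python
def summarize(scaffolds):
    """Return (percent of scaffolds with any coverage,
               percent of reference bases covered).

    `scaffolds` is a hypothetical dict mapping a scaffold name to a
    (length_in_bases, covered_bases) tuple.
    """
    n_scaffolds = len(scaffolds)
    # A scaffold counts as "covered" if at least one of its bases is covered.
    with_cov = sum(1 for _, covered in scaffolds.values() if covered > 0)
    total_len = sum(length for length, _ in scaffolds.values())
    total_cov = sum(covered for _, covered in scaffolds.values())
    return 100.0 * with_cov / n_scaffolds, 100.0 * total_cov / total_len


# Toy reference: every scaffold has a few covered bases, but most bases have none.
toy = {"chr1": (1000, 5), "chr2": (800, 2), "chrM": (200, 1)}
pct_scaffolds, pct_bases = summarize(toy)
print(pct_scaffolds, pct_bases)
```

Here every scaffold has at least one covered base, so the scaffold-level figure is 100%, while only 8 of 2000 bases (0.4%) are covered. That is the same pattern as the 100.00 and 0.32 values in the summary above.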

              • #8
                Dear Brian Bushnell,

                Thanks a lot for the detailed explanation on this. Everything is now very clear.
