
  • a basic question about coverage

    Hi everybody,
    I have a basic question about NGS.
    How can we calculate sequencing coverage (5X, 20X, ...) at selected regions of interest, and what exactly does it mean?
    Is it calculated right after sequencing, from the FASTQ file, or after mapping to the genome?

  • #2
    Hello,

    You need to map the reads first to know which region they (hopefully) come from.

    One easy way to look at coverage in regions is to create a .bed file (http://genome.ucsc.edu/FAQ/FAQformat.html#format1) with your regions of interest and compare it to the mapped reads with bedtools coverageBed (http://code.google.com/p/bedtools/).
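For reference, a BED file is whitespace-separated with at least three columns (chromosome, 0-based start, exclusive end), so the length of a region is end minus start. Below is a minimal Python sketch that totals the length of the regions of interest. The function name and the sample intervals are made up for illustration, and overlapping intervals are not merged, so they would be counted twice.

```python
def bed_total_length(lines):
    """Sum the lengths of BED intervals (chrom, start, end, ...)."""
    total = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional comment/header lines a BED file may carry
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        # BED starts are 0-based and ends are exclusive, so length = end - start
        total += int(end) - int(start)
    return total


# Hypothetical regions of interest, not taken from any real annotation
regions = [
    "chr1\t14467\t14473\texon_a",
    "chr1\t20000\t20100\texon_b",
]
print(bed_total_length(regions))  # 106
```

The total is the "length of area in question" needed when turning read counts into fold coverage.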



    • #3
      Once you have coverage in terms of read count (from coverageBed), to get coverage like 5x, you'll have to do

      ( read count * read length ) / length of area in question

      So if you have 5 reads that are 50 bases long in a region that's 100 bases long, your coverage will be

      (5 * 50) / 100 = 2.5x.

      You could calculate a number for the whole genome by adding all the chromosome lengths, or you could do individual chromosomes or genes or windows throughout the genome or whatever is interesting.
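The formula above is easy to wrap in a small helper. This is only a sketch: the function name is hypothetical, and it assumes all reads have the same length and lie entirely inside the region, so the result is an approximation when reads overhang the region boundaries.

```python
def fold_coverage(read_count, read_length, region_length):
    """Approximate fold coverage: (read count * read length) / region length."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return read_count * read_length / region_length


# The worked example from the post: 5 reads of 50 bases in a 100-base region
print(fold_coverage(5, 50, 100))  # 2.5
```

For a whole genome, region_length would be the sum of all chromosome lengths.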



      • #4
        You could also use "samtools depth" to find the coverage at each position in your target region; as mamons said, you could create a BED file for that region.

        Then you can take all the per-position coverage counts and summarize them in R to get the mean, median, etc. depth of coverage in the region of interest.
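The post suggests R for the summary; the same idea can be sketched in Python with the standard library. The function name and sample rows below are hypothetical. The sketch assumes the three-column "chromosome, position, depth" output of samtools depth and, because that output by default leaves out positions with no coverage, it accepts an optional region length so the missing positions can be counted as zeros.

```python
import statistics


def summarize_depth(lines, region_length=None):
    """Summarize per-base depth from 'chrom pos depth' lines."""
    # The third whitespace-separated column holds the depth
    depths = [int(line.split()[2]) for line in lines if line.strip()]
    if region_length is not None:
        # Positions absent from the output are treated as zero coverage
        depths.extend([0] * max(0, region_length - len(depths)))
    return {
        "positions": len(depths),
        "mean": statistics.mean(depths),
        "median": statistics.median(depths),
        "min": min(depths),
        "max": max(depths),
    }


# Hypothetical depth rows for a 4-base region with one uncovered position
sample = ["chr1\t101\t12", "chr1\t102\t15", "chr1\t103\t9"]
summary = summarize_depth(sample, region_length=4)
print(summary["mean"], summary["median"])
```

Leaving region_length unset summarizes only the positions that appear in the file, which overstates the mean when parts of the target have no reads.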



        • #5
          Hi,

          Has anyone ever tried to calculate the coverage (using coverageBed) of a whole-genome sequencing experiment (about 40x average coverage) over a relatively large number of genomic features (like all RefSeq genes)? I tried to run this on a multicore machine with 16 GB of RAM. After three days of computation and a constant memory consumption of about 14 GB, I stopped the process. I used the following command:

          ./coverageBed -abam <bam_file> -b <refSeq_exons.bed6> -hist >> histogram.txt

          Is that normal? Any ideas?

          @aggp11: Which version of samtools are you using? The depth command does not seem to be present in my version.



          • #6
            Hi,

            * Added the `depth' command to samtools to compute the per-base depth with a
            simpler interface. File `bam2depth.c', which implements this command, is the
            recommended example on how to use the mpileup APIs.



            • #7
              @Mbender: I am using samtools version 0.1.18.

              @maria_maria & Mbender: With this latest version of samtools, you don't even have to worry about bam2depth. The following command is an example of how samtools depth works:

              samtools depth -q 30 -b exons.bed exome.bam > test_q_30.coverage

              Output:
              chr1 14468 39
              chr1 14469 39
              chr1 14470 37
              chr1 14471 39
              chr1 14472 35
              chr1 14473 34

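              The first two columns are the chromosome and the position, and the third column is the number of reads covering that position with a base quality of 30 or more (-q sets the minimum base quality; -Q would set the minimum mapping quality).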
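To relate this per-position output to thresholds such as 5X or 20X from the original question, one option is to compute the fraction of target positions that reach a minimum depth. The sketch below is an illustration only: the function name is hypothetical, the rows are the six output lines shown above, and the threshold of 35 is arbitrary.

```python
def fraction_at_or_above(lines, min_depth, region_length=None):
    """Fraction of positions whose depth is at least min_depth (expected >= 1)."""
    depths = [int(line.split()[2]) for line in lines if line.strip()]
    # Without a region length, only the positions present in the output are counted
    total = region_length if region_length is not None else len(depths)
    if total == 0:
        raise ValueError("no positions to evaluate")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / total


# The six rows from the samtools depth output shown in the post
rows = [
    "chr1 14468 39",
    "chr1 14469 39",
    "chr1 14470 37",
    "chr1 14471 39",
    "chr1 14472 35",
    "chr1 14473 34",
]
print(fraction_at_or_above(rows, 35))  # 5 of the 6 positions reach depth 35
```

Passing region_length (for example, the total length of the BED intervals) makes uncovered positions count against the fraction.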

              Thanks,
              Praful



              • #8
                Many thanks.

                samtools depth seems to calculate the coverage in the given genomic regions in a feasible amount of time. By the way, the poor performance of coverageBed on a large number of genomic intervals is a known issue.



                Best,

                Matthias
