  • maria_mari
    Member
    • Jan 2012
    • 17

    a basic question about coverage

    Hi everybody,
    I have a basic question about NGS.
    How can we calculate sequencing coverage (5X, 20X, ...) at selected regions of interest, and what exactly does it mean?
    Is it calculated right after sequencing from the FASTQ file, or only after mapping to the genome?
  • mamons
    Member
    • Nov 2011
    • 10

    #2
    Hello,

    you need to map the reads first to know which region they (hopefully) come from.

    One easy way to look for coverage in regions is to design a .bed file (http://genome.ucsc.edu/FAQ/FAQformat.html#format1) with your regions of interest and compare them to the mapped result with bedtools coverageBed (http://code.google.com/p/bedtools/).


    • mgogol
      Senior Member
      • Mar 2008
      • 197

      #3
      Once you have coverage in terms of read count (from coverageBed), to get coverage like 5x, you'll have to do

      ( read count * read length ) / length of area in question

      So if you have 5 reads that are 50 bases long in a region that's 100 bases long, your coverage will be

      (5 * 50) / 100 = 2.5x.

      You could calculate a number for the whole genome by adding all the chromosome lengths, or you could do individual chromosomes or genes or windows throughout the genome or whatever is interesting.
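The formula above can be sketched in Python. The function name `mean_coverage` and its parameter names are illustrative assumptions, not part of coverageBed or any other tool in this thread. The sketch also assumes that all reads have the same length and lie entirely inside the region.

```python
def mean_coverage(read_count: int, read_length: int, region_length: int) -> float:
    """Approximate mean coverage: (read count * read length) / region length."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return (read_count * read_length) / region_length


if __name__ == "__main__":
    # The worked example from the post: 5 reads of 50 bases in a 100-base region.
    print(f"{mean_coverage(5, 50, 100)}x")
```

For a whole-genome figure, `region_length` would be the sum of the chromosome lengths, as described above.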


      • aggp11
        Member
        • Jun 2011
        • 87

        #4
        You could also use "samtools depth" to find the coverage at each position in your target region; as mamons said, you could create a bed file for that region.

        Then you can take all the per-position coverage counts and summarize them in R to get the mean, median, etc. depth of coverage in the region of interest.
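The post suggests R for the summary step; the same idea can be sketched in Python. This is only an illustration under stated assumptions: the input is taken to be three whitespace-separated columns (chromosome, position, depth), the function names `parse_depths` and `summarize` are made up for this example, and the sample rows are hypothetical. If positions with zero depth are missing from the input, the mean over the listed rows will overestimate the mean over the whole region.

```python
import statistics
from typing import Dict, Iterable, List


def parse_depths(lines: Iterable[str]) -> List[int]:
    """Extract the depth column (third field) from per-position depth lines."""
    depths = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depths.append(int(fields[2]))
    return depths


def summarize(depths: List[int]) -> Dict[str, float]:
    """Return mean, median, min and max of the per-position depths."""
    if not depths:
        raise ValueError("no depth values to summarize")
    return {
        "mean": statistics.mean(depths),
        "median": statistics.median(depths),
        "min": min(depths),
        "max": max(depths),
    }


if __name__ == "__main__":
    # Hypothetical rows, made up for illustration only.
    example = ["chrA\t101\t10", "chrA\t102\t12", "chrA\t103\t14"]
    print(summarize(parse_depths(example)))
```

In practice the lines would come from the file written by the depth command, for example `parse_depths(open(path))` with a path of your own.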


        • MBender
          Junior Member
          • Jan 2011
          • 9

          #5
          Hi,

          has any of you ever tried to calculate the coverage (using coverageBed) of a whole-genome sequencing experiment (about 40x average coverage) over a relatively large number of genomic features (like all RefSeq genes)? I tried to run this task on a multicore processor with 16 GB of RAM. After three days of computation and a constant memory consumption of about 14 GB, I stopped the process. I used the following command:

          ./coverageBed -abam <bam_file> -b <refSeq_exons.bed6> -hist >> histogram.txt

          Is that normal? Any ideas?

          @aggp11: Which version of samtools are you using? The depth command does not seem to be present in my version.


          • maria_mari
            Member
            • Jan 2012
            • 17

            #6
            Hi




            * Added the `depth' command to samtools to compute the per-base depth with a
            simpler interface. File `bam2depth.c', which implements this command, is the
            recommended example on how to use the mpileup APIs.


            • aggp11
              Member
              • Jun 2011
              • 87

              #7
              @MBender: I am using samtools version 0.1.18.

              @maria_mari & MBender: with this latest version of samtools, you don't even have to worry about bam2depth. The following command is an example of how samtools depth works:

              samtools depth -q 30 -b exons.bed exome.bam > test_q_30.coverage

              Output:
              chr1 14468 39
              chr1 14469 39
              chr1 14470 37
              chr1 14471 39
              chr1 14472 35
              chr1 14473 34

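              Where the third column is the number of reads covering the given position with a base quality of 30 or more there (-q sets the base quality threshold; the mapping quality threshold is set separately with -Q).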

              Thanks,
              Praful
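To connect this output back to figures like 5X or 20X, one option is to compute the mean depth over the region and the fraction of positions that reach a chosen depth. The sketch below uses the six rows shown above. The function name `region_stats` is an assumption for illustration, and so is the choice to pass the region length explicitly, so that positions missing from the output are counted as depth 0.

```python
from typing import Iterable, Tuple

# The six rows shown in the post above.
ROWS = [
    "chr1 14468 39",
    "chr1 14469 39",
    "chr1 14470 37",
    "chr1 14471 39",
    "chr1 14472 35",
    "chr1 14473 34",
]


def region_stats(
    depth_lines: Iterable[str], region_length: int, min_depth: int
) -> Tuple[float, float]:
    """Return (mean depth, fraction of positions with depth >= min_depth).

    Positions absent from the input count as depth 0, which is why the
    region length is passed in rather than inferred from the row count.
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    if min_depth < 1:
        raise ValueError("min_depth must be at least 1")
    depths = [int(line.split()[2]) for line in depth_lines if line.strip()]
    if len(depths) > region_length:
        raise ValueError("more depth rows than positions in the region")
    mean_depth = sum(depths) / region_length
    covered = sum(1 for d in depths if d >= min_depth)
    return mean_depth, covered / region_length


if __name__ == "__main__":
    # Assumes the region is exactly the six positions listed.
    mean_depth, fraction = region_stats(ROWS, region_length=6, min_depth=20)
    print(f"mean depth {mean_depth:.1f}x, fraction at >=20x: {fraction:.2f}")
```

The region length would normally come from the BED interval (end minus start); the value 6 here is only chosen to match the rows shown.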


              • MBender
                Junior Member
                • Jan 2011
                • 9

                #8
                Many thanks.

                With samtools depth, the coverage in the given genomic regions can be calculated in a feasible amount of time. By the way, the low performance of coverageBed when working on a large number of genomic intervals is a known issue.



                Best,

                Matthias
