  • a basic question about coverage

    Hi everybody,
    I have a basic question about NGS.
    How can we calculate sequencing coverage (5X, 20X ...) at selected regions of interest, and what exactly does it mean?
    Is it calculated from the FASTQ files right after sequencing, or after mapping to the genome?

  • #2
    Hello,

    You need to map the reads first to know which region they (hopefully) come from.

    One easy way to look for coverage in regions is to design a .bed file (http://genome.ucsc.edu/FAQ/FAQformat.html#format1) with your regions of interest and compare them to the mapped result with bedtools coverageBed (http://code.google.com/p/bedtools/).
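    As a rough sketch of what such a .bed file encodes: the first three columns are chromosome, start and end, where the start is 0-based and the end is exclusive, so a region's length is simply end minus start. The function name and the example coordinates below are made up for illustration.

```python
def bed_region_lengths(lines):
    """Parse BED lines into (chrom, start, end, length) tuples.

    BED starts are 0-based and ends are exclusive, so length = end - start.
    """
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and optional UCSC header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        regions.append((chrom, start, end, end - start))
    return regions


# Hypothetical regions of interest, not taken from any real annotation.
example_bed = [
    "chr1\t14467\t14473",
    "chr2\t100\t250",
]
regions = bed_region_lengths(example_bed)
total_length = sum(length for _, _, _, length in regions)
print(regions)
print(total_length)
```

    The total length is the denominator you need when turning read or base counts into an "X" coverage value.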



    • #3
      Once you have coverage in terms of read count (from coverageBed), to get coverage like 5x, you'll have to do

      ( read count * read length ) / length of area in question

      So if you have 5 reads that are 50 bases long in a region that's 100 bases long, your coverage will be

      (5 * 50) / 100 = 2.5x.

      You could calculate a number for the whole genome by adding all the chromosome lengths, or you could do individual chromosomes or genes or windows throughout the genome or whatever is interesting.
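      A minimal sketch of the formula above, assuming all reads have the same length and lie entirely within the region (the function name `mean_coverage` is made up for illustration):

```python
def mean_coverage(read_count, read_length, region_length):
    """Approximate mean depth: total sequenced bases divided by region length."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return (read_count * read_length) / region_length


# The example from the post: 5 reads of 50 bases in a 100-base region.
print(mean_coverage(5, 50, 100))  # 2.5
```

      Reads that only partly overlap the region contribute fewer bases than their full length, so this is an approximation; per-base depth gives an exact figure.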



      • #4
        You could also use "samtools depth" to find the coverage at each position in your target region; as mamons said, you can create a bed file for that region.

        Then you can take all the per-position coverage counts and summarize them in R to get the mean, median, etc. depth of coverage in the region of interest.
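        The post suggests R for the summary; the same idea can be sketched in Python. This assumes three whitespace-separated columns (chromosome, position, depth) per line, and the function name and example values are hypothetical.

```python
import statistics


def summarize_depth(lines):
    """Summarize the third column (depth) of `samtools depth`-style lines."""
    depths = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depths.append(int(fields[2]))
    if not depths:
        raise ValueError("no depth values found")
    return {
        "positions": len(depths),
        "mean": statistics.mean(depths),
        "median": statistics.median(depths),
        "min": min(depths),
        "max": max(depths),
    }


# Hypothetical per-position depths, for illustration only.
example = ["chr1\t100\t10", "chr1\t101\t12", "chr1\t102\t14"]
summary = summarize_depth(example)
print(summary)
```

        For a real file, pass the open file handle instead of the list. Depending on the samtools version and options, positions with zero coverage may be missing from the output, in which case the mean covers only the reported positions.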



        • #5
          Hi,

          Has any of you ever tried to calculate the coverage (using coverageBed) of a whole-genome sequencing experiment (about 40x average coverage) on a relatively large number of genomic features (such as all RefSeq genes)? I tried to perform this task on a multicore processor with 16 GB of RAM. After three days of calculation and a constant memory consumption of about 14 GB, I stopped the process. I used the following command:

          ./coverageBed -abam <bam_file> -b <refSeq_exons.bed6> -hist >> histogram.txt

          Is that normal? Any ideas?

          @aggp11: Which version of samtools are you using? The depth command does not seem to be present in my version.



          • #6
            Hi




            * Added the `depth' command to samtools to compute the per-base depth with a
            simpler interface. File `bam2depth.c', which implements this command, is the
            recommended example on how to use the mpileup APIs.



            • #7
              @Mbender: I am using samtools version 0.1.18.

              @maria_maria & Mbender: with this latest version of samtools, you don't even have to worry about bam2depth. The following command is an example of how samtools depth works:

              samtools depth -q 30 -b exons.bed exome.bam > test_q_30.coverage

              Output:
              chr1 14468 39
              chr1 14469 39
              chr1 14470 37
              chr1 14471 39
              chr1 14472 35
              chr1 14473 34

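              The third column is the number of reads covering the given position with a base quality of 30 or more (-q sets the base quality threshold; -Q sets the mapping quality threshold).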
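              To relate this output to thresholds such as 5X or 20X, one option is to compute the fraction of target positions at or above a given depth. This is only a sketch: the function name is made up, the depths are the six values shown above, and `target_length` is an optional assumption for cases where the output omits uncovered positions.

```python
def fraction_at_least(depths, threshold, target_length=None):
    """Fraction of target positions with depth >= threshold.

    If target_length is given (e.g. the summed BED region lengths), it is
    used as the denominator so that unreported positions count as uncovered.
    """
    if target_length is None:
        target_length = len(depths)
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    covered = sum(1 for d in depths if d >= threshold)
    return covered / target_length


# Third column of the six output lines shown above.
depths = [39, 39, 37, 39, 35, 34]
for threshold in (20, 35, 38):
    print(threshold, fraction_at_least(depths, threshold))
```

              Passing the total target length from the bed file as `target_length` keeps positions without any reported depth in the denominator.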

              Thanks,
              Praful



              • #8
                Many thanks.

                samtools depth seems to calculate the coverage in the given genomic regions in a feasible amount of time. By the way, the low performance of coverageBed when working on a large number of genomic intervals is a known issue.



                Best,

                Matthias
