Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Genomics101
    Member
    • May 2012
    • 60

    Best programs/methods for looking at mapped coverage? (Mapping whole genome data)

    Greetings.

    I'm interested in looking, in detail, at the mapped coverage for NGS short read data onto a reference, using whole genome data. I'd like a good way to examine and quantify the coverage along a chromosome and compare this coverage among samples to look for areas with usually high- and low-coverage, and to see if these high- and low-coverage areas are common among samples. I know how to get basic mapped read data using SAMtools, and I can look at the data visually using Atermis, but I don't know how to do anything more sophisticated.

    Any help is much appreciated, cheers!
  • Wallysb01
    Senior Member
    • Feb 2011
    • 286

    #2
    I suppose what you might be looking for is IGV, it does a very nice job of displaying coverage coming from an aligned bam file. It can show you things like SNPs/indels in your reads too.

    Comment

    • westerman
      Rick Westerman
      • Jun 2008
      • 1104

      #3
      Since Genomics101 can already look at the data visually via Artemis (ACT) I am not sure if IGV would add much to his knowledge.

      I suggest exploring the use of the '-D' (and required '-u/-g') option in 'samtools mpileup'. Piped to bcftools that should give you the depth of coverage for each sample at each base. A bit of parsing into spreadsheet form and you could do further manipulations in Excel, et.al.

      Comment

      • kokonech
        Curious Character
        • Sep 2010
        • 13

        #4
        You should check out QualiMap, which is useful to get an overview of BAM file:


        It computes plenty of useful quality metrics such as mean and std coverage, coverage per chromosome and others. Additionally it produces a number of plots including coverage histogram and coverage across reference. By request it can output per-base coverage.

        Here is an example report:

        Comment

        • rhinoceros
          Senior Member
          • Apr 2013
          • 372

          #5
          You could:

          1. Map your reads to the reference with bwa
          2. Turn the sam file into a bam file with samtools
          3. Visualize with tablet
          savetherhino.org

          Comment

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            It appears that Genomics101 is interested in getting quantitative/qualitative differential coverage data for samples (not just a visual inspection). Trying to think if there is anything available that can do this without custom code.

            Comment

            • westerman
              Rick Westerman
              • Jun 2008
              • 1104

              #7
              I suspect some custom code will have to be written but the code, aside from samtools and bcf, should be all unix tools. I suspect that 'grep,' 'cut' and perhaps 'paste' will be the tools required. Nothing that a even a rookie bioinformatics person should find difficult. While a bit awkward I think that the following will tease out the depths into a TSV file.

              Code:
              # Get chromosome positions and per-sample coverage (depth)
              
              samtools mpileup -D -u file1.sorted.bam file2.sorted.bam file3.sorted.bam | bcftools view - | grep -v '#' | cut -f 1,2,10-12 > bam.tmp
              
              # Just extract chromosome positions
              cut -f 1,2 bam.tmp > position.tmp
              
              # Get the depth part of the samples
              for i in `seq 3 5`
                do 
                cut -f $i bam.tmp | cut -f 2 -d ':' > depth_$i.tmp
              done
              
              # Put it together into a spreadsheet
              paste position.tmp depth_*.tmp > results.tsv
              The results I got on my test file (and I expect high coverage from this experiment) look like:

              Code:
              S1      1       1802    1264    2665
              S1      2       1811    1267    2666
              S1      3       1812    1267    2667
              S1      4       1812    1267    2668
              S1      5       1812    1268    2668
              S1      6       1811    1266    2661
              S1      7       1812    1267    2667
              S1      8       1811    1265    2665
              S1      9       1811    1266    2669
              S1      10      1810    1263    2662
              Last edited by westerman; 07-22-2014, 09:48 AM. Reason: Ah, first try did not work. Need a 'for' (bash). Remember you are getting what you pay for. :-)

              Comment

              • Matt Kearse
                Member
                • Mar 2014
                • 20

                #8
                Geneious (which is commercial software) has nice coverage visualization. You can also run the high/low coverage finder on each sample to annotate regions of high/low coverage. Then run the compare annotations tool to compare the results between samples which will annotate regions in one sample that have high/low coverage not present in other samples. The video at https://www.youtube.com/watch?v=IOGmxjK3f_4 demonstrates some of this starting from around 30 seconds into it.

                Comment

                • wokai001
                  Member
                  • Nov 2010
                  • 20

                  #9
                  Altough I did not intent do cover whole chromosomes, you may have a look at my 'rbamtools' package.
                  There is a alignDepth function with which you can calculate coverage values as numerical values and may plot the values (There's a new plot function in the next release...)

                  Comment

                  • jwfoley
                    Senior Member
                    • Jun 2009
                    • 183

                    #10
                    What you're describing is basically the same problem as ChIP-seq peak calling from multiple samples. I wrote a program called UniPeak that does this very straightforwardly (doi:10.1186/1471-2164-14-720). It uses Epanechnikov kernel smoothing rather than base coverage, but if you think about it, coverage is mathematically equivalent to kernel smoothing with a rectangular kernel. So you could try it with Epanechnikov smoothing (if anything this might actually give you better results), or change the kernel function to a rectangle and recompile (PM me if you need help with that).

                    Comment

                    Latest Articles

                    Collapse

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by SEQadmin2, 06-05-2026, 10:09 AM
                    0 responses
                    16 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 06-04-2026, 08:59 AM
                    0 responses
                    34 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 06-02-2026, 12:03 PM
                    0 responses
                    37 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 06-02-2026, 11:40 AM
                    0 responses
                    24 views
                    0 reactions
                    Last Post SEQadmin2  
                    Working...