Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Best programs/methods for looking at mapped coverage? (Mapping whole genome data)

    Greetings.

    I'm interested in looking, in detail, at the mapped coverage for NGS short read data onto a reference, using whole genome data. I'd like a good way to examine and quantify the coverage along a chromosome and compare this coverage among samples to look for areas with usually high- and low-coverage, and to see if these high- and low-coverage areas are common among samples. I know how to get basic mapped read data using SAMtools, and I can look at the data visually using Atermis, but I don't know how to do anything more sophisticated.

    Any help is much appreciated, cheers!

  • #2
    I suppose what you might be looking for is IGV, it does a very nice job of displaying coverage coming from an aligned bam file. It can show you things like SNPs/indels in your reads too.

    Comment


    • #3
      Since Genomics101 can already look at the data visually via Artemis (ACT) I am not sure if IGV would add much to his knowledge.

      I suggest exploring the use of the '-D' (and required '-u/-g') option in 'samtools mpileup'. Piped to bcftools that should give you the depth of coverage for each sample at each base. A bit of parsing into spreadsheet form and you could do further manipulations in Excel, et.al.

      Comment


      • #4
        You should check out QualiMap, which is useful to get an overview of BAM file:


        It computes plenty of useful quality metrics such as mean and std coverage, coverage per chromosome and others. Additionally it produces a number of plots including coverage histogram and coverage across reference. By request it can output per-base coverage.

        Here is an example report:

        Comment


        • #5
          You could:

          1. Map your reads to the reference with bwa
          2. Turn the sam file into a bam file with samtools
          3. Visualize with tablet
          savetherhino.org

          Comment


          • #6
            It appears that Genomics101 is interested in getting quantitative/qualitative differential coverage data for samples (not just a visual inspection). Trying to think if there is anything available that can do this without custom code.

            Comment


            • #7
              I suspect some custom code will have to be written but the code, aside from samtools and bcf, should be all unix tools. I suspect that 'grep,' 'cut' and perhaps 'paste' will be the tools required. Nothing that a even a rookie bioinformatics person should find difficult. While a bit awkward I think that the following will tease out the depths into a TSV file.

              Code:
              # Get chromosome positions and per-sample coverage (depth)
              
              samtools mpileup -D -u file1.sorted.bam file2.sorted.bam file3.sorted.bam | bcftools view - | grep -v '#' | cut -f 1,2,10-12 > bam.tmp
              
              # Just extract chromosome positions
              cut -f 1,2 bam.tmp > position.tmp
              
              # Get the depth part of the samples
              for i in `seq 3 5`
                do 
                cut -f $i bam.tmp | cut -f 2 -d ':' > depth_$i.tmp
              done
              
              # Put it together into a spreadsheet
              paste position.tmp depth_*.tmp > results.tsv
              The results I got on my test file (and I expect high coverage from this experiment) look like:

              Code:
              S1      1       1802    1264    2665
              S1      2       1811    1267    2666
              S1      3       1812    1267    2667
              S1      4       1812    1267    2668
              S1      5       1812    1268    2668
              S1      6       1811    1266    2661
              S1      7       1812    1267    2667
              S1      8       1811    1265    2665
              S1      9       1811    1266    2669
              S1      10      1810    1263    2662
              Last edited by westerman; 07-22-2014, 09:48 AM. Reason: Ah, first try did not work. Need a 'for' (bash). Remember you are getting what you pay for. :-)

              Comment


              • #8
                Geneious (which is commercial software) has nice coverage visualization. You can also run the high/low coverage finder on each sample to annotate regions of high/low coverage. Then run the compare annotations tool to compare the results between samples which will annotate regions in one sample that have high/low coverage not present in other samples. The video at https://www.youtube.com/watch?v=IOGmxjK3f_4 demonstrates some of this starting from around 30 seconds into it.

                Comment


                • #9
                  Altough I did not intent do cover whole chromosomes, you may have a look at my 'rbamtools' package.
                  There is a alignDepth function with which you can calculate coverage values as numerical values and may plot the values (There's a new plot function in the next release...)

                  Comment


                  • #10
                    What you're describing is basically the same problem as ChIP-seq peak calling from multiple samples. I wrote a program called UniPeak that does this very straightforwardly (doi:10.1186/1471-2164-14-720). It uses Epanechnikov kernel smoothing rather than base coverage, but if you think about it, coverage is mathematically equivalent to kernel smoothing with a rectangular kernel. So you could try it with Epanechnikov smoothing (if anything this might actually give you better results), or change the kernel function to a rectangle and recompile (PM me if you need help with that).

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Current Approaches to Protein Sequencing
                      by seqadmin


                      Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                      04-04-2024, 04:25 PM
                    • seqadmin
                      Strategies for Sequencing Challenging Samples
                      by seqadmin


                      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                      03-22-2024, 06:39 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, 04-11-2024, 12:08 PM
                    0 responses
                    25 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 10:19 PM
                    0 responses
                    28 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 09:21 AM
                    0 responses
                    24 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-04-2024, 09:00 AM
                    0 responses
                    52 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X