SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Merging reads mapped on genome and CDS (SOLID data) jmandel Bioinformatics 1 01-04-2012 07:34 AM
Programs for 454 data sequence alignment and mapping Abishai3911 454 Pyrosequencing 0 06-30-2011 03:12 PM
PubMed: Optical mapping of DNA: Single-molecule-based methods for mapping genomes. Newsbot! Literature Watch 0 01-06-2011 03:00 AM
Why do we use mapping programs instead of blast for mapping to a reference? thsuk1 Bioinformatics 6 08-27-2010 09:54 AM

Reply
 
Thread Tools
Old 07-22-2014, 08:19 AM   #1
Genomics101
Member
 
Location: Maryland, USA

Join Date: May 2012
Posts: 60
Default Best programs/methods for looking at mapped coverage? (Mapping whole genome data)

Greetings.

I'm interested in looking, in detail, at the mapped coverage for NGS short read data onto a reference, using whole genome data. I'd like a good way to examine and quantify the coverage along a chromosome and compare this coverage among samples to look for areas with usually high- and low-coverage, and to see if these high- and low-coverage areas are common among samples. I know how to get basic mapped read data using SAMtools, and I can look at the data visually using Atermis, but I don't know how to do anything more sophisticated.

Any help is much appreciated, cheers!
Genomics101 is offline   Reply With Quote
Old 07-22-2014, 08:31 AM   #2
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286
Default

I suppose what you might be looking for is IGV, it does a very nice job of displaying coverage coming from an aligned bam file. It can show you things like SNPs/indels in your reads too.
Wallysb01 is offline   Reply With Quote
Old 07-22-2014, 08:46 AM   #3
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Since Genomics101 can already look at the data visually via Artemis (ACT) I am not sure if IGV would add much to his knowledge.

I suggest exploring the use of the '-D' (and required '-u/-g') option in 'samtools mpileup'. Piped to bcftools that should give you the depth of coverage for each sample at each base. A bit of parsing into spreadsheet form and you could do further manipulations in Excel, et.al.
westerman is offline   Reply With Quote
Old 07-22-2014, 09:32 AM   #4
kokonech
Curious Character
 
Location: Berlin, Germany

Join Date: Sep 2010
Posts: 13
Default

You should check out QualiMap, which is useful to get an overview of BAM file:
http://qualimap.bioinfo.cipf.es/

It computes plenty of useful quality metrics such as mean and std coverage, coverage per chromosome and others. Additionally it produces a number of plots including coverage histogram and coverage across reference. By request it can output per-base coverage.

Here is an example report:
http://qualimap.bioinfo.cipf.es/samp...mapReport.html
kokonech is offline   Reply With Quote
Old 07-22-2014, 10:02 AM   #5
rhinoceros
Senior Member
 
Location: sub-surface moon base

Join Date: Apr 2013
Posts: 372
Default

You could:

1. Map your reads to the reference with bwa
2. Turn the sam file into a bam file with samtools
3. Visualize with tablet
__________________
savetherhino.org
rhinoceros is offline   Reply With Quote
Old 07-22-2014, 10:28 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,079
Default

It appears that Genomics101 is interested in getting quantitative/qualitative differential coverage data for samples (not just a visual inspection). Trying to think if there is anything available that can do this without custom code.
GenoMax is offline   Reply With Quote
Old 07-22-2014, 10:43 AM   #7
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

I suspect some custom code will have to be written but the code, aside from samtools and bcf, should be all unix tools. I suspect that 'grep,' 'cut' and perhaps 'paste' will be the tools required. Nothing that a even a rookie bioinformatics person should find difficult. While a bit awkward I think that the following will tease out the depths into a TSV file.

Code:
# Get chromosome positions and per-sample coverage (depth)

samtools mpileup -D -u file1.sorted.bam file2.sorted.bam file3.sorted.bam | bcftools view - | grep -v '#' | cut -f 1,2,10-12 > bam.tmp

# Just extract chromosome positions
cut -f 1,2 bam.tmp > position.tmp

# Get the depth part of the samples
for i in `seq 3 5`
  do 
  cut -f $i bam.tmp | cut -f 2 -d ':' > depth_$i.tmp
done

# Put it together into a spreadsheet
paste position.tmp depth_*.tmp > results.tsv
The results I got on my test file (and I expect high coverage from this experiment) look like:

Code:
S1      1       1802    1264    2665
S1      2       1811    1267    2666
S1      3       1812    1267    2667
S1      4       1812    1267    2668
S1      5       1812    1268    2668
S1      6       1811    1266    2661
S1      7       1812    1267    2667
S1      8       1811    1265    2665
S1      9       1811    1266    2669
S1      10      1810    1263    2662

Last edited by westerman; 07-22-2014 at 10:48 AM. Reason: Ah, first try did not work. Need a 'for' (bash). Remember you are getting what you pay for. :-)
westerman is offline   Reply With Quote
Old 07-22-2014, 04:31 PM   #8
Matt Kearse
Member
 
Location: New Zealand

Join Date: Mar 2014
Posts: 20
Default

Geneious (which is commercial software) has nice coverage visualization. You can also run the high/low coverage finder on each sample to annotate regions of high/low coverage. Then run the compare annotations tool to compare the results between samples which will annotate regions in one sample that have high/low coverage not present in other samples. The video at https://www.youtube.com/watch?v=IOGmxjK3f_4 demonstrates some of this starting from around 30 seconds into it.
Matt Kearse is offline   Reply With Quote
Old 07-24-2014, 03:21 AM   #9
wokai001
Member
 
Location: Düsseldorf

Join Date: Nov 2010
Posts: 20
Default

Altough I did not intent do cover whole chromosomes, you may have a look at my 'rbamtools' package.
There is a alignDepth function with which you can calculate coverage values as numerical values and may plot the values (There's a new plot function in the next release...)
wokai001 is offline   Reply With Quote
Old 07-24-2014, 08:20 AM   #10
jwfoley
Senior Member
 
Location: Stanford

Join Date: Jun 2009
Posts: 181
Default

What you're describing is basically the same problem as ChIP-seq peak calling from multiple samples. I wrote a program called UniPeak that does this very straightforwardly (doi:10.1186/1471-2164-14-720). It uses Epanechnikov kernel smoothing rather than base coverage, but if you think about it, coverage is mathematically equivalent to kernel smoothing with a rectangular kernel. So you could try it with Epanechnikov smoothing (if anything this might actually give you better results), or change the kernel function to a rectangle and recompile (PM me if you need help with that).
jwfoley is offline   Reply With Quote
Reply

Tags
mapping coverage

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 03:34 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO