Seqanswers Leaderboard Ad

**jimmybee** · 04-01-2013, 12:16 AM

Bedtools can do it straight from the bam (genomeCoveragebed). Use the -d option for per-base coverage

**David.Shine** · 04-01-2013, 01:28 AM

samtools depth can also generate per-base coverage directly. the result simply consist of chromose coordinates and its depth, it is not too hard to draw a coverage plot with R in my opioin

**simonandrews** · 04-02-2013, 12:05 AM

SeqMonk can make these kinds of plots very quickly if you're happy with a more manual solution (see attached).

Attached Files

genome_coverage.png (30.9 KB, 522 views)

**suryasaha** · 04-02-2013, 08:31 AM

I'd recommend bedtools genomeCoverage over samtools depth for this purpose. See http://www.biostars.org/p/67579/ for discussion. Samtools skips secondary alignments and aberrant read pairs. Plots can, of course, be generated in R.

**kadircaner** · 08-02-2013, 10:05 AM

Simon, could you please explain little further how we can generate such a plot in SeqMonk? I tried to find in documentation but couldnt find anything... Thanks!

**simonandrews** · 08-04-2013, 11:55 PM

Originally posted by kadircaner View Post

Simon, could you please explain little further how we can generate such a plot in SeqMonk? I tried to find in documentation but couldnt find anything... Thanks!

The quickest way would be to simply load your data and then run Data > Quantitation Pipelines > Wiggle Plot

In the options you'd set the region to cover to be the whole genome, and if you want real coverage values then turn off the "total count correction" option. Depending on the scale of your data you could try with or without the log transformation.

Once you've done the quantitation you can select any dataset in the data view (top left) and it will be shown in the genome view, as in the plot I attached before. You can make the colours fixed by pressing the 4th button from the left in the toolbar, and you can save the image by going to File > Export Current View > Genome View.

Hope this helps

**kadircaner** · 08-16-2013, 08:44 AM

Thanks Simon!!!

**zhoujiayi** · 09-23-2013, 12:51 PM

Originally posted by simonandrews View Post

The quickest way would be to simply load your data and then run Data > Quantitation Pipelines > Wiggle Plot

In the options you'd set the region to cover to be the whole genome, and if you want real coverage values then turn off the "total count correction" option. Depending on the scale of your data you could try with or without the log transformation.

Once you've done the quantitation you can select any dataset in the data view (top left) and it will be shown in the genome view, as in the plot I attached before. You can make the colours fixed by pressing the 4th button from the left in the toolbar, and you can save the image by going to File > Export Current View > Genome View.

Hope this helps

Hi, Simon. I am new to SeqMonk and now I am using your method to get the genome coverage of my alignment file. The problem is for SeqMonk, I can only download genomes as GRCH37/NCBI34/35/36. However, the reference I used to do the alignment is HG19. Is it possible to load the genomes from my reference HG19 fa file? Or it doesn't matter, I can just choose any genome files, SeqMonk will give me the correct coverage?

**simonandrews** · 09-23-2013, 11:43 PM

Originally posted by zhoujiayi View Post

Hi, Simon. I am new to SeqMonk and now I am using your method to get the genome coverage of my alignment file. The problem is for SeqMonk, I can only download genomes as GRCH37/NCBI34/35/36. However, the reference I used to do the alignment is HG19. Is it possible to load the genomes from my reference HG19 fa file? Or it doesn't matter, I can just choose any genome files, SeqMonk will give me the correct coverage?

The NCBIXX/GRChXX vs hgXX all refer to exactly the same assemblies, it's just a nomenclature difference between different centres. As long as you pick the correct equivalent name they are completely interchangeable. NCBI36 is hg18 and GRCh37 is hg19.

**dariober** · 09-24-2013, 12:21 AM

Originally posted by medalofhonour View Post

Hi,

I wish to plot coverage profiles for each of the chromosomes from my BAM files which I obtained after aligning exome sequencing data (hg19).

I had initially thought of using samtools to generate pileup files, and then plot them however, that would take a long time and I guess that would be an inefficient way to go about it.

Are there any R/other packages that can quickly help me get these plots ?

Any suggestions would be much appreciated.

Hello- As an alternative solution... This one is command-line based and doesn't require to plot millions of datapoints. First, divide each chromosome in the human (or whatever) genome in windows of n bases, then count reads in each window. The output file is pretty small and easy to plot with R or anything else:

Code:

## Get human chromosomes, if you don't have them already
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
	"select chrom, size from hg19.chromInfo" > hg19.genome

## Make windows and count reads
bedtools makewindows -g hg19.genome -w 100000 \
    | coverageBed -abam myaln.bam -b - -counts \
    | sort -k 1,1 -k2,2n -k3,3n > myaln.counts.txt


## Output example:
cat myaln.counts.txt
chr1    0       100000  17
chr1    100000  200000  10
chr1    200000  300000  16
chr1    300000  400000  0
chr1    400000  500000  0
chr1    500000  600000  33
chr1    600000  700000  32
...

Dario

**sindrle** · 03-05-2014, 07:50 PM

When loading BAM into seqmonk I get 401980 warnings like this:

Reading position 195567590 was 95619bp beyond the end of chr1 (195471971)

What does that mean?

I trying to re-build my own genome GRCm38, just to check, it was the one I used for alignment.

**Brian Bushnell** · 03-05-2014, 08:32 PM

Originally posted by medalofhonour View Post

Hi,

I wish to plot coverage profiles for each of the chromosomes from my BAM files which I obtained after aligning exome sequencing data (hg19).

I had initially thought of using samtools to generate pileup files, and then plot them however, that would take a long time and I guess that would be an inefficient way to go about it.

Are there any R/other packages that can quickly help me get these plots ?

Any suggestions would be much appreciated.

Creating a bam, then sorting, then indexing it, then doing what you want is inefficient. If you have enough memory to store the complete genome (around 3 bytes per reference base), I recommend using pileup.sh (from the BBMap package). You can use bbmap.sh and pipe output to pileup.sh, which will generate a coverage file as the only output, skipping sam/bam files, and requiring no sorting (at the expense of using more memory - around 4 bytes per reference base). You can also use already generated sam files (and bam files if you have samtools installed).

For example:
pileup.sh in=samfile.sam out=coverage.txt
Please run pileup.sh with no arguments to display more options. The default gives the output that people at JGI usually want.

**simonandrews** · 03-06-2014, 01:50 AM

Originally posted by sindrle View Post

When loading BAM into seqmonk I get 401980 warnings like this:

Reading position 195567590 was 95619bp beyond the end of chr1 (195471971)

What does that mean?

I trying to re-build my own genome GRCm38, just to check, it was the one I used for alignment.

Those errors almost certainly mean that you've used a different version of the genome assembly to do your mapping to the one you've used to create your project. You should go back and check. If you have samtools installed on your machine then you can run samtools view -H some.bam to look at the headers, which will include the command used to generate the file, which should in turn contain the name of the assembly.

Topics	Statistics	Last Post
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, Yesterday, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin Yesterday, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 54 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM

Seqanswers Leaderboard Ad

Announcement

Plotting per chromosome coverage for BAM files

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News