SEQanswers

Old 08-12-2010, 08:09 AM   #1
zlu
Member
 
Location: UK

Join Date: Nov 2008
Posts: 32
overlaying coverage plots

I'm trying to overlay coverage plots of individual chromosomes from different experiments to get a quick overview of probable CNVs. I've tried a simple xy plot and the ggplot and plotrix packages in R (I'm a real novice in R), but it seems that my Linux machine with 64 GB of memory is unable to handle the task. I've already reduced my file size by putting only the coordinates and the coverages derived from samtools pileup into a single file.

Can someone comment on this and suggest a better and more memory efficient way of doing this? Thank you.
Old 08-12-2010, 05:34 PM   #2
robsyme
Junior Member
 
Location: Perth, Western Australia

Join Date: Jan 2009
Posts: 6

I've found Hilbert Plots very helpful for chromosome coverage at a glance. Try http://www.bioconductor.org/packages...ilbertVis.html
Bioconductor can encode coverage efficiently using run-length encoding, so your massive 64 GB will be fine.
-r
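
To illustrate why run-length encoding keeps coverage vectors small, here is a minimal Python sketch. It shows the general idea only and is not Bioconductor's implementation; the function names `rle_encode` and `rle_decode` are made up for this example.

```python
from itertools import groupby


def rle_encode(coverage):
    """Collapse per-base coverage into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(coverage)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into per-base coverage."""
    coverage = []
    for value, length in runs:
        coverage.extend([value] * length)
    return coverage


# Toy per-base coverage for ten positions (illustrative values only).
coverage = [0, 0, 0, 5, 5, 6, 6, 6, 6, 0]
runs = rle_encode(coverage)
print(runs)  # [(0, 3), (5, 2), (6, 4), (0, 1)]
```

Long stretches of identical coverage, such as uncovered regions, collapse into a single pair, which is where the memory saving comes from.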
Old 08-13-2010, 03:21 AM   #3
henry.wood
Member
 
Location: Leeds, UK

Join Date: Apr 2010
Posts: 63

Are you doing the whole thing in R? I've been doing similar things and I've found it's a lot quicker getting the data ready in python or perl first before using R's plotting functions.
I extract a simple list of the start positions of each read from the SAM file and sort them by chromosome and position. Then I split the genome up into windows of either a fixed size (50/100/500 Kb etc.) or a fixed number of reads (50/100/200), and make a file with one line per window and columns for chromosome, start, end, number of test reads and number of normal reads. I then import this file into R, and the plotting is much more painless.
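
The fixed-size windowing step described above could look roughly like this in Python. This is a sketch under assumptions, not the script mentioned in the post: the function name `count_reads_in_windows`, the 0-based coordinates and the `(chromosome, start)` input format are all hypothetical choices made for the example.

```python
from collections import Counter


def count_reads_in_windows(test_starts, normal_starts, window_size):
    """Count read starts per fixed-size window for two samples.

    Each input is an iterable of (chromosome, start) pairs with 0-based
    starts (an assumption of this sketch). Returns rows of
    (chromosome, window_start, window_end, n_test, n_normal).
    """
    test = Counter((chrom, pos // window_size) for chrom, pos in test_starts)
    normal = Counter((chrom, pos // window_size) for chrom, pos in normal_starts)
    rows = []
    # Windows with no reads in either sample are simply not reported.
    for chrom, index in sorted(set(test) | set(normal)):
        start = index * window_size
        rows.append((chrom, start, start + window_size,
                     test[(chrom, index)], normal[(chrom, index)]))
    return rows


# Toy read starts (illustrative values only).
test_reads = [("chr1", 10), ("chr1", 120), ("chr1", 130), ("chr2", 5)]
normal_reads = [("chr1", 20), ("chr1", 30), ("chr2", 250)]

for row in count_reads_in_windows(test_reads, normal_reads, window_size=100):
    # One tab-separated line per window, ready to be read into R.
    print("\t".join(str(field) for field in row))
```

Chromosome names are sorted as plain strings here, so `chr10` would come before `chr2`; a real script would probably want a natural sort.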
Old 08-14-2010, 06:10 AM   #4
zlu
Member
 
Location: UK

Join Date: Nov 2008
Posts: 32

Robsyme, thanks for suggesting the HilbertVis package. It seems to plot the coverage without any problem. However, do you know how to adjust the scale on the y-axis? I have some exceptionally high coverage values which skew the whole plot, and ylim is not working.

Henry, how do you decide which may be the best window size to use? Do you mind sharing your script? I was actually thinking about doing something similar to the maq cns2win.

By the way, "overlaying" is probably not the right word; what I really want to do is superimpose one plot on top of another.

Last edited by zlu; 08-14-2010 at 06:12 AM.
Old 08-14-2010, 06:15 PM   #5
adamdeluca
Member
 
Location: Iowa City, IA

Join Date: Jul 2010
Posts: 95

For a quick overview you can always upload a wig/bigWig file to the UCSC Genome Browser.
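
As a rough sketch of what such a wig file contains, the Python below writes per-window coverage values in fixedStep wiggle format. The function name `fixed_step_wig`, the track name and the toy values are hypothetical; it assumes equally sized, contiguous windows starting at the first base of the chromosome (wiggle positions are 1-based).

```python
def fixed_step_wig(chrom, values, window_size, track_name="coverage"):
    """Format per-window values as a fixedStep wiggle track.

    Assumes contiguous, equally sized windows starting at base 1 of `chrom`.
    """
    lines = [
        f'track type=wiggle_0 name="{track_name}"',
        f"fixedStep chrom={chrom} start=1 step={window_size} span={window_size}",
    ]
    lines.extend(str(value) for value in values)
    return "\n".join(lines) + "\n"


# Toy mean coverage for three consecutive 100 bp windows on chr1.
print(fixed_step_wig("chr1", [12, 15, 48], window_size=100), end="")
```

The last window should not extend past the end of the chromosome, so a real script would need to trim it to the chromosome length.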
Old 08-23-2010, 01:49 AM   #6
henry.wood
Member
 
Location: Leeds, UK

Join Date: Apr 2010
Posts: 63

The best window size is one of those 'how long is a piece of string' questions. It's a trade-off between signal and noise. If I want beautiful plots to put into a talk or impress my boss, I use windows of 400 reads. If I want to see small deletions or amplifications I go down to 200 or 100, but the graphs look a bit messier. I tend to feed the data into the DNAcopy package from Bioconductor. Using simulated data, it can pick up events using windows of 20 reads, even though the actual graph looks like a random mass of dots.
My script is currently embarrassing. It's the first one I ever wrote and is a bit of an unholy mess. It needs to be manually installed onto a computer to work and uses my wife's birthday to know when to stop because I didn't know how to end a for loop. Is there any part of it you need? I might try to tidy it up. I have a colleague who is currently preparing a proper statistical package to do all this better than I ever could.
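
For anyone wanting to try windows defined by read count rather than by length, here is a minimal Python sketch. It is not the script described above. It assumes, as one possible interpretation, that each window is sized to hold a fixed number of normal reads and that the test reads falling inside it are then counted; the function name `windows_by_read_count` and the toy data are hypothetical.

```python
from bisect import bisect_left


def windows_by_read_count(normal_starts, test_starts, reads_per_window):
    """Build windows holding `reads_per_window` normal reads each.

    Both inputs are sorted lists of read start positions on one chromosome.
    Returns rows of (window_start, window_end, n_normal, n_test) with
    half-open windows [start, end). Leftover normal reads that do not fill
    a whole window are dropped. Reads sharing a position at a window
    boundary are assigned to the later window, so n_normal is nominal.
    """
    n_windows = len(normal_starts) // reads_per_window
    rows = []
    for k in range(n_windows):
        first = k * reads_per_window
        start = normal_starts[first]
        if k + 1 < n_windows:
            # A window ends where the next window's first normal read starts.
            end = normal_starts[first + reads_per_window]
        else:
            # The last window ends just after its own last normal read.
            end = normal_starts[first + reads_per_window - 1] + 1
        n_test = bisect_left(test_starts, end) - bisect_left(test_starts, start)
        rows.append((start, end, reads_per_window, n_test))
    return rows


# Toy sorted read starts (illustrative values only).
normal = [10, 20, 30, 40, 50, 60, 70]
test = [5, 12, 25, 41, 44, 58, 60, 61, 90]
print(windows_by_read_count(normal, test, reads_per_window=3))
# [(10, 40, 3, 2), (40, 61, 3, 4)]
```

Smaller values of `reads_per_window` give finer resolution but noisier test counts, which is the signal-versus-noise trade-off mentioned above.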