SEQanswers

Old 01-18-2017, 08:03 AM   #1
Jon17
Member
 
Location: Indiana

Join Date: Jun 2016
Posts: 15
Plotting 5 million points in R

I plotted 5 million points in R. Is there a way to clean this up? Make it easier to spot patterns?

Old 01-18-2017, 08:11 AM   #2
Jon17
Member
 
Location: Indiana

Join Date: Jun 2016
Posts: 15

I'd like to reproduce this but don't know how:

Old 01-18-2017, 11:03 AM   #3
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

I would recommend using something like geom_tile() or geom_raster() in ggplot2 rather than plotting the actual points. That will make it vastly simpler to spot trends.
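
A minimal sketch of how two-column data could be turned into the three columns geom_raster() expects, by binning both axes and counting points per cell. The data frame `df`, its columns `x` and `y`, and the 100 x 100 grid are assumptions made for illustration, not details from the thread.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the real data: two numeric columns (assumed names)
n <- 1e5
df <- data.frame(x = rnorm(n), y = rnorm(n))

# Split each axis into equal-width bins (the bin count is an arbitrary choice)
nbins <- 100
x_breaks <- seq(min(df$x), max(df$x), length.out = nbins + 1)
y_breaks <- seq(min(df$y), max(df$y), length.out = nbins + 1)
xi <- cut(df$x, x_breaks, include.lowest = TRUE, labels = FALSE)
yi <- cut(df$y, y_breaks, include.lowest = TRUE, labels = FALSE)

# Count points per grid cell; fixing the factor levels keeps empty cells as 0
counts <- as.data.frame(
  table(xi = factor(xi, levels = seq_len(nbins)),
        yi = factor(yi, levels = seq_len(nbins))),
  responseName = "count"
)

# Map bin indices back to bin midpoints so the axes are in data units
x_mid <- (head(x_breaks, -1) + tail(x_breaks, -1)) / 2
y_mid <- (head(y_breaks, -1) + tail(y_breaks, -1)) / 2
counts$x <- x_mid[as.integer(counts$xi)]
counts$y <- y_mid[as.integer(counts$yi)]

# The count is the third column; log1p() keeps sparse cells visible
p <- ggplot(counts, aes(x, y, fill = log1p(count))) + geom_raster()
print(p)
```

ggplot2 also provides geom_bin2d(), which does the binning and counting in one step, e.g. `ggplot(df, aes(x, y)) + geom_bin2d(bins = 100)`.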
Old 01-18-2017, 11:29 AM   #4
Jon17
Member
 
Location: Indiana

Join Date: Jun 2016
Posts: 15

geom_raster looks really nice!

Unfortunately I only have 2 data columns. It looks like geom_raster requires 3 columns? The faithfuld dataset used in the example linked below has 3 columns:

1) waiting
2) eruptions
3) density

All 3 are used to create the plot. Can this be applied to 2-column data?

http://docs.ggplot2.org/current/geom_tile.html
Old 01-18-2017, 12:15 PM   #5
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Ah, right, try geom_density_2d() instead.
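
A rough sketch of what that might look like for two-column data. The data frame `df` and its columns `x` and `y` are simulated stand-ins, and the grid size `n = 200` is an arbitrary choice. The second plot uses stat_density_2d() with a raster geom to get a filled surface instead of contour lines.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the real two-column data (assumed names)
n_points <- 1e5
df <- data.frame(x = rnorm(n_points), y = rnorm(n_points))

# Contour lines of the estimated 2D density
p_contour <- ggplot(df, aes(x, y)) +
  geom_density_2d()

# Filled density surface: evaluate the density on a 200 x 200 grid and
# draw it as a raster instead of contours
p_fill <- ggplot(df, aes(x, y)) +
  stat_density_2d(
    aes(fill = after_stat(density)),
    geom = "raster",
    contour = FALSE,
    n = 200
  )

print(p_fill)
```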
Old 01-18-2017, 06:25 PM   #6
Jon17
Member
 
Location: Indiana

Join Date: Jun 2016
Posts: 15

Thanks for the tip, but now I'm having the opposite problem. The plots are too sparse. Either the whole plot is colored or almost nothing at all. :-p

Old 01-18-2017, 07:18 PM   #7
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Maybe this is too obvious, but have you tried randomly subsampling?
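
A minimal sketch of random subsampling in base R, assuming the data sits in a data frame `df` with columns `x` and `y` (simulated here at a smaller size than the real 5 million rows). The subsample size of 200,000 is an arbitrary choice.

```r
set.seed(42)
# Simulated stand-in for the full data set (assumed column names)
n <- 1e6
df <- data.frame(x = rnorm(n), y = x <- rnorm(n))

# Draw row indices without replacement, then keep only those rows
n_keep <- 2e5
keep <- sample(nrow(df), n_keep)
sub <- df[keep, ]

# Plot the subsample with the smallest point symbol
plot(sub$x, sub$y, pch = ".", xlab = "x", ylab = "y")
```

Setting the seed makes the subsample reproducible. Redrawing with a different seed is a quick check that any pattern seen is not an artifact of one particular sample.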
Old 01-18-2017, 07:35 PM   #8
Jon17
Member
 
Location: Indiana

Join Date: Jun 2016
Posts: 15

That worked.... thanks! duh... :-)

I read threads of people plotting millions of data points, so I just assumed R could handle it easily. Looks like 100k to 1M points is the perfect range for pch='.'
Old 01-19-2017, 02:56 AM   #9
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 836

You can also use smoothScatter(), which plots a smoothed density and overlays points for any outliers.
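
A short sketch of that base R call on simulated data. The vectors `x` and `y` are stand-ins, and the `nbin` and `nrpoints` values are arbitrary choices. smoothScatter() relies on the KernSmooth package, which normally ships with R.

```r
set.seed(1)
# Simulated stand-in for the two data columns
n <- 1e5
x <- rnorm(n)
y <- x + rnorm(n)

# Smoothed 2D density image; nrpoints sets how many points from the
# lowest-density regions are drawn on top as individual dots
smoothScatter(x, y, nbin = 256, nrpoints = 100,
              xlab = "x", ylab = "y")
```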
Old 01-19-2017, 05:39 AM   #10
thermophile
Senior Member
 
Location: CT

Join Date: Apr 2015
Posts: 242

If you want to plot the actual points, play with alpha (transparency) using one of the solid shapes (15-20). R can certainly handle it; just be aware that the file will be huge.
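
A minimal base R sketch of the alpha approach, on simulated stand-in data. The alpha value of 0.05 and the point size are arbitrary starting points that would need tuning for real data.

```r
set.seed(1)
# Simulated stand-in for the two data columns
n <- 1e5
x <- rnorm(n)
y <- x + rnorm(n)

# Semi-transparent black: overlapping points build up darker regions
point_col <- rgb(0, 0, 0, alpha = 0.05)

# pch = 16 is a solid circle; cex shrinks it so dense areas do not saturate
plot(x, y, pch = 16, cex = 0.3, col = point_col,
     xlab = "x", ylab = "y")
```

In ggplot2 the rough equivalent would be `geom_point(alpha = 0.05, shape = 16)`.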
__________________
Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.
Old 01-19-2017, 09:42 AM   #11
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 836

Quote:
Originally Posted by thermophile View Post
R can certainly handle it; just be aware that the file will be huge.
If you output a PNG file rather than an SVG or PDF file, then the output file will not be huge. It just won't have as much information in it, and won't be infinitely scalable.
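
A small sketch for comparing the two output types on simulated data. The temporary file paths, image dimensions, and point count are assumptions for illustration, and png() needs a working bitmap device on the machine.

```r
set.seed(1)
# Simulated stand-in for the two data columns
n <- 1e5
x <- rnorm(n)
y <- x + rnorm(n)

png_file <- tempfile(fileext = ".png")
pdf_file <- tempfile(fileext = ".pdf")

# Raster output: file size depends on the pixel dimensions, not point count
png(png_file, width = 1200, height = 1200, res = 150)
plot(x, y, pch = 16, cex = 0.3, col = rgb(0, 0, 0, alpha = 0.05))
dev.off()

# Vector output: every point is stored as its own drawing instruction
pdf(pdf_file, width = 8, height = 8)
plot(x, y, pch = 16, cex = 0.3, col = rgb(0, 0, 0, alpha = 0.05))
dev.off()

# Compare the resulting file sizes in bytes
sizes <- file.size(c(png = png_file, pdf = pdf_file))
print(sizes)
```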
All times are GMT -8.