SEQanswers plotting 5 million points in R


 01-18-2017, 08:03 AM #1 Jon17 Member   Location: Indiana Join Date: Jun 2016 Posts: 15 plotting 5 million points in R I plotted 5 million points in R. Is there a way to clean this up? Make it easier to spot patterns?
 01-18-2017, 08:11 AM #2 Jon17 Member   Location: Indiana Join Date: Jun 2016 Posts: 15 I'd like to reproduce this but don't know how:
 01-18-2017, 11:03 AM #3 dpryan Devon Ryan   Location: Freiburg, Germany Join Date: Jul 2011 Posts: 3,480 I would recommend using something like geom_tile() or geom_raster() in ggplot2 rather than plotting the actual points. That will make it vastly simpler to spot trends.
 01-18-2017, 11:29 AM #4 Jon17 Member   Location: Indiana Join Date: Jun 2016 Posts: 15 geom_raster looks really nice! Unfortunately I only have 2 data columns, and it looks like geom_raster requires 3. The faithfuld data used in the example linked below has 3 columns: 1) waiting, 2) eruptions, 3) density, and all 3 are used to create the plot. Can you apply this to 2-column data? http://docs.ggplot2.org/current/geom_tile.html
 01-18-2017, 12:15 PM #5 dpryan Devon Ryan   Location: Freiburg, Germany Join Date: Jul 2011 Posts: 3,480 Ah, right, try geom_density_2d() instead.
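For readers with the same two-column problem, here is a minimal sketch of how ggplot2 can derive the third (density or count) value itself. The data frame `df` and its columns `x` and `y` are synthetic stand-ins, not the original poster's data, and `geom_bin2d()` is shown next to the suggested `geom_density_2d()` only as one possible alternative.

```r
library(ggplot2)

# Synthetic two-column data (an assumption; the thread's data is not available).
set.seed(1)
n <- 50000
df <- data.frame(x = rnorm(n))
df$y <- 0.5 * df$x + rnorm(n)

# Option 1: count the points falling in each rectangular bin and map the
# count to the fill colour, so no precomputed third column is needed.
p_bins <- ggplot(df, aes(x, y)) +
  geom_bin2d(bins = 200)

# Option 2: contour lines of a 2D kernel density estimate.
p_contour <- ggplot(df, aes(x, y)) +
  geom_density_2d()

# Call print(p_bins) or print(p_contour) to draw the plots.
```

The number of bins controls how coarse the picture looks, so it is worth trying a few values.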
 01-18-2017, 06:25 PM #6 Jon17 Member   Location: Indiana Join Date: Jun 2016 Posts: 15 Thanks for the tip, but now I'm having the opposite problem. The plots are too sparse. Either the whole plot is colored or almost nothing at all. :-p
 01-18-2017, 07:18 PM #7 Brian Bushnell Super Moderator   Location: Walnut Creek, CA Join Date: Jan 2014 Posts: 2,707 Maybe this is too obvious, but have you tried randomly subsampling?
 01-18-2017, 07:35 PM #8 Jon17 Member   Location: Indiana Join Date: Jun 2016 Posts: 15 That worked.... thanks! duh... :-) I read threads of people plotting millions of datapoints so I just assumed R could handle it easily. Looks like 100k to 1 M is the perfect range for pch='.'
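A minimal sketch of the random subsampling approach in base R, assuming the data sit in two numeric vectors. The vectors `x` and `y` are simulated here and the subsample size of 200,000 is just an illustrative choice inside the range mentioned above.

```r
# Simulated stand-in for 5 million (x, y) points (an assumption).
set.seed(42)
n <- 5e6
x <- rnorm(n)
y <- x + rnorm(n)

# Draw row indices without replacement, then plot only those rows.
keep <- sample.int(n, size = 2e5)
plot(x[keep], y[keep], pch = ".", xlab = "x", ylab = "y")
```

Sampling indices rather than values keeps each x paired with its own y.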
 01-19-2017, 02:56 AM #9 gringer David Eccles (gringer)   Location: Wellington, New Zealand Join Date: May 2011 Posts: 836 You can also use smoothScatter(), which plots a smoothed density plus points for any outliers.
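As a rough illustration, `smoothScatter()` ships with base R's graphics package (it relies on the recommended KernSmooth package being installed). The data below are simulated, and the `nbin` and `nrpoints` values are arbitrary choices to experiment with, not recommendations from the thread.

```r
# Simulated data (an assumption).
set.seed(7)
n <- 1e6
x <- rnorm(n)
y <- x + rnorm(n)

# nbin sets the resolution of the density grid; nrpoints is the number of
# points from the lowest-density regions drawn on top as individual dots.
smoothScatter(x, y, nbin = 256, nrpoints = 500, pch = ".")
```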
 01-19-2017, 05:39 AM #10 thermophile Senior Member   Location: CT Join Date: Apr 2015 Posts: 242 If you want to plot the actual points, play with alpha (transparency) using one of the solid shapes (pch 15-20). R can certainly handle it; just be aware that the file will be huge. __________________ Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.
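A small sketch of the alpha idea in base graphics, assuming simulated data. The alpha of 0.05 and the `cex` value are guesses that will need tuning for a real data set.

```r
# Simulated data (an assumption).
set.seed(3)
n <- 2e5
x <- rnorm(n)
y <- x + rnorm(n)

# Make a nearly transparent black; overlapping points then build up darker
# regions wherever the data are dense.
col <- adjustcolor("black", alpha.f = 0.05)

# pch = 16 is a solid circle, one of the filled shapes in the 15-20 range.
plot(x, y, pch = 16, cex = 0.4, col = col)
```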
01-19-2017, 09:42 AM   #11
gringer
David Eccles (gringer)

Location: Wellington, New Zealand

Join Date: May 2011
Posts: 836

Quote:
 Originally Posted by thermophile R can certainly handle it; just be aware that the file will be huge.
If you output a PNG file rather than an SVG or PDF file, then the output file will not be huge. It just won't have as much information in it, and won't be infinitely scalable.
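To see the difference on your own machine, a sketch like the following writes the same scatter plot to a raster PNG and a vector PDF and reports both file sizes. The data are simulated, the image dimensions are arbitrary, and it assumes your R build has PNG support.

```r
# Simulated data (an assumption).
set.seed(5)
n <- 2e5
x <- rnorm(n)
y <- x + rnorm(n)

png_file <- tempfile(fileext = ".png")
pdf_file <- tempfile(fileext = ".pdf")

# Raster output: file size depends on the pixel dimensions, not the point count.
png(png_file, width = 1200, height = 1200, res = 150)
plot(x, y, pch = 16, cex = 0.3)
invisible(dev.off())

# Vector output: every point is stored as its own drawing instruction.
pdf(pdf_file, width = 8, height = 8)
plot(x, y, pch = 16, cex = 0.3)
invisible(dev.off())

cat("PNG bytes:", file.size(png_file), "\n")
cat("PDF bytes:", file.size(pdf_file), "\n")
```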