#1
Member
Location: Milwaukee
Join Date: Dec 2011
Posts: 72
I am analyzing large datasets in R. My current practice is to import the entire dataset into the R workspace using read.table(). Rather than importing the entire dataset at once, I was wondering whether it is possible to import, analyze and export the data one line at a time, so that the analysis takes up less memory.

Can this be done? If so, how?
#2
Senior Member
Location: Cambridge, UK
Join Date: May 2010
Posts: 311
Code:
myinput <- "mydata.txt"  ## Path to your big input file
totlines <- 10000000     ## Number of lines in your big input. Get it from wc -l
skip <- 0
chunkLines <- 10000      ## No. of lines to read in one go. Set to 1 to really read one line at a time.

while (skip < totlines){
    ## Read the next chunk, skipping the rows already processed
    df <- read.table(myinput, skip = skip, nrows = chunkLines, stringsAsFactors = FALSE)
    skip <- skip + chunkLines
    ## ...do something with df...
}
A better alternative might be to use packages designed for dealing with data larger than memory; ff (http://cran.r-project.org/web/packages/ff/index.html) is one of them (a short sketch of the pattern is below).

Hope this helps!

Dario
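A minimal sketch of the ff pattern mentioned above, assuming a plain-text table "big.txt" with a header row (both are placeholders): read.table.ffdf() keeps the data on disk rather than in RAM, and chunk() yields row-index ranges so only one block is in memory at a time.

Code:
library(ff)

## Read the table into an on-disk ffdf object instead of an in-memory data frame
bigdf <- read.table.ffdf(file = "big.txt", header = TRUE)

## chunk() returns a list of row-index ranges covering the whole table;
## only the current block is pulled into RAM
for (idx in chunk(bigdf)) {
    block <- bigdf[idx, ]
    ## ...do something with block...
}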
#3
Member
Location: Milwaukee
Join Date: Dec 2011
Posts: 72
Thanks Dario, much appreciated.
#4
Member
Location: Maryland
Join Date: Apr 2010
Posts: 31
Check out the readLines function in R.
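For example, a minimal sketch that really does hold only one line in memory at a time (the file name and tab delimiter are placeholders):

Code:
con <- file("big.txt", open = "r")   ## open a read connection to the input file

## readLines(con, n = 1) returns character(0) at end of file, which ends the loop
while (length(line <- readLines(con, n = 1)) > 0) {
    fields <- strsplit(line, "\t")[[1]]   ## assuming tab-delimited columns
    ## ...do something with fields...
}

close(con)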