SEQanswers

Old 08-03-2009, 01:34 AM   #1
DNAjunk
Member
 
Location: Western Europe

Join Date: Jun 2009
Posts: 61
SHRiMP Memory Usage

Hello!

Using SHRiMP (version 1.2.1), I have tried to map about 380 million SOLiD SAGE reads of length 35 bp, in color-space format, onto the reference sequence.
However, RAM usage kept climbing past 5 GB with no sign that the growth would stop, so I terminated the program manually without getting any result or output file.

Has anybody had the same experience?

Should I split the reads file into several smaller files and run them separately? And what is the maximum number of reads one could feed into a single run if the program should use no more than, say, 2 GB of RAM?

Thanks for any help and suggestions!
Old 08-04-2009, 07:05 PM   #2
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Originally Posted by DNAjunk
Using SHRiMP (version 1.2.1), I have tried to map about 380 million SOLiD SAGE reads of length 35 bp, in color-space format, onto the reference sequence. However, RAM usage kept climbing past 5 GB with no sign that the growth would stop.
You need to split your reads file into chunks of, say, 1,000,000 reads. Run SHRiMP separately on each chunk, then just concatenate the SHRiMP output files. The result is identical to what you would have got by feeding all the reads at once!

The reason this works is because SHRiMP indexes the reads. Give it fewer reads, and it needs less memory. You will need to experiment with the chunk size to suit your computer's RAM.

We use this method even on our server with 64GB RAM.
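
For example, here is a minimal sketch of the chunk-and-concatenate approach in Python. It assumes the SHRiMP 1.x color-space mapper is invoked as "rmapper-cs reads_chunk reference.fa" and writes its alignments to stdout; the file names and chunk size are placeholders, so adjust them (and the command line) to your data and your SHRiMP version's documentation:

Code:
#!/usr/bin/env python
# Split a csfasta reads file into fixed-size chunks, map each chunk with
# SHRiMP, then concatenate the per-chunk outputs. Illustrative only: the
# rmapper-cs command line and file names are assumptions, so check the
# SHRiMP documentation for your version.
import subprocess

READS = "reads.csfasta"      # color-space reads, one header + one sequence line per record
REFERENCE = "reference.fa"   # reference sequence
CHUNK_SIZE = 1000000         # reads per chunk; tune this to fit your RAM

def write_chunk(lines, index):
    name = "chunk_%04d.csfasta" % index
    with open(name, "w") as out:
        out.writelines(lines)
    return name

chunks = []
lines, count, index = [], 0, 0
with open(READS) as handle:
    for line in handle:
        if line.startswith("#"):          # skip csfasta comment lines
            continue
        if line.startswith(">"):
            if count == CHUNK_SIZE:       # current chunk is full, flush it
                chunks.append(write_chunk(lines, index))
                lines, count, index = [], 0, index + 1
            count += 1
        lines.append(line)
if lines:
    chunks.append(write_chunk(lines, index))

# Map each chunk separately and concatenate all outputs into one file.
with open("all_hits.out", "wb") as combined:
    for chunk in chunks:
        subprocess.check_call(["rmapper-cs", chunk, REFERENCE], stdout=combined)

Since the chunks are independent, they can also be farmed out to separate machines or cluster jobs and the outputs concatenated afterwards.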
Old 08-04-2009, 09:50 PM   #3
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by Torst
The reason this works is because SHRiMP indexes the reads.
If you split the reads, this is still the case for any aligner. A good question is whether it is theoretically better to index the reads or the reference, given that a lookup into the index is ~O(1).
Old 08-04-2009, 10:25 PM   #4
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Originally Posted by nilshomer
If you split the reads, this is still the case for any aligner.
This may be true for BFAST (your software?) and SHRIMP, but some short read aligners only index the reference. I think MAQ still does this? In those cases there is no memory occupied by a read index - the memory is only proportional to the reference index.
Old 08-05-2009, 07:58 AM   #5
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by Torst
This may be true for BFAST (your software?) and SHRIMP, but some short read aligners only index the reference. I think MAQ still does this? In those cases there is no memory occupied by a read index - the memory is only proportional to the reference index.
If you index 6.4 billion reference positions, it does take up a non-trivial amount of memory (e.g. BFAST). On the other hand, indexing the reads, as you say, uses memory proportional to the number of reads (see MAQ and SHRiMP). That is why BWA and Bowtie use a Burrows-Wheeler transform to compress the reference index, at some cost in speed. Nevertheless, you have to "sort" or index each read chunk, whereas a reference index is computed only once per reference. It follows that indexing the reference is better than indexing the reads, assuming the lookup is O(1), which can be achieved.
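
To make the trade-off concrete, here is a toy sketch (my own illustration, not how BFAST, MAQ or SHRiMP are actually implemented): a k-mer hash index built once over the reference gives ~O(1) seed lookups and can be reused for every batch of reads, whereas a read index has to be rebuilt for every chunk.

Code:
# Toy illustration of indexing the reference: build a k-mer -> positions
# hash table once, then look up seeds from any number of reads in ~O(1)
# per k-mer. Memory is proportional to the reference, not to the reads.
from collections import defaultdict

K = 12  # seed length; real aligners use longer and/or spaced seeds

def index_reference(reference, k=K):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def seed_hits(read, index, k=K):
    """Return (read_offset, reference_position) pairs for every matching seed."""
    hits = []
    for i in range(len(read) - k + 1):
        for pos in index.get(read[i:i + k], []):
            hits.append((i, pos))
    return hits

reference = "ACGTTAGCCGATAGGCTTACGATCGATCGGCTAACGTAGGC" * 1000
ref_index = index_reference(reference)             # computed once per reference
for read in ["TAGCCGATAGGCTTACG", "CGATCGGCTAACGTAGG"]:
    print(read, len(seed_hits(read, ref_index)))   # reused for every read / chunk

Whether this actually beats a read index in practice then comes down to how much memory the reference table takes and how far it can be compressed, as with the Burrows-Wheeler approach mentioned above.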

I still don't understand why
Quote:
The result is identical to what you would have got by feeding all the reads at once.
is explained by
Quote:
The reason this works is because SHRiMP indexes the reads
Could you give me an example where splitting the reads into discrete chunks and then merging (or concatenating) the results would not give the same answer as aligning all the reads at once?
Old 08-05-2009, 02:07 PM   #6
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Could you give me an example where splitting the reads into discrete chunks and then merging (or concatenating) the results would not give the same answer as aligning all the reads at once?
There is no such example. My explanation to the original poster was imprecise.