SEQanswers

Old 08-03-2009, 01:34 AM   #1
DNAjunk
Member
 
Location: Western Europe

Join Date: Jun 2009
Posts: 61
SHRiMP Memory Usage

Hello!

Using SHRiMP (version 1.2.1), I have tried to map about 380 million SOLiD SAGE reads of length 35 bp, in color-space format, onto the reference sequence.
However, RAM usage kept climbing past 5 GB with no sign that the growth would stop, so I terminated the program manually without getting any result or output file.

Has anybody had the same experience?

Should I split the reads file into several smaller files and run them separately? And what is the maximum number of reads one could feed into a single run if the program should use no more than, say, 2 GB of RAM?

Thanks for any help and suggestions!
Old 08-04-2009, 07:05 PM   #2
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Originally Posted by DNAjunk
Using SHRiMP (version 1.2.1), I have tried to map about 380 million SOLiD SAGE reads of length 35 bp, in color-space format, onto the reference sequence. However, RAM usage kept climbing past 5 GB with no sign that the growth would stop.
You need to split your reads file into chunks of, say, 1,000,000 reads. Run SHRiMP separately on each chunk, then just concatenate the SHRiMP output files. The result is identical to what you would have got by feeding all the reads at once!

The reason this works is because SHRiMP indexes the reads. Give it fewer reads, and it needs less memory. You will need to experiment with the chunk size to suit your computer's RAM.

We use this method even on our server with 64GB RAM.
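
For example, here is a minimal sketch of the chunk-and-concatenate approach in Python. It assumes the SHRiMP 1.x color-space mapper is invoked as "rmapper-cs reads_chunk reference.fa" and writes its alignments to stdout; the file names and chunk size are placeholders, so adjust them (and the command line) to your data and your SHRiMP version's documentation:

Code:
#!/usr/bin/env python
# Split a csfasta reads file into fixed-size chunks, map each chunk with
# SHRiMP, then concatenate the per-chunk outputs. Illustrative only: the
# rmapper-cs command line and file names are assumptions, so check the
# SHRiMP documentation for your version.
import subprocess

READS = "reads.csfasta"      # color-space reads, one header + one sequence line per record
REFERENCE = "reference.fa"   # reference sequence
CHUNK_SIZE = 1000000         # reads per chunk; tune this to fit your RAM

def write_chunk(lines, index):
    name = "chunk_%04d.csfasta" % index
    with open(name, "w") as out:
        out.writelines(lines)
    return name

chunks = []
lines, count, index = [], 0, 0
with open(READS) as handle:
    for line in handle:
        if line.startswith("#"):          # skip csfasta comment lines
            continue
        if line.startswith(">"):
            if count == CHUNK_SIZE:       # current chunk is full, flush it
                chunks.append(write_chunk(lines, index))
                lines, count, index = [], 0, index + 1
            count += 1
        lines.append(line)
if lines:
    chunks.append(write_chunk(lines, index))

# Map each chunk separately and concatenate all outputs into one file.
with open("all_hits.out", "wb") as combined:
    for chunk in chunks:
        subprocess.check_call(["rmapper-cs", chunk, REFERENCE], stdout=combined)

Since the chunks are independent, they can also be farmed out to separate machines or cluster jobs and the outputs concatenated afterwards.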
Old 08-04-2009, 09:50 PM   #3
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by Torst
The reason this works is because SHRiMP indexes the reads.
If you split the reads, this is still the case for any aligner. A good question is whether it is theoretically better to index the reads or the reference, given that a lookup into the index is ~O(1).
Old 08-04-2009, 10:25 PM   #4
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Originally Posted by nilshomer
If you split the reads, this is still the case for any aligner.
This may be true for BFAST (your software?) and SHRIMP, but some short read aligners only index the reference. I think MAQ still does this? In those cases there is no memory occupied by a read index - the memory is only proportional to the reference index.
Old 08-05-2009, 07:58 AM   #5
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by Torst
This may be true for BFAST (your software?) and SHRIMP, but some short read aligners only index the reference. I think MAQ still does this? In those cases there is no memory occupied by a read index - the memory is only proportional to the reference index.
If you index 6.4 billion reference positions, it does take up a non-trivial amount of memory (e.g. BFAST). On the other hand, indexing the reads, as you say, uses memory proportional to the number of reads (see MAQ and SHRiMP). That is why BWA and Bowtie use a Burrows-Wheeler transform to compress the reference index, at some cost in speed. Nevertheless, you have to "sort" or index each read chunk, whereas a reference index is computed only once per reference. It follows that indexing the reference is better than indexing the reads, assuming the lookup is O(1), which can be achieved.
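
To make the trade-off concrete, here is a toy sketch (my own illustration, not how BFAST, MAQ or SHRiMP are actually implemented): a k-mer hash index built once over the reference gives ~O(1) seed lookups and can be reused for every batch of reads, whereas a read index has to be rebuilt for every chunk.

Code:
# Toy illustration of indexing the reference: build a k-mer -> positions
# hash table once, then look up seeds from any number of reads in ~O(1)
# per k-mer. Memory is proportional to the reference, not to the reads.
from collections import defaultdict

K = 12  # seed length; real aligners use longer and/or spaced seeds

def index_reference(reference, k=K):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def seed_hits(read, index, k=K):
    """Return (read_offset, reference_position) pairs for every matching seed."""
    hits = []
    for i in range(len(read) - k + 1):
        for pos in index.get(read[i:i + k], []):
            hits.append((i, pos))
    return hits

reference = "ACGTTAGCCGATAGGCTTACGATCGATCGGCTAACGTAGGC" * 1000
ref_index = index_reference(reference)             # computed once per reference
for read in ["TAGCCGATAGGCTTACG", "CGATCGGCTAACGTAGG"]:
    print(read, len(seed_hits(read, ref_index)))   # reused for every read / chunk

Whether this actually beats a read index in practice then comes down to how much memory the reference table takes and how far it can be compressed, as with the Burrows-Wheeler approach mentioned above.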

I still don't understand why
Quote:
The result is identical to what you would have got by feeding all the reads at once.
is explained by
Quote:
The reason this works is because SHRiMP indexes the reads
Could you give me an example where splitting the reads into discrete chunks and then merging (or concatenating) the results would not give the same answer as aligning all the reads at once?
Old 08-05-2009, 02:07 PM   #6
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Could you give me an example where splitting the reads into discrete chunks and then merging (or concatenating) the results would not give the same answer as aligning all the reads at once?
There is no such example. My explanation to the original poster was imprecise.