
  • SHRiMP Memory Usage

    Hello!

    Using SHRiMP (version 1.2.1), I tried to map about 380 million SOLiD SAGE reads of length 35 bp, in color-space format, onto the reference sequence.
    However, RAM usage climbed past 5 GB with no sign of leveling off, so I terminated the program manually without getting any result or output file.

    Has anybody had the same experience?

    Should I split the reads file into several smaller files and run them separately? And what is the optimal or maximum number of reads one can give as input in a single run if the program should use no more than, say, 2 GB of RAM?

    Thanks for any help and suggestions!

  • #2
    Originally posted by DNAjunk
    Using SHRiMP (version 1.2.1), I tried to map about 380 million SOLiD SAGE reads of length 35 bp, in color-space format, onto the reference sequence. However, RAM usage climbed past 5 GB with no sign of leveling off.
    You need to split your reads file into chunks of, say, 1,000,000 reads. Run SHRIMP separately on each chunk, then concatenate the SHRIMP output files. The result is identical to what you would have got by feeding all the reads at once!

    The reason this works is because SHRIMP indexes the reads. Give it fewer reads, and it needs less memory. You will need to experiment with the chunk size to suit your computer's RAM.

    We use this method even on our server with 64GB RAM.
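
The splitting step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not part of SHRiMP: it assumes a FASTA-like reads file in which every read starts with a `>` header line, it drops file-level comment lines starting with `#`, and the function names and the `.chunkNNNN` file naming are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one read at a time as a list of lines (header plus sequence lines).

    Assumption: every read starts with a '>' header line.
    Lines starting with '#' and blank lines are skipped.
    """
    record: List[str] = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        if line.startswith(">") and record:
            yield record
            record = []
        record.append(line)
    if record:
        yield record


def iter_chunks(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[List[str]]]:
    """Group reads into consecutive chunks of at most reads_per_chunk reads."""
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be at least 1")
    records = iter_records(lines)
    while True:
        chunk = list(islice(records, reads_per_chunk))
        if not chunk:
            return
        yield chunk


def split_reads_file(path: str, reads_per_chunk: int = 1_000_000) -> List[str]:
    """Write path.chunk0000, path.chunk0001, ... and return the chunk file names."""
    names: List[str] = []
    with open(path) as handle:
        for i, chunk in enumerate(iter_chunks(handle, reads_per_chunk)):
            name = f"{path}.chunk{i:04d}"  # hypothetical naming scheme
            with open(name, "w") as out:
                for record in chunk:
                    out.writelines(record)
            names.append(name)
    return names
```

Only one chunk is held in memory at a time, so the splitter itself needs memory proportional to `reads_per_chunk`, not to the whole file. Each chunk file would then be passed to the aligner in a separate run and the outputs concatenated in chunk order.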


    • #3
      Originally posted by Torst
      The reason this works is because SHRIMP indexes the reads.
      If you split the reads for any aligner, this is still the case. A good question is whether it is theoretically better to index the reads or the reference, given that a lookup into the index is ~O(1).


      • #4
        Originally posted by nilshomer
        If you split the reads for any aligner, this is still the case
        This may be true for BFAST (your software?) and SHRIMP, but some short read aligners only index the reference. I think MAQ still does this? In those cases there is no memory occupied by a read index - the memory is only proportional to the reference index.


        • #5
          Originally posted by Torst
          This may be true for BFAST (your software?) and SHRIMP, but some short read aligners only index the reference. I think MAQ still does this? In those cases there is no memory occupied by a read index - the memory is only proportional to the reference index.
          Indexing 6.4 billion reference positions does take up a non-trivial amount of memory (e.g. BFAST). On the other hand, the memory for a read index is, as you say, proportional to the number of reads (see MAQ and SHRiMP). That is why BWA and Bowtie use a Burrows-Wheeler transform to compress the reference index, at the cost of speed. Nevertheless, you have to "sort" or index each read chunk, whereas a reference index is computed only once per reference. It follows that indexing the reference is better than indexing the reads, assuming the lookup is O(1), which can be achieved.
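
To make the trade-off concrete, here is a toy sketch of a hashed reference index in Python. It is only an illustration of the idea (build once per reference, then an average O(1) dictionary lookup per read seed); it is not how BFAST, MAQ, SHRiMP, BWA or Bowtie implement their indexes, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the reference to the positions where it starts.

    Built once per reference; memory grows with the reference length,
    not with the number of reads.
    """
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def seed_hits(index: Dict[str, List[int]], read: str, k: int) -> List[int]:
    """Return candidate reference positions for a read, using its first k-mer as the seed.

    A single dictionary lookup, so average O(1) per read.
    """
    return index.get(read[:k], [])
```

With this layout the reads can be streamed through one at a time, so splitting them into chunks changes neither the memory footprint of the index nor the hits reported for any read.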

          I still don't understand why
          The result is identical to what you would have got by feeding all the reads at once.
          is explained by
          The reason this works is because SHRIMP indexes the reads
          Could you give me an example where splitting the reads into discrete chunks and then merging (or catting) them together would not give you the same answer as aligning all the reads together?


          • #6
            Could you give me an example where splitting the reads into discrete chunks and then merging (or catting) them together would not give you the same answer as aligning all the reads together?
            There is no such example. My explanation to the original poster was imprecise.
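
The point can be illustrated with a small Python sketch, under the assumption that the aligner scores each read independently of all other reads. `align_read` is a hypothetical stand-in that uses exact matching; it is not SHRiMP's algorithm.

```python
from typing import List, Tuple


def align_read(reference: str, read: str) -> Tuple[str, int]:
    """Toy stand-in for an aligner: first exact-match position of the read, or -1."""
    return read, reference.find(read)


def align_all(reference: str, reads: List[str]) -> List[Tuple[str, int]]:
    """Align every read in one run."""
    return [align_read(reference, read) for read in reads]


def align_in_chunks(reference: str, reads: List[str], chunk_size: int) -> List[Tuple[str, int]]:
    """Align the reads chunk by chunk and concatenate the per-chunk results in order."""
    results: List[Tuple[str, int]] = []
    for start in range(0, len(reads), chunk_size):
        results.extend(align_all(reference, reads[start:start + chunk_size]))
    return results
```

Because the result for a read depends only on that read and the reference, `align_in_chunks` returns the same list as `align_all` for any positive `chunk_size`.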
