  • gmap_build error

    Hi guys,
    I am in the process of configuring GSNAP on my university's cluster, but I keep hitting an error at one step and can't seem to solve it. I have installed the software on the cluster and am now building the mm9 genome. I have followed the documentation so far, and gmap_build works fine until it reaches the step where it prints this to my console:

    Building suffix array
    SACA_K called with n = 2725765482, K = 5, level 0


    Right after this step, the process crashes with this error message:

    /home/satyajit/GSNAP/bin/gmapindex -d mm9 -F /home/satyajit/GSNAP/gmap-2014-07-04/gmapdb/mm9 -D /home/satyajit/GSNAP/gmap-2014-07-04/gmapdb/mm9 -S failed with return code 131 at /home/satyajit/GSNAP/bin/gmap_build line 360.

    I have tried this build several times, on different machines, and every time it crashes at this same step. The most I have given it is a cluster node with 64 GB of RAM and 16 cores. Is this the most memory-intensive step? Does it require even more memory than that? Or am I simply doing something fundamentally wrong? Frankly, I am at a loss about how to tackle this, and any help would be greatly appreciated.
    I plan to use GSNAP for SNP-tolerant alignment of my datasets.
    The command I used for gmap_build is:

    gmap_build -d mm9 -g -k 15 chr1.fa.gz chr1_random.fa.gz chr2.fa.gz chr3_random.fa.gz chr3.fa.gz chr4_random.fa.gz chr4.fa.gz chr5_random.fa.gz chr5.fa.gz chr6.fa.gz chr7_random.fa.gz chr7.fa.gz chr8_random.fa.gz chr8.fa.gz chr9_random.fa.gz chr9.fa.gz chr10.fa.gz chr11.fa.gz chr12.fa.gz chr13_random.fa.gz chr13.fa.gz chr14.fa.gz chr15.fa.gz chr16_random.fa.gz chr16.fa.gz chr17_random.fa.gz chr17.fa.gz chr18.fa.gz chr19.fa.gz chrX_random.fa.gz chrX.fa.gz chrY_random.fa.gz chrY.fa.gz chrM.fa.gz chrUn_random.fa.gz
    Last edited by Satya; 07-15-2014, 11:42 AM.
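To put the memory question in perspective, the `n` printed by `SACA_K` can be turned into a rough lower bound on the size of the suffix array itself. The sketch below assumes one 4-byte integer per genome position; it does not model gmapindex's real peak usage, which also includes SACA-K's working space and any other index structures built during the same step.

```python
# Back-of-envelope lower bound on suffix array size.
# ASSUMPTION: one fixed-width integer per genome position (4 bytes when every
# position fits in 32 bits). This is NOT gmapindex's documented peak memory.

GIB = 1024 ** 3


def min_entry_width(n: int) -> int:
    """Smallest of 4 or 8 bytes that can index every position 0..n-1."""
    return 4 if n <= 2 ** 32 else 8


def suffix_array_bytes(n: int, bytes_per_entry: int = 4) -> int:
    """Bytes needed to store n suffix array entries of a fixed width."""
    return n * bytes_per_entry


if __name__ == "__main__":
    # n values copied from the SACA_K lines in this thread.
    for label, n in [("mm9 (post #1)", 2_725_765_482),
                     ("hg19 (post #7)", 3_137_161_265)]:
        width = min_entry_width(n)
        gib = suffix_array_bytes(n, width) / GIB
        print(f"{label}: n = {n:,}, {width}-byte entries, >= {gib:.1f} GiB")
```

Under these assumptions it prints about 10.2 GiB for mm9 and 11.7 GiB for hg19. The array alone fits comfortably in 64 GB, so this estimate cannot by itself explain the crash; it only gives a floor below which the step cannot succeed.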

  • #2
    It appears that the build step requires the sequence files to be uncompressed (see section 4c of https://github.com/julian-gehring/GMAP-GSNAP). Have you tried using uncompressed sequence files?
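If you want to rule out compression as a factor, decompressing the chromosome files is straightforward. Here is a minimal Python sketch using the standard `gzip` and `shutil` modules; the `mm9_gz` and `mm9_fa` directory names are hypothetical placeholders.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_fasta(gz_path: Path, out_dir: Path) -> Path:
    """Decompress one .fa.gz file into out_dir and return the new path.

    shutil.copyfileobj streams the data in chunks, so a whole chromosome
    never has to be held in memory at once.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    # "chr1.fa.gz" -> "chr1.fa" (with_suffix("") strips only the final ".gz")
    out_path = out_dir / gz_path.with_suffix("").name
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


if __name__ == "__main__":
    # Hypothetical layout: compressed chromosomes in ./mm9_gz, output in ./mm9_fa
    for gz in sorted(Path("mm9_gz").glob("*.fa.gz")):
        print(gunzip_fasta(gz, Path("mm9_fa")))
```

The resulting plain FASTA files would then be passed to gmap_build without the `-g` option.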



    • #3
      Isn't that a requirement of gmap_setup, though? I thought gmap_build would accept gzipped files when given the -g option. I just tried uncompressed FASTA files anyway, and it still didn't work.
      Last edited by Satya; 07-15-2014, 11:53 AM.



      • #4
        You are right; there is a "-g" option documented for gmap_build.

        Out of curiosity, can you try the build with a single uncompressed chromosome FASTA file to see if it goes through?



        • #5
          Excellent suggestion! It worked when I used just a single uncompressed FASTA file. Does this mean I simply need to allocate more memory for the entire process?



          • #6
            If you are submitting that job to a scheduler with a specific memory allocation, it would not hurt to increase that request.

            My hunch is that one of the chromosome files (the *random* and *Un* files come to mind as culprits) may be causing the original error. You may have already tried this, but I would add a couple more chromosomes and see if that works; after that, everything except the random/Un files would be the next logical step to try.
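The incremental approach suggested above can be scripted. This hypothetical helper separates "primary" chromosomes from the `*_random` and `chrUn*` files, then prints one `gmap_build` command per stage using the same `-d`, `-g` and `-k 15` options as post #1. The `mm9_testN` database names are made up so that the test builds do not overwrite each other.

```python
from __future__ import annotations

import shlex


def is_random_or_un(filename: str) -> bool:
    """True for files such as chr1_random.fa.gz or chrUn_random.fa.gz."""
    return "_random" in filename or filename.startswith("chrUn")


def build_command(db: str, files: list[str], gzipped: bool = True, k: int = 15) -> list[str]:
    """Argument list for gmap_build with the options used in post #1."""
    cmd = ["gmap_build", "-d", db]
    if gzipped:
        cmd.append("-g")  # input files are gzip-compressed
    cmd += ["-k", str(k)]
    return cmd + files


def incremental_subsets(files: list[str], steps: list[int]) -> list[list[str]]:
    """Stages: the first N primary chromosomes for each N in steps,
    then all primary chromosomes, then the full original file list."""
    primary = [f for f in files if not is_random_or_un(f)]
    stages = [primary[:n] for n in steps]
    stages.append(primary)      # everything except *_random / chrUn
    stages.append(list(files))  # finally, every file from the original command
    return stages


if __name__ == "__main__":
    # A few of the file names from post #1, in the same order.
    files = ["chr1.fa.gz", "chr1_random.fa.gz", "chr2.fa.gz", "chr3_random.fa.gz",
             "chr3.fa.gz", "chrM.fa.gz", "chrUn_random.fa.gz"]
    for i, subset in enumerate(incremental_subsets(files, [1, 3]), start=1):
        print(shlex.join(build_command(f"mm9_test{i}", subset)))
```

Running the printed commands in order shows at which stage, if any, the failure first appears.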



            • #7
              Dear all,

              I resolved this by running gmap_build on a larger machine. I also got this error and chased down many leads; in the end, the fix was as simple as giving it more memory.

              In my case, I was building hg19 to use with the Pacific Biosciences ToFU command-line pipeline (https://github.com/PacificBiosciences/cDNA_primer/wiki). I installed the latest GMAP on an Ubuntu instance launched with MIT's StarCluster software (http://star.mit.edu/cluster/about.html). I also had to sort out the Perl version: StarCluster AMI instances are notoriously out of date, so the default Perl was far too old, and I used the version bundled with smrtanalysis instead.

              So success first involved setting two environment variables:

              export PERL5LIB=/mnt/smrtanalysis/current/miscdeps/basesys/usr/lib64/perl5:/mnt/smrtanalysis/current/miscdeps/basesys/usr/lib64/perl5/5.8.8
              export PATH=/usr/local/bin:/usr/bin:/bin:$PATH



              Setting the path correctly got me to the point where I hit the same error reported above:

              Building suffix array
              SACA_K called with n = 3137161265, K = 5, level 0
              Killed
              /usr/local/bin/gmapindex -d hg19 -F "/mnt/hg19/hg19" -D "/mnt/hg19/hg19" -S failed with return code 35072 at /mnt/smrtanalysis/current/analysis/bin/gmap_build line 376.
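The two failures in this thread report different return codes (131 and 35072). Assuming gmap_build prints the raw wait status that Perl stores in `$?` after running gmapindex (an assumption; the thread does not say), the codes can be decoded bit by bit. The signal names below use Linux numbering, and the lookup table is deliberately partial.

```python
# ASSUMPTION: the reported "return code" is a raw POSIX wait status, as in
# Perl's $? after system(). Signal numbers follow Linux conventions.
SIGNAL_NAMES = {3: "SIGQUIT", 6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV"}


def signal_name(n: int) -> str:
    return SIGNAL_NAMES.get(n, f"signal {n}")


def decode_wait_status(status: int) -> str:
    signal_no = status & 0x7F           # low 7 bits: terminating signal, if any
    core_dumped = bool(status & 0x80)   # bit 7: core dump flag
    exit_code = status >> 8             # high byte: exit code of a normal exit
    if signal_no:
        return f"killed by {signal_name(signal_no)}" + (" (core dumped)" if core_dumped else "")
    if exit_code > 128:
        # A shell that runs the command reports "killed by signal N" as 128 + N.
        return f"exited with {exit_code} (shell convention: killed by {signal_name(exit_code - 128)})"
    return f"exited with {exit_code}"


if __name__ == "__main__":
    for code in (131, 35072):
        print(code, "->", decode_wait_status(code))
```

Under that assumption, 35072 decodes to exit status 137, i.e. SIGKILL by shell convention, which matches the `Killed` line above and is the signal the Linux out-of-memory killer sends. 131 decodes to signal 3 (SIGQUIT) with the core-dump bit set; the thread does not explain where that signal came from. On Unix, Python's `os.WEXITSTATUS` and `os.WTERMSIG` perform the same decoding; the bit operations are spelled out here for clarity.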


              However, Genomax provided the hint I needed: rather than assuming something else was wrong, it was clearly worth trying a bigger box. Success came from running the software on a larger Ubuntu instance, an r3.8xlarge (240GB) machine, which I instantiated and added to my configuration. I then logged into the new node and executed the command:

              gmap_build -s none -k 15 -d hg19 -D /mnt/hg19 /mnt/hg19/hg19.fa

              It completed successfully.
              Last edited by adeslat; 02-15-2016, 06:07 AM.
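Before launching a long build on a new node, it can help to confirm how much RAM the machine actually has. This Linux-only sketch parses `MemTotal` from `/proc/meminfo`, which the kernel reports in kB (meaning KiB). Note that this is the node's physical memory, not the amount a batch scheduler will let your job use; the latter is the allocation mentioned in post #6.

```python
from pathlib import Path


def mem_total_gib(meminfo_text: str) -> float:
    """Return MemTotal in GiB from the text of /proc/meminfo (values in KiB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # e.g. "MemTotal:  67108864 kB"
            return kib / (1024 ** 2)
    raise ValueError("MemTotal line not found")


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        print(f"MemTotal: {mem_total_gib(meminfo.read_text()):.1f} GiB")
```

Whatever threshold you compare against is your own choice; the thread reports only that 64 GB failed for mm9 and a 240GB instance succeeded for hg19, not an official requirement.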
