SEQanswers

Post #1: Satya (Junior Member, USA), 07-15-2014, 10:43 AM
gmap_build error

Hi guys,
I am in the process of configuring GSNAP on my university's cluster, but I keep hitting an error at one step that I can't seem to solve. I have installed the software on the cluster and am now building the mm9 genome. I have followed the documentation so far, and gmap_build runs fine until it reaches the step where my console shows:

Building suffix array
SACA_K called with n = 2725765482, K = 5, level 0


It is after this step that the process crashes and gives me an error message:

/home/satyajit/GSNAP/bin/gmapindex -d mm9 -F /home/satyajit/GSNAP/gmap-2014-07-04/gmapdb/mm9 -D /home/satyajit/GSNAP/gmap-2014-07-04/gmapdb/mm9 -S failed with return code 131 at /home/satyajit/GSNAP/bin/gmap_build line 360.

I have tried this build several times now, on different machines as well, and every time it crashes at this same stage. The largest node I have used had 64 GB of RAM and 16 cores. Is this the most memory-intensive step? Does it need even more memory than that? Or am I simply doing something fundamentally wrong? Frankly, I am at a loss about how to tackle this, and any help would be greatly appreciated.
I plan on using GSNAP for SNP tolerant alignment in my datasets.
The command I used for gmap_build is:

gmap_build -d mm9 -g -k 15 chr1.fa.gz chr1_random.fa.gz chr2.fa.gz chr3_random.fa.gz chr3.fa.gz chr4_random.fa.gz chr4.fa.gz chr5_random.fa.gz chr5.fa.gz chr6.fa.gz chr7_random.fa.gz chr7.fa.gz chr8_random.fa.gz chr8.fa.gz chr9_random.fa.gz chr9.fa.gz chr10.fa.gz chr11.fa.gz chr12.fa.gz chr13_random.fa.gz chr13.fa.gz chr14.fa.gz chr15.fa.gz chr16_random.fa.gz chr16.fa.gz chr17_random.fa.gz chr17.fa.gz chr18.fa.gz chr19.fa.gz chrX_random.fa.gz chrX.fa.gz chrY_random.fa.gz chrY.fa.gz chrM.fa.gz chrUn_random.fa.gz
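For a sense of scale, the "SACA_K called with n = 2725765482" line gives the number of positions being sorted into the suffix array. A minimal sketch of the arithmetic, assuming (this is not taken from the GMAP documentation) that each suffix-array entry is a 4-byte integer while n < 2^32 and 8 bytes otherwise; sa_min_bytes is a hypothetical helper name. It only gives a floor for the array itself and does not predict gmapindex's actual peak memory.

```bash
#!/usr/bin/env bash
# sa_min_bytes N: lower bound, in bytes, for a suffix array over N positions.
# Assumption (not from the GMAP docs): 4-byte entries while N < 2^32, else 8.
sa_min_bytes() {
  local n=$1
  local width=4
  if (( n >= 4294967296 )); then
    width=8
  fi
  echo $(( n * width ))
}

# n values printed by SACA_K in this thread (mm9 here, hg19 in a later post)
for n in 2725765482 3137161265; do
  bytes=$(sa_min_bytes "$n")
  echo "n=$n -> at least $bytes bytes for the array alone"
done
```

That works out to roughly 10.2 GiB for mm9 and 11.7 GiB for hg19 under the stated assumption; the rest of the build's working memory comes on top of this and isn't estimated here.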

Last edited by Satya; 07-15-2014 at 11:42 AM.

Post #2: GenoMax (Senior Member, East Coast USA), 07-15-2014, 11:11 AM

It appears that the build step requires sequence files to be uncompressed (https://github.com/julian-gehring/GMAP-GSNAP, look for section 4c). Have you tried using uncompressed sequence files?

Post #3: Satya (Junior Member, USA), 07-15-2014, 11:41 AM

Isn't that the requirement for gmap_setup, though? I thought gmap_build would accept gzipped files with the -g option. I just tried uncompressed FASTA files to be sure, and it didn't work with those either.

Last edited by Satya; 07-15-2014 at 11:53 AM.

Post #4: GenoMax (Senior Member, East Coast USA), 07-15-2014, 12:37 PM

You are right; there is a "-g" option mentioned for gmap_build.

Out of curiosity, can you try the build with a single uncompressed chromosome FASTA file to see if it goes through?

Post #5: Satya (Junior Member, USA), 07-15-2014, 12:49 PM

Excellent suggestion! It worked when I used just a single uncompressed FASTA file. Does this mean I simply need to allocate more memory for the entire process?

Post #6: GenoMax (Senior Member, East Coast USA), 07-15-2014, 02:14 PM

If you were passing that job along to a scheduler with a specific memory allocation then it would not hurt to increase that request.

My hunch is that one of the chromosome files (the *random* or *Un* files come to mind as culprits) may be causing the original error. You may have already tried this, but I would add a couple more chromosomes and see if that works; after that, a build with everything except the random/Un files would be the next logical step.
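The "everything except the random/Un files" build can be scripted instead of typing the file list by hand. A minimal sketch; list_main_chroms is a hypothetical helper, and it assumes the UCSC-style file names used in this thread (chrN.fa.gz, chrN_random.fa.gz, chrUn_random.fa.gz).

```bash
#!/usr/bin/env bash
# list_main_chroms DIR: print the chr*.fa / chr*.fa.gz files in DIR,
# skipping *random* and chrUn* files (hypothetical helper).
list_main_chroms() {
  local dir=$1 f
  for f in "$dir"/chr*.fa "$dir"/chr*.fa.gz; do
    [ -e "$f" ] || continue          # an unmatched glob stays literal; skip it
    case "$(basename "$f")" in
      *random*|chrUn*) continue ;;   # the suspected culprits, left out for this test
    esac
    printf '%s\n' "$f"
  done
}
```

For example, gmap_build -d mm9_main -g -k 15 $(list_main_chroms .) builds from the main chromosomes only; mm9_main is a made-up database name, chosen so the test index doesn't collide with the full mm9 one, and the unquoted substitution assumes paths without spaces.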

Post #7: adeslat (Junior Member, Washington area), 02-15-2016, 04:56 AM

Dear all,

I resolved this by running gmap_build on a larger machine. I also got this error and chased down many leads; in the end, it was as simple as needing more memory.

In my case, I was building hg19 to work with the Pacific Biosciences ToFU command-line pipeline (https://github.com/PacificBiosciences/cDNA_primer/wiki). I installed the latest GMAP on an Ubuntu instance launched with MIT's StarCluster software (http://star.mit.edu/cluster/about.html). I also had to resolve the Perl version: StarCluster AMI instances are notoriously out of date and the default Perl was far too old, so I used the version bundled with smrtanalysis to get it right.

So success first involved setting two environment variables:

export PERL5LIB=/mnt/smrtanalysis/current/miscdeps/basesys/usr/lib64/perl5:/mnt/smrtanalysis/current/miscdeps/basesys/usr/lib64/perl5/5.8.8
export PATH=/usr/local/bin:/usr/bin:/bin:$PATH
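Long paths like these are easy to mistype, and Perl silently skips PERL5LIB entries that don't exist. A small sanity check, sketched under the assumption that you only want to confirm each colon-separated entry is an existing directory; check_path_list is a hypothetical helper, not part of GMAP or smrtanalysis.

```bash
#!/usr/bin/env bash
# check_path_list LIST: for each entry of a colon-separated list
# (PERL5LIB, PATH, ...), report whether it is an existing directory.
check_path_list() {
  local list=$1 entry
  local IFS=':'                 # split the list on colons only
  for entry in $list; do
    [ -n "$entry" ] || continue # ignore empty entries such as "a::b"
    if [ -d "$entry" ]; then
      printf 'ok      %s\n' "$entry"
    else
      printf 'missing %s\n' "$entry"
    fi
  done
}
```

Run it as check_path_list "$PERL5LIB" after the exports; command -v perl then shows which perl binary the new PATH resolves to.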



Setting the paths correctly got me to the point where I hit the same error reported above:

Building suffix array
SACA_K called with n = 3137161265, K = 5, level 0
Killed
/usr/local/bin/gmapindex -d hg19 -F "/mnt/hg19/hg19" -D "/mnt/hg19/hg19" -S failed with return code 35072 at /mnt/smrtanalysis/current/analysis/bin/gmap_build line 376.
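The two return codes in this thread (131 in the first post, 35072 here) can be read as raw wait statuses. A minimal decoding sketch, assuming gmap_build reports Perl's $? unmodified after running gmapindex (an assumption, not confirmed from the gmap_build source); decode_status is a hypothetical helper. Under that assumption, 35072 is 137 shifted left by 8: an exit code of 137, which is how a shell reports a child killed by signal 9 (SIGKILL). That fits the "Killed" line above, and SIGKILL is what the Linux out-of-memory killer sends.

```bash
#!/usr/bin/env bash
# decode_status N: interpret N as a raw POSIX wait status (the layout Perl
# uses for $?): low 7 bits = terminating signal, bit 7 = core dump flag,
# bits 8-15 = exit code.
decode_status() {
  local status=$1
  local sig=$(( status & 127 ))
  local core=$(( status & 128 ))
  local code=$(( (status >> 8) & 255 ))
  if (( sig != 0 )); then
    if (( core != 0 )); then
      echo "signal ${sig}, core dumped"
    else
      echo "signal ${sig}"
    fi
  elif (( code > 128 )); then
    # shells report "killed by signal S" as exit code 128 + S
    echo "exit ${code} (shell convention: killed by signal $(( code - 128 )))"
  else
    echo "exit ${code}"
  fi
}

decode_status 35072   # prints: exit 137 (shell convention: killed by signal 9)
decode_status 131     # prints: signal 3, core dumped
```

The same helper decodes the 131 from the first post as signal 3 with the core-dump flag set; what raised that signal isn't clear from the thread.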


However, GenoMax provided the hint I needed. Rather than assuming something else was wrong, it was clearly worth trying a bigger box. Success came from running the software on a larger Ubuntu instance, an r3.8xlarge (240 GB) machine, which I launched and added to my configuration. I logged into the new node and executed the command:

gmap_build -s none -k 15 -d hg19 -D /mnt/hg19 /mnt/hg19/hg19.fa

It completed successfully.
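Since the fix here was simply a bigger machine, it can be worth checking a node's RAM before starting a long gmap_build run. A minimal sketch for Linux, reading MemTotal (reported in kB) from /proc/meminfo; mem_total_gib and require_mem_gib are hypothetical helpers, and the threshold you pass is your own choice, not a documented GMAP requirement (in this thread the 64 GB runs for mm9 crashed, while a 240 GB node worked for hg19).

```bash
#!/usr/bin/env bash
# mem_total_gib [FILE]: whole GiB of RAM from a meminfo-format file
# (Linux /proc/meminfo by default; MemTotal is given in kB).
mem_total_gib() {
  local file=${1:-/proc/meminfo}
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' "$file"
}

# require_mem_gib NEED [FILE]: succeed only if at least NEED GiB are present.
require_mem_gib() {
  local need=$1 file=${2:-/proc/meminfo} have
  have=$(mem_total_gib "$file")
  if [ -z "$have" ]; then
    echo "could not read MemTotal from $file" >&2
    return 2
  fi
  if [ "$have" -lt "$need" ]; then
    echo "only ${have} GiB of RAM, wanted ${need} GiB" >&2
    return 1
  fi
}
```

Usage, with a placeholder threshold: require_mem_gib 200 && gmap_build -s none -k 15 -d hg19 -D /mnt/hg19 /mnt/hg19/hg19.fa. On a cluster, the scheduler's memory request (as GenoMax noted) can cap a job below the node's physical RAM, and this check does not see that limit.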

Last edited by adeslat; 02-15-2016 at 05:07 AM.

Tags: building errors, error, gmap, gsnap, mm9

All times are GMT -8.

