SEQanswers

11-02-2009, 11:09 AM   #1
sdriscoll
Location: San Diego, CA, USA

gigaBayes (MarthLab) is running very slowly

According to the documentation, gigaBayes should run fairly quickly (18,000 30 bp reads per second), but on my system I'm seeing much slower performance (at least, I'm interpreting it as slow). With the --debug switch enabled I can watch the output, and it sometimes spends 30 seconds or even several minutes at a single position, resulting in a total run time of more than 30 hours.
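For a sense of scale, the documented figure can be turned into a rough expected run time. This is only a back-of-the-envelope sketch: the 18,000 reads per second rate is the one quoted from the documentation above, the read count is a made-up placeholder (the post does not give one), and the function names are hypothetical.

```shell
#!/bin/sh
# Rate quoted from the gigaBayes documentation in the post above (30 bp reads).
READS_PER_SEC=18000

# Seconds a run would take at the documented rate.
# $1 = number of aligned reads (placeholder; not given in the thread)
expected_seconds() {
  echo $(( $1 / READS_PER_SEC ))
}

# Number of reads the documented rate would cover in a given time.
# $1 = wall-clock hours
reads_in_hours() {
  echo $(( $1 * 3600 * READS_PER_SEC ))
}

expected_seconds 18000000   # hypothetical input of 18 million reads -> 1000
reads_in_hours 30           # -> 1944000000
```

At the documented rate, a 30-hour run would correspond to nearly two billion reads. If the GIG file holds fewer reads than that, the run really is slower than the documentation suggests.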

I'm using Illumina sequence data (single-end reads, mouse), and I've run the data through MosaikAligner, MosaikSort, and finally MosaikAssembler to produce a GIG file for gigaBayes. We're only interested in chromosome 7, so I aligned my source data to that chromosome only (truncated reference file). Alignment and the other two programs complete in about 1 hour total, but then gigaBayes runs for up to 2 days straight before completing, and I've caught it using up to 15 GB of RAM. My full gigaBayes command looks like this:

gigaBayes --gig input.gig --gff output.gff --anchor --ploidy diploid --PSL 0.9 --debug

My system specs:
AMD Phenom 9950 Quad-core
16 GB RAM
Ubuntu 9.04 64 bit (2.6.28-15 kernel)

Does anyone have any idea what's going on?
11-03-2009, 03:49 PM   #2
sdriscoll
Location: San Diego, CA, USA

Last night I tried running the above with a reduced region: 10,000,000 to 27,000,000. I ran that on 3 data files, and each took about 1 to 1.5 hours to complete.

Today I tried something different. My revised command line looks like this:

gigaBayes --gig input.gig --gff output.gff --anchor --indel --ploidy diploid --sample "single" --debug

Those same lanes took around 3 minutes each to complete, so it must have been the --sample option. By default gigaBayes uses "multiple", meaning that your sample came from multiple specimens (?). I'm not sure how that works, but it doesn't apply to my data, which all comes from one source. That still doesn't answer the performance question, though, since "multiple" is the default and the default should run very quickly. For my purposes I'm happy now, but I don't understand why there was such a massive speed difference. Again, if anyone has any insight, it'd be cool to hear it. Thanks.
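Note that the two commands in this thread differ in more than --sample: the second one also drops --PSL 0.9 and adds --indel. A controlled comparison that changes only --sample would pin the speedup down. The following is a sketch, not a tested benchmark. It uses only flags that appear in the commands above, the file names are placeholders, build_cmd and GIGABAYES_RUN are hypothetical names, and it only prints the commands unless GIGABAYES_RUN=1 is set.

```shell
#!/bin/sh
# Build a gigaBayes command line that differs only in the --sample setting.
# $1 = "default" (leave --sample off) or a value to pass to --sample
build_cmd() {
  cmd="gigaBayes --gig input.gig --gff out_$1.gff --anchor --indel --ploidy diploid"
  if [ "$1" != "default" ]; then
    cmd="$cmd --sample $1"
  fi
  echo "$cmd"
}

for mode in default single; do
  cmd=$(build_cmd "$mode")
  if [ "${GIGABAYES_RUN:-0}" = "1" ]; then
    start=$(date +%s)
    $cmd                       # unquoted on purpose so the string splits into arguments
    end=$(date +%s)
    echo "$mode: $(( end - start )) s"
  else
    echo "$cmd"                # dry run: show what would be executed
  fi
done
```

If the run without --sample is the only slow one, the option is confirmed as the cause. If both are fast, the difference more likely came from --PSL 0.9 or --indel.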
All times are GMT -8.

