gigaBayes (MarthLab) is running very slowly

sdriscoll

I like code

Join Date: Sep 2009

Posts: 436
#1

11-02-2009, 12:09 PM

According to the documentation, gigaBayes should run fairly quickly (18,000 30 bp reads per second). On my system, however, I'm seeing much slower performance (at least I'm interpreting it as slow). With the --debug switch enabled I can watch the output, and sometimes it spends 30 seconds or even several minutes at a single position, resulting in a total run time of more than 30 hours.
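For a rough sense of scale, here is a small arithmetic sketch. The function name is made up for illustration; the 18,000 reads per second figure is the one quoted from the documentation above, and a steady rate is assumed.

```python
def reads_at_documented_rate(hours, reads_per_second=18_000):
    """Number of reads processed in `hours` at a steady `reads_per_second`.

    The default rate is the figure quoted from the gigaBayes documentation;
    treating it as constant over the whole run is an assumption.
    """
    seconds = hours * 3600
    return int(seconds * reads_per_second)


# Reads that a 30 hour run would cover at the documented rate.
print(reads_at_documented_rate(30))  # 1944000000
```

So at the documented rate a 30 hour run would correspond to roughly 1.9 billion reads. Unless the input really is that large, the run is falling well short of the quoted throughput.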

I'm using Illumina single-end reads from mouse, and I've run the data through MosaikAligner, MosaikSort, and finally MosaikAssembler to produce a GIG file for gigaBayes. We're only interested in chromosome 7, so I aligned my source data to that chromosome alone (truncated reference file). Alignment and the other two programs tend to complete in about 1 hour total, but gigaBayes then runs for up to 2 days straight before completing, and I've caught it using up to 15 GB of RAM. My full gigaBayes command looks like this:

gigaBayes --gig input.gig --gff output.gff --anchor --ploidy diploid --PSL 0.9 --debug

My system specs:
AMD Phenom 9950 Quad-core
16 GB RAM
Ubuntu 9.04 64 bit (2.6.28-15 kernel)

Does anyone have any idea what's going on?

/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
sdriscoll

#2

11-03-2009, 04:49 PM

Last night I tried running the above with a reduced region: 10,000,000 to 27,000,000. I ran that on 3 data files and each took about 1 hour to 1.5 hours to complete.

Today I tried something different. My revised command line looks like this:

gigaBayes --gig input.gig --gff output.gff --anchor --indel --ploidy diploid --sample "single" --debug

Those same lanes took around 3 minutes each to complete, so it must have been the --sample option. By default gigaBayes uses "multiple", meaning that your sample came from multiple specimens (?). I'm not sure how that works, but it doesn't apply to my data, which all comes from one source. That still doesn't answer the performance question, though, since that's the default value and by default it should be running very quickly. For my purposes, I'm happy now, but I don't understand why there was such a massive speed difference. Again, if anyone has any insight it'd be cool to hear it. Thanks.
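Going from 1 to 1.5 hours down to about 3 minutes is roughly a 20x to 30x speed-up. Note that the two commands in this thread also differ in --PSL and --indel, so to pin the difference on --sample alone it may help to time two runs that are identical except for that flag. A minimal sketch of such a comparison follows; the helper names are hypothetical, the flags are only the ones already shown in the commands above, and --debug is left out to keep the output quiet.

```python
import shutil
import subprocess
import sys
import time

# Flags shared by both runs, taken from the revised command in this thread.
BASE = ["gigaBayes", "--gig", "input.gig", "--gff", "output.gff",
        "--anchor", "--indel", "--ploidy", "diploid"]


def build_command(sample=None):
    """Return the argument list, adding --sample only when a value is given."""
    cmd = list(BASE)
    if sample is not None:
        cmd += ["--sample", sample]
    return cmd


def time_command(cmd):
    """Run `cmd`, discard its output, and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def speedup(slow_seconds, fast_seconds):
    """How many times faster the fast run was."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    if shutil.which("gigaBayes") is None:
        print("gigaBayes not found on PATH; nothing to time", file=sys.stderr)
    else:
        default_s = time_command(build_command())        # default --sample
        single_s = time_command(build_command("single"))  # --sample single
        print(f"default: {default_s:.0f} s, single: {single_s:.0f} s, "
              f"speed-up: {speedup(default_s, single_s):.1f}x")
```

Both runs write to the same output.gff, so the second overwrites the first; change the file name in BASE if you want to keep both results.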

/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */