Seqanswers

  • gigaBayes (MarthLab) is running very slowly

    According to the documentation, gigaBayes should run fairly quickly (18,000 30 bp reads per second), but on my system I'm seeing much slower performance (at least, I interpret it as slow). With the --debug switch enabled I can watch the output, and sometimes it spends 30 seconds or even several minutes at a single position, for a total run time of more than 30 hours.

    I'm using Illumina single-end reads from mouse. I ran the data through MosaikAligner, MosaikSort, and finally MosaikAssembler to produce a GIG file for gigaBayes. We're only interested in chromosome 7, so I aligned my source data to that chromosome only (using a truncated reference file). The alignment and the other two programs complete in about 1 hour total, but gigaBayes then runs for up to 2 days straight, and I've caught it using up to 15 GB of RAM. My full gigaBayes command looks like this:

    gigaBayes --gig input.gig --gff output.gff --anchor --ploidy diploid --PSL 0.9 --debug

    My system specs:
    AMD Phenom 9950 Quad-core
    16 GB RAM
    Ubuntu 9.04, 64-bit (kernel 2.6.28-15)

    Does anyone have any idea what's going on?
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */

  • #2
    Last night I tried running the above command on a reduced region (positions 10,000,000 to 27,000,000). I ran that on 3 data files, and each took about 1 to 1.5 hours to complete.

    Today I tried something different. My revised command line looks like this:

    gigaBayes --gig input.gig --gff output.gff --anchor --indel --ploidy diploid --sample "single" --debug

    Those same lanes took around 3 minutes each to complete, so it must have been the --sample option. By default gigaBayes uses "multiple", meaning that your sample came from multiple specimens (?). I'm not sure how that works, but it doesn't apply to my data, which all comes from one source. That still doesn't answer the performance question, though: "multiple" is the default, and with the defaults it should be running very quickly. For my purposes I'm happy now, but I don't understand why there was such a massive speed difference. Again, if anyone has any insight, it'd be cool to hear it. Thanks.
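The throughput figure quoted in the first post (18,000 30 bp reads per second) can be turned into a quick sanity check on whether a run is "slow". The following is a minimal bash sketch that assumes only that documented rate. The helpers expected_seconds and implied_reads are hypothetical and are not part of gigaBayes or Mosaik.

```shell
#!/usr/bin/env bash
# Back-of-envelope check based on the throughput quoted in the thread
# (18,000 30 bp reads per second). Nothing here is measured from gigaBayes;
# the read count or duration passed in comes from the caller.

readonly DOC_READS_PER_SEC=18000

# expected_seconds READS
# Prints the whole seconds needed at the documented rate (rounded up).
expected_seconds() {
  local reads=$1
  echo $(( (reads + DOC_READS_PER_SEC - 1) / DOC_READS_PER_SEC ))
}

# implied_reads SECONDS
# Prints how many reads the documented rate would get through in SECONDS.
implied_reads() {
  local seconds=$1
  echo $(( seconds * DOC_READS_PER_SEC ))
}
```

For example, `implied_reads $((30 * 3600))` prints 1944000000. Taken at face value, the documented rate would need roughly 1.9 billion reads to fill a 30-hour run. Unless the chromosome 7 GIG file holds that many reads, the observed runtime is far outside the documented speed.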
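More than --sample changed between the two commands in the thread. The second command also adds --indel and drops --PSL 0.9, while --debug is set in both. A comparison in which only --sample varies would isolate its effect. Below is a minimal bash sketch of such a comparison. It assumes gigaBayes is on PATH and uses only flags that appear in the thread. The functions time_run and compare_sample_modes, and the PREFIX.default.gff / PREFIX.single.gff naming, are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for timing gigaBayes runs that differ only in --sample.
# Assumes gigaBayes is on PATH; every flag used here appears in the thread.

# time_run LABEL ARGS...
# Runs "gigaBayes ARGS...", discards its console output (including any
# --debug output), and prints: LABEL<TAB>wall-clock seconds<TAB>exit status
time_run() {
  local label=$1
  shift
  local start end status=0
  start=$(date +%s)
  gigaBayes "$@" >/dev/null 2>&1 || status=$?
  end=$(date +%s)
  printf '%s\t%d\t%d\n' "$label" "$((end - start))" "$status"
}

# compare_sample_modes GIG PREFIX [FLAGS...]
# Runs gigaBayes twice with identical FLAGS: once with the default sample
# mode (no --sample flag) and once with --sample single. Each run writes
# its own GFF file so the call sets can be compared afterwards.
compare_sample_modes() {
  local gig=$1 prefix=$2
  shift 2
  time_run default --gig "$gig" --gff "$prefix.default.gff" "$@"
  time_run single --gig "$gig" --gff "$prefix.single.gff" "$@" --sample single
}
```

For example, `compare_sample_modes input.gig chr7 --anchor --ploidy diploid --PSL 0.9` prints one tab-separated line per run (label, elapsed seconds, exit status). Adding --indel to both runs, or leaving it out of both, keeps that option from confounding the timing.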
