  • Approximate bowtie runtime

    This may be a general question. From reading the article at http://genomebiology.com/2009/10/3/R25, I get the feeling that the bowtie instance I am running right now is behaving differently, in terms of run time, from what the article describes... I may be doing something wrong.

    I have SRA paired-end data downloaded from http://www.ncbi.nlm.nih.gov/sra/SRX026384?report=full that I am mapping against the human reference genome provided by bowtie... The file holds roughly 19 million reads and is about 6 GB in size. The article says a normal desktop computer should be able to carry out the task in a very short time; they mention a matter of minutes, not counting the indexing step (building the BWT). But later on they mention that a server computer can take up to 21 hours to build the index. My laptop runs 32-bit Ubuntu with 2 GB RAM, 4 GB swap and a dual-core CPU, and I have been running a multi-threaded bowtie instance for the past 3 days. Does this sound normal? Roughly how long did it take you, my colleagues, when you ran bowtie?

    Here is what my command looks like:

    $ bowtie hg19 -q /PATH/SRR065070.fastq -S align.map --offrate 20 -p 2



    The 'hg19' argument passed to bowtie is just the base name of the reference index, since I am invoking bowtie from within the directory where hg19.ebwt.zip was extracted. It generates an alignment file, align.map, but the file is being populated very slowly: over the past 3 days and 10 hours only 205 MB have been written to it.
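
One rough way to turn the numbers above into an estimate is to count how many records have been written so far and extrapolate. This is only a minimal sketch: it assumes one SAM record per input read, a roughly constant alignment rate, and the figures from the post (about 19 million reads, 3 days 10 hours = 82 hours elapsed). 'eta_hours' is a hypothetical helper name, not a bowtie feature.

```shell
# Hypothetical helper: estimate the remaining hours from progress so far.
# usage: eta_hours <records_written> <records_total> <elapsed_hours>
eta_hours() {
    awk -v d="$1" -v t="$2" -v e="$3" 'BEGIN {
        if (d <= 0) { print "unknown"; exit }   # nothing written yet
        printf "%.1f\n", e * (t - d) / d        # remaining = elapsed * left / done
    }'
}

# Count alignment records written so far (SAM header lines start with "@"),
# then extrapolate. Paths and totals are the ones quoted in the post.
# written=$(grep -vc '^@' align.map)
# eta_hours "$written" 19000000 82
```

If the estimate comes out in weeks rather than hours, that points to a resource problem (such as swapping) rather than to normal bowtie behaviour.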

  • #2
    You say you only have 2GB of RAM. How much is specified as a minimum requirement in the manual and/or paper? Consider that hg19 is 3.2 billion bases.


    • #3
      According to the paper (http://genomebiology.com/2009/10/3/R25):

      'A Bowtie index for the human genome fits in 2.2 GB on disk and has a memory footprint of as little as 1.3 GB at alignment time, allowing it to be queried on a workstation with under 2 GB of RAM.'

      However, the current pre-built hg indices on their site are larger than 2.2 GB. Also, the memory footprint might be bigger with -p greater than 1; have you tried running this single-threaded? Use something like the System Monitor or 'top' to figure out whether your job fits in the machine's RAM. If your system is forced to use swap just to hold the index, I expect the run will be desperately slow.
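
As a complement to 'top', on Linux the kernel reports per-process resident and swapped-out memory in /proc/<pid>/status. The following is a minimal sketch, assuming a kernel that exposes the VmSwap field (older kernels omit it, in which case the sketch prints 0 for swap). 'proc_mem_mb' is a hypothetical helper name.

```shell
# Hypothetical helper: print resident and swapped-out memory (in MB)
# by reading the VmRSS and VmSwap fields (reported in kB) of a
# /proc/<pid>/status file passed as the first argument.
proc_mem_mb() {
    awk '/^VmRSS:/  { rss  = $2 }
         /^VmSwap:/ { swap = $2 }
         END { printf "rss_mb=%d swap_mb=%d\n", rss / 1024, swap / 1024 }' "$1"
}

# Example (Linux only; pgrep -o picks the oldest matching process):
# proc_mem_mb "/proc/$(pgrep -o bowtie)/status"
```

A large swap_mb value for the bowtie process itself would suggest that the index does not fit in physical RAM.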


      • #4
        Speaking of parallel performance: the paper says that the memory image of the index is shared by the threads, which could increase performance on multiple cores, and that there will not be a 'substantial' increase in memory consumption when using multiple threads. So the threads synchronize their activities (fetching reads, outputting results, switching between indices and marking jobs).

        On your cue, RDW, I checked whether swap was involved. Both cores are running at full blast and the bowtie job occupies 1.5 GB of RAM. I also see 1.3 GB of swap in use, but it is not clear to me whether this is coming from bowtie. I haven't tried running a single-threaded job; I chose two threads on the assumption that parallelism would cut the run time...

        Nilshomer, in the paper they ran bowtie on a server and on a PC and benchmarked the performance; the PC had 2 GB of RAM, and this is why I was optimistic...


        • #5
          The human genome takes about 3.3 GB of memory, so the swapping is caused by bowtie. This is your major bottleneck.
          Multithreading does not increase the memory requirement.


          • #6
            When I run Bowtie on a 2.0 GHz Core 2 Duo using both cores (-p 2) with 3.3 GB of available RAM under 32-bit Ubuntu, it takes a few hours to align 19 million reads to hg19, depending on which options I'm using. I often run it overnight, so I don't know exactly how long, but definitely less than 8 hours.

            You should probably add more RAM (put in 4GB to get the max ~3.3 available on a 32 bit system).


            • #7
              Use -t to report the run time.
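
One way to use -t without waiting for the full job is a short pilot run on a subset of the reads. The following is a minimal sketch, assuming the common 4-lines-per-record FASTQ layout (no wrapped sequence lines). 'fastq_first_n' and the pilot file names are hypothetical, and the index name and input path are the ones from the original post.

```shell
# Hypothetical helper: write the first N reads of a FASTQ file to a new file.
# usage: fastq_first_n <input.fastq> <n_reads> <output.fastq>
fastq_first_n() {
    head -n "$(( $2 * 4 ))" "$1" > "$3"   # 4 lines per FASTQ record
}

# Pilot run on 100,000 reads; -t asks bowtie to print wall-clock timings.
# fastq_first_n /PATH/SRR065070.fastq 100000 pilot.fastq
# bowtie -t hg19 -q pilot.fastq -S pilot.map -p 2
```

Loading the index may dominate such a short run, so treat any extrapolation to the full 19 million reads as a rough guide only.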
