SEQanswers

Old 05-07-2011, 09:52 AM   #1
BioSlayer
Member
 
Location: Wellington

Join Date: Feb 2010
Posts: 26
Approximate bowtie runtime

This may be a general question. From reading the article at http://genomebiology.com/2009/10/3/R25, I get the feeling that the instance of bowtie I am running right now is behaving differently from what the article reports in terms of runtime... I may be doing something wrong.

I have SRA paired-end data downloaded from http://www.ncbi.nlm.nih.gov/sra/SRX026384?report=full that I am mapping against the human reference genome provided by bowtie... the file holds roughly 19 million reads and is about 6 GB in size. The article says a normal desktop computer should be able to carry out the task in a very short time, a matter of minutes excluding the indexing step (building the BWT), though it also mentions that a server can take up to 21 hours to build the index. My laptop runs 32-bit Ubuntu with 2 GB RAM, 4 GB swap and a dual-core CPU, and I have been running a multi-threaded bowtie instance for the past 3 days. Does this sound normal? Roughly how long did it take for you, colleagues, when you ran bowtie?

Here is what my command looks like:

$ bowtie hg19 -q /PATH/SRR065070.fastq -S align.map --offrate 20 -p 2



The 'hg19' argument passed to bowtie is the basename of the reference index; I am invoking bowtie from within the directory where hg19.ebwt.zip was extracted. It generates an alignment file, align.map, but the file is being populated at a very slow rate: over the past 3 days and 10 hours only 205 MB have been written to it.
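For what it's worth, a minimal way I keep an eye on the output as it grows, assuming GNU watch is available:

$ watch -n 60 ls -lh align.map   # re-list align.map every 60 seconds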
Old 05-07-2011, 11:49 AM   #2
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

You say you only have 2GB of RAM. How much is specified as a minimum requirement in the manual and/or paper? Consider that hg19 is 3.2 billion bases.
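As a quick sanity check, you could compare the machine's memory against the on-disk size of the index; a sketch, assuming the standard pre-built index file layout (hg19.1.ebwt, hg19.rev.1.ebwt, and so on):

$ free -m             # total/used RAM and swap, in MB
$ du -ch hg19*.ebwt   # combined on-disk size of the index files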
Old 05-07-2011, 05:11 PM   #3
RDW
Member
 
Location: London

Join Date: Oct 2008
Posts: 63

According to the paper (http://genomebiology.com/2009/10/3/R25):

'A Bowtie index for the human genome fits in 2.2 GB on disk and has a memory footprint of as little as 1.3 GB at alignment time, allowing it to be queried on a workstation with under 2 GB of RAM.'

However, the current pre-built hg indices on their site are larger than 2.2 GB. Also, the memory footprint might be bigger for p>1; have you tried running this single-threaded? Use something like the System Monitor or 'top' to figure out whether your job fits in the machine's RAM; if your system is forced to use swap just to hold the index, I expect the run will be desperately slow.
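For example, something along these lines, assuming a single bowtie process (bowtie's -p option uses threads, so there should be only one PID):

$ ps -C bowtie -o pid,rss,vsz,comm   # RSS = memory actually resident in RAM, in KB
$ top -p $(pidof bowtie)             # watch the RES and %MEM columns interactively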
Old 05-08-2011, 12:01 AM   #4
BioSlayer
Member
 
Location: Wellington

Join Date: Feb 2010
Posts: 26

Speaking of parallel performance: the paper says the memory image of the index is shared by the threads, which should increase performance on multiple cores without a 'substantial' increase in memory consumption. The threads synchronize their activities (fetching reads, outputting results, switching between indices and marking jobs).

On your cue, RDW, I checked whether swap was involved: both cores are running at full blast and the bowtie job occupies 1.5 GB of RAM. I also see 1.3 GB of swap in use, but it is not clear to me whether that is coming from bowtie. I haven't tried running a single-threaded job; my decision to run two threads came from the notion that parallelism would cut the time down...
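If anyone wants to reproduce the check, one way to attribute swap to the bowtie process itself, assuming a kernel recent enough to expose VmSwap in /proc and a single bowtie PID:

$ grep VmSwap /proc/$(pidof bowtie)/status   # how much of bowtie's memory is swapped out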

Nilshomer, in the paper they benchmarked bowtie on both a server and a PC; the PC had 2 GB of RAM, which is why I was optimistic...
Old 05-09-2011, 04:51 AM   #5
volks
Member
 
Location: hd.de

Join Date: Jun 2010
Posts: 81

The human genome takes about 3.3 GB of memory, so the swapping is caused by bowtie. This is your major bottleneck.
Multithreading does not increase the memory requirement, since the threads share the index.
Old 05-09-2011, 05:46 AM   #6
biznatch
Senior Member
 
Location: Canada

Join Date: Nov 2010
Posts: 124

When I run Bowtie on a 2.0 GHz Core 2 Duo using both cores (-p 2) with 3.3 GB of available RAM under 32-bit Ubuntu, it takes a few hours to align 19 million reads to hg19, depending on what options I'm using. I often run it overnight, so I don't know exactly how long, but definitely less than 8 hours.

You should probably add more RAM (put in 4 GB to get the maximum ~3.3 GB available on a 32-bit system).
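If you are unsure whether the installed system is 32- or 64-bit, this will tell you:

$ uname -m   # i686 means 32-bit, x86_64 means 64-bit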
biznatch is offline   Reply With Quote
Old 05-09-2011, 05:49 AM   #7
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 264

Use -t to report the run time.
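-t (--time) makes bowtie print the wall-clock time taken by each phase. For example, added to the command from the first post:

$ bowtie -t hg19 -q /PATH/SRR065070.fastq -S align.map --offrate 20 -p 2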