SEQanswers

Old 04-02-2010, 05:39 AM   #1
guillaum
Junior Member
 
Location: France

Join Date: Apr 2010
Posts: 3
Bfast index creation

Hi,

I am trying Bfast for the first time and am having some trouble creating the index for the whole human genome.
I want to create the 10 indexes as shown in the Bfast paper, but each one takes a very long time to build (10 to 20 hours). Memory consumption goes up to 33 GB, which causes difficulties and may explain the long running time, since the system may be swapping heavily. (I have a system with 64 GB of memory, but I am not the only user.)

What is the typical running time for index creation on the human genome?

I understand that if memory is the issue, I might try the "-d" parameter to split the index into parts, which leads to my second question:
Does index splitting have any performance impact on the next step of the algorithm, finding candidate alignment locations, and if so, to what extent?
(I suppose it has an impact; otherwise index splitting would be done by default, wouldn't it?)


Thanks !
Old 04-02-2010, 09:03 AM   #2
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
For the human genome, index creation can easily run on an 8-core machine in 5-6 hours (remember to use multi-threading). Also, I regularly build such indexes on 32GB RAM machines. Could you give the command you are using to create the indexes?

Index splitting (beyond "-d 1") has a significant performance impact, as it requires expensive merging of each of the split indexes. For "-d 1", where the indexes are split into four pieces, the performance decrease (of the "match" step) is not too bad. If you have 24 GB of RAM or more, you should not need to split the indexes.
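
To make the "four pieces" figure concrete, here is a minimal sketch. It assumes the number of pieces grows as 4^d with the "-d" depth; the thread only confirms the d = 1 case, and index_pieces is a hypothetical helper name, not part of Bfast.

```python
def index_pieces(depth: int) -> int:
    """Return the number of pieces for a given "-d" depth.

    Assumption: the count is 4**depth. Only depth 1 -> 4 pieces is
    confirmed in the thread.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 4 ** depth


# Show how quickly the number of pieces to merge grows with depth.
for d in range(4):
    print(f"-d {d}: {index_pieces(d)} piece(s)")
```

If that assumption holds, each extra level of depth multiplies the number of pieces by four, which would fit the remark that splitting beyond "-d 1" gets expensive.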
Old 04-02-2010, 09:24 AM   #3
guillaum

Thanks for this information. So I should run it with "-d 1" if I have less than 24 GB available.

The command I used was

Quote:
./bin/bfast index -f all.fasta -A 0 -m 1111011010001000110101100101100110100111 -t -w 14 -i 9
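
As a quick sanity check on the mask passed to "-m" above, the sketch below reports its length and how many positions are set to 1. The helper name describe_mask is hypothetical, and reading 1s as positions that are used and 0s as positions that are skipped is my assumption about spaced-seed masks, not something stated in this thread.

```python
# Mask copied from the "-m" argument of the command above.
MASK = "1111011010001000110101100101100110100111"


def describe_mask(mask: str) -> dict:
    """Summarize a 0/1 mask string: total length and number of 1s."""
    if not mask or set(mask) - {"0", "1"}:
        raise ValueError("mask must be a non-empty string of 0s and 1s")
    return {"length": len(mask), "ones": mask.count("1")}


info = describe_mask(MASK)
print(f"mask length: {info['length']}, positions set to 1: {info['ones']}")
```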
Old 04-02-2010, 09:40 AM   #4
nilshomer
With "-d 1" it runs comfortably in 8GB of RAM. How big is all.fasta (3.2x10^9)?
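
One way to answer the size question is to count the sequence characters in the FASTA file directly. This is a minimal sketch, assuming the 3.2x10^9 figure refers to the number of bases; count_bases is a hypothetical helper, and it simply skips header lines starting with ">".

```python
from typing import Iterable


def count_bases(lines: Iterable[str]) -> int:
    """Count sequence characters in FASTA-formatted lines.

    Header lines (starting with ">") are skipped; surrounding whitespace
    and newlines are not counted.
    """
    total = 0
    for line in lines:
        if line.startswith(">"):
            continue
        total += len(line.strip())
    return total


# Hypothetical usage on the file from this thread:
#   with open("all.fasta") as handle:
#       print(count_bases(handle))
```

Passing the open file handle streams the file line by line, so the whole genome never has to sit in memory at once.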