  • Bfast index creation

    Hi,

    I am trying Bfast for the first time and am having trouble creating the index for the whole human genome.
    I want to create the 10 indexes described in the Bfast paper, but each one takes a very long time to build (10 to 20 hours). Memory consumption climbs to 33 GB, which causes problems and may explain the long running times if the system is swapping heavily. (The machine has 64 GB of RAM, but I am not the only user.)

    What is the typical running time for index creation on the human genome?

    I understand that if memory is the problem, I could use the "-d" parameter to split the index into parts, which leads to my second question:
    Does index splitting have any performance impact on the next step of the algorithm, finding candidate alignment locations, and if so, to what extent?
    (I suppose it does, otherwise index splitting would be the default, wouldn't it?)


    Thanks !

  • #2
    For the human genome, index creation can easily run on an 8-core machine in 5-6 hours (remember to use multi-threading). I also regularly build such indexes on machines with 32 GB of RAM. Could you post the command you are using to create the indexes?

    Index splitting (beyond "-d 1") has a significant performance impact, since it requires expensive merging of each of the split indexes. For "-d 1", where the indexes are split into four pieces, the performance decrease (of the "match" step) is not too bad. If you have 24 GB of RAM or more, you should not need to split the indexes.
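    A rough back-of-the-envelope sketch of the memory/merging trade-off described above. It assumes (this is not stated in the thread) that "-d D" splits an index into 4^D pieces, which agrees with the four pieces quoted for "-d 1", and that the 33 GB peak reported in the first post divides roughly evenly across the pieces. Real piece sizes will vary, so treat the numbers as estimates only.

```python
# Back-of-the-envelope only. Assumes -d D splits an index into 4**D pieces
# of roughly equal size; real piece sizes depend on the genome's base composition.
def split_estimate(peak_gb: float, depth: int) -> tuple[int, float]:
    """Return (number of pieces, approximate GB per piece) for split depth `depth`."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    pieces = 4 ** depth  # assumption: one piece per possible prefix of `depth` bases
    return pieces, peak_gb / pieces


if __name__ == "__main__":
    observed_peak_gb = 33.0  # peak memory reported in the first post
    for depth in range(3):
        pieces, per_piece = split_estimate(observed_peak_gb, depth)
        # More pieces lower the memory per piece but leave more pieces to merge when matching.
        print(f"-d {depth}: {pieces:2d} pieces, ~{per_piece:.1f} GB each")
```

    Under these assumptions, "-d 1" brings each piece down to roughly 8 GB, while "-d 2" would already mean 16 pieces to merge during the match step.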

    • #3
      Thanks for this information. So I should run it with "-d 1" if I have less than 24 GB available.

      The command I used was

      ./bin/bfast index -f all.fasta -A 0 -m 1111011010001000110101100101100110100111 -t -w 14 -i 9
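      To see what the "-m" mask in this command encodes, here is a small sketch that reports the mask's width (total number of positions) and its key size (number of "1" positions, i.e. the bases actually used in the key). The check that the "-w 14" hash width does not exceed the key size is an assumption about Bfast's constraints, not something stated in this thread.

```python
def mask_stats(mask: str) -> tuple[int, int]:
    """Return (width, key_size) of a 0/1 mask: total positions and number of 1s."""
    if not mask or set(mask) - {"0", "1"}:
        raise ValueError("mask must be a non-empty string of 0s and 1s")
    return len(mask), mask.count("1")


MASK = "1111011010001000110101100101100110100111"  # value passed with -m above
HASH_WIDTH = 14  # value passed with -w above

if __name__ == "__main__":
    width, key_size = mask_stats(MASK)
    print(f"mask width = {width}, key size = {key_size}")
    # Assumption (not from the thread): the hash width must not exceed the key size.
    print("hash width fits in key:", HASH_WIDTH <= key_size)
```

      For this mask the sketch gives a width of 40 positions and a key size of 22.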

      • #4
        With "-d 1" it runs comfortably in 8 GB of RAM. How big is all.fasta? Is it around 3.2x10^9 bases?
