  • Bfast index creation

    Hi,

    I am trying Bfast for the first time and have some trouble with the creation of the index for the whole human genome.
    I want to create the 10 indexes described in the Bfast paper, but each one takes a very long time to build (10 to 20 hours). Memory consumption goes up to 33 GB, which causes difficulties and may explain the long running time, since the system may be swapping heavily. (I have a 64 GB system, but I am not the only user.)

    What is the typical running time for index creation on the human genome?

    I understand that if memory is the issue, I might try the "-d" parameter to split the index into parts, which leads to my second question:
    Does index splitting have any performance impact on the next step of the algorithm, finding candidate alignment locations, and if so, to what extent?
    (I suppose it has an impact; otherwise index splitting would be done by default, wouldn't it?)


    Thanks !

  • #2
    For the human genome, index creation can easily run on an 8-core machine in 5-6 hours (remember to use multi-threading). Also, I regularly build such indexes on 32GB RAM machines. Could you give the command you are using to create the indexes?

    Index splitting (beyond "-d 1") has a significant performance impact, as it requires expensive merging of each of the split indexes. For "-d 1", where the indexes are split into four pieces, the performance decrease (of the "match" step) is not too bad. If you have 24 GB or more of RAM, you should not need to split the indexes.
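
To make the cost of deeper splitting concrete, here is a minimal sketch that counts pieces per index for a given depth. It assumes the piece count grows as 4^d. That matches the four pieces quoted for "-d 1", but nothing in this thread confirms it for other depths, so check it against the bfast usage text. The function name is made up for illustration.

```shell
# Assumption: "-d N" splits each index into 4^N pieces.
# Only N=1 -> 4 pieces is confirmed in this thread.
pieces_for_depth() {
    depth_left=$1
    pieces=1
    # Multiply by 4 once per level of splitting.
    while [ "$depth_left" -gt 0 ]; do
        pieces=$((pieces * 4))
        depth_left=$((depth_left - 1))
    done
    printf '%s\n' "$pieces"
}

for depth in 0 1 2; do
    printf '%s\n' "-d $depth: $(pieces_for_depth "$depth") piece(s) per index"
done
```

Under this assumption, each extra level quadruples the number of pieces to merge during the "match" step, which would explain why going beyond "-d 1" gets expensive.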



    • #3
      Thanks for this information. So I should run it with "-d 1" if I have less than 24 GB available.

      The command I used was

      ./bin/bfast index -f all.fasta -A 0 -m 1111011010001000110101100101100110100111 -t -w 14 -i 9



      • #4
        Originally posted by guillaum
        Thanks for this information. So I should run it with "-d 1" if I have less than 24 GB available.

        With "-d 1" it runs comfortably in 8GB of RAM. How big is all.fasta (3.2x10^9)?
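
The memory advice in this thread can be sketched as a small dry-run script: no split with 24 GB or more, "-d 1" otherwise. The thresholds come from the replies above. The function name, the `ram_gb` value, and the dry-run echo are illustrative assumptions and not part of bfast. The thread gives no figure for machines with less than 8 GB, and no thread-count option is shown here because none appears in the thread.

```shell
# Pick the split option from the RAM (in GB) available to the job.
# 24 GB or more: no split needed. Otherwise "-d 1", which was
# reported above to run comfortably in 8 GB.
split_option() {
    if [ "$1" -ge 24 ]; then
        printf '%s\n' ""
    else
        printf '%s\n' "-d 1"
    fi
}

ram_gb=16   # hypothetical value; set to what your job can actually use
mask=1111011010001000110101100101100110100111
split_opt=$(split_option "$ram_gb")

# Dry run: print the command from post #3 (plus the split option, if any)
# so it can be reviewed before being run by hand.
# $split_opt is left unquoted on purpose so an empty value disappears.
echo ./bin/bfast index -f all.fasta -A 0 -m "$mask" -t -w 14 -i 9 $split_opt
```

Printing the command first keeps the sketch safe to run on a shared machine, where a mistaken index build could hold tens of gigabytes of memory for hours.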
