  • BFAST match problem

    Hi guys,

    I have a problem with BFAST. I am trying to align about 300 million SOLiD reads (about 30 GB of data) using multiple indexes. When searching for CALs (candidate alignment locations), I get a segmentation fault that kills the application. At the moment I am testing the software on an 8-core machine with 32 GB of RAM running Debian Linux.
    According to the messages sent to stderr, the software seems to crash while it is copying (where?) the reads that did not align with the primary index so that they can be searched with the secondary indexes.

    Here is an extract of the output messages I get before the software crashes:

    Code:
    Reads processed: 314999637
    Cleaning up index.
    Searching index file 1/1 (index #1, bin #1) complete...
    Found 14927144 matches.
    Found matches for 14927144 reads.
    Copying unmatched reads for secondary index search.
    I also noticed that, while running, BFAST itself does not use much RAM (about 1.3 GB), but it seems to cause a lot of disk caching, which ends up eating all the RAM.

    Does anyone have any suggestions on what might be going on?
    Any help would be appreciated on how to solve this.
    Thanks
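On Linux, memory held by the page cache ("Cached") is counted separately from memory that is actually free, so a full-looking RAM does not by itself mean the process is out of memory. A minimal sketch for checking this, assuming a Linux system with /proc/meminfo; the helper name summarize_meminfo is hypothetical and is not part of BFAST:

```shell
#!/bin/sh
# Hypothetical helper (not part of BFAST): summarize a /proc/meminfo-style
# listing read from stdin, converting the kB values to MiB.
summarize_meminfo() {
    awk '$1 == "MemTotal:" || $1 == "MemFree:" || $1 == "Cached:" {
        printf "%s %d MiB\n", $1, $2 / 1024
    }'
}

# Usage on a live Linux system:
#   summarize_meminfo < /proc/meminfo
```

A large "Cached" figure next to a small resident size for bfast would match the behaviour described above, since the kernel normally gives page cache back when processes need the memory.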

  • #2
    Could you give the full command you are using? Also, try avoiding the secondary index search and simply use all your indexes in your primary search. This will improve sensitivity and accuracy, and is the recommended mode indicated in the manual.

    • #3
      Hi Nils,

      Thanks for your reply.
      Here is the command I use (on a much smaller set of reads now -- 1 000 000):

      Code:
      bfast match -f contigs.fasta -r F3_10Mchunk.qfasta -i 1 -I 2,3 -A 1 -K 8 -M 512 -w 0 -n 8 > matchF3_10Mb 2> err_10Mb
      I get the same result even if I skip the secondary indexes and, as you suggested, use the following command:

      Code:
      bfast match -f contigs.fasta -r F3_10Mchunk.qfasta -i 1,2,3 -A 1 -K 8 -M 512 -w 0 -n 8 > matchF3_10Mb 2> err_10Mb
      and also if I do not specify the primary indexes with -i at all (in which case all 10 of my indexes are used instead of just three).

      What is kind of weird is that in this last case the segmentation fault occurs when it reaches index 4, while in the first and second cases it occurs after searching index 1.

      Any clue on what is going on?

      Thanks again for your help...
      Cheers

      • #4
        For completeness, here is the content of the stderr file obtained by $ cat err_10Mb

        First two cases:

        Code:
        ************************************************************
        Checking input parameters supplied by the user ...
        Validating fastaFileName contigs.fasta.
        Validating readsFileName F3_10Mchunk.qfasta.
        Validating tmpDir path ./.
        **** Input arguments look good!
        ************************************************************
        ************************************************************
        Printing Program Parameters:
        programMode:                            [ExecuteProgram]
        fastaFileName:                          contigs.fasta
        mainIndexes                             1,2,3
        secondaryIndexes                        [Not Using]
        readsFileName:                          F3_10Mchunk.qfasta
        offsets:                                [Using All]
        loadAllIndexes:                         [Not Using]
        compression:                            [Not Using]
        space:                                  [Color Space]
        startReadNum:                           1
        endReadNum:                             2147483647
        keySize:                                [Not Using]
        maxKeyMatches:                          8
        maxNumMatches:                          512
        whichStrand:                            [Both Strands]
        numThreads:                             8
        queueLength:                            250000
        tmpDir:                                 ./
        timing:                                 [Not Using]
        ************************************************************
        Searching for main indexes...
        Found 3 index (3 total files).
        Not using secondary indexes.
        ************************************************************
        Reading in reference genome from contigs.fasta.cs.brg.
        In total read 11 contigs for a total of 776592 bases
        ************************************************************
        Reading F3_10Mchunk.qfasta into a temp file.
        Will process 1000000 reads.
        ************************************************************
        Searching index file 1/3 (index #1, bin #1)...
        Reading index from contigs.fasta.cs.1.1.bif.
        Read index from contigs.fasta.cs.1.1.bif.
        Reads processed: 1000000
        Cleaning up index.
        Searching index file 1/3 (index #1, bin #1) complete...
        Found 19503 matches.
        ************************************************************
        Searching index file 2/3 (index #2, bin #1)...
        Reading index from contigs.fasta.cs.2.1.bif.

        Last case (with 10 indexes):

        Code:
        ************************************************************
        Checking input parameters supplied by the user ...
        Validating fastaFileName contigs.fasta.
        Validating readsFileName F3_10Mchunk.qfasta.
        Validating tmpDir path ./.
        **** Input arguments look good!
        ************************************************************
        ************************************************************
        Printing Program Parameters:
        programMode:                            [ExecuteProgram]
        fastaFileName:                          contigs.fasta
        mainIndexes                             [Auto-recognizing]
        secondaryIndexes                        [Not Using]
        readsFileName:                          F3_10Mchunk.qfasta
        offsets:                                [Using All]
        loadAllIndexes:                         [Not Using]
        compression:                            [Not Using]
        space:                                  [Color Space]
        startReadNum:                           1
        endReadNum:                             2147483647
        keySize:                                [Not Using]
        maxKeyMatches:                          8
        maxNumMatches:                          512
        whichStrand:                            [Both Strands]
        numThreads:                             8
        queueLength:                            250000
        tmpDir:                                 ./
        timing:                                 [Not Using]
        ************************************************************
        Searching for main indexes...
        Found 10 index (10 total files).
        Not using secondary indexes.
        ************************************************************
        Reading in reference genome from contigs.fasta.cs.brg.
        In total read 11 contigs for a total of 776592 bases
        ************************************************************
        Reading F3_10Mchunk.qfasta into a temp file.
        Will process 1000000 reads.
        ************************************************************
        Searching index file 1/10 (index #1, bin #1)...
        Reading index from contigs.fasta.cs.1.1.bif.
        Read index from contigs.fasta.cs.1.1.bif.
        Reads processed: 1000000
        Cleaning up index.
        Searching index file 1/10 (index #1, bin #1) complete...
        Found 19503 matches.
        ************************************************************
        Searching index file 2/10 (index #2, bin #1)...
        Reading index from contigs.fasta.cs.2.1.bif.
        Read index from contigs.fasta.cs.2.1.bif.
        Reads processed: 1000000
        Cleaning up index.
        Searching index file 2/10 (index #2, bin #1) complete...
        Found 18896 matches.
        ************************************************************
        Searching index file 3/10 (index #3, bin #1)...
        Reading index from contigs.fasta.cs.3.1.bif.
        Read index from contigs.fasta.cs.3.1.bif.
        Reads processed: 1000000
        Cleaning up index.
        Searching index file 3/10 (index #3, bin #1) complete...
        Found 14092 matches.
        ************************************************************
        Searching index file 4/10 (index #4, bin #1)...
        Reading index from contigs.fasta.cs.4.1.bif.

        Thanks
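To see at a glance where each run stopped, the last "Searching index file" line can be pulled out of the saved stderr. A small sketch that relies only on the message format shown in the logs above; last_index_line is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the last "Searching index file N/M ..." line
# from a saved BFAST stderr log. Finished searches end in "complete...",
# so a final line without "complete" marks the index where the run stopped.
last_index_line() {
    grep 'Searching index file' "$1" | tail -n 1
}

# Usage: last_index_line err_10Mb
```

For the first log above this would print the "2/3" line, and for the 10-index log the "4/10" line.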

        • #5
          Try this:
          Code:
          bfast match -f contigs.fasta -r F3_10Mchunk.qfasta -A 1 -n 8 > matchF3_10Mb 2> err_10Mb
          Make sure that you have built all 10 indexes as recommended in the manual, and use them all as primary indexes. I also removed some options above, as I would still recommend using the default parameters.

          It also looks like your stderr and stdout show that the process did not complete, but they don't show it crashing at the "Copying unmatched reads for secondary index search" step.

          • #6
            I have tried
            Code:
            bfast match -f contigs.fasta -r F3_10Mchunk.qfasta -A 1 -n 8 > matchF3_10Mb 2> err_10Mb
            but I still get the segmentation fault.

            Indexes are created with the following commands:
            Code:
            bfast index -f contigs.fasta -A 1 -m 1111111111111111111111 -i 1 -w 12 -n 8
            bfast index -f contigs.fasta -A 1 -m 111110100111110011111111111 -i 2 -w 12 -n 8
            bfast index -f contigs.fasta -A 1 -m 10111111011001100011111000111111 -i 3 -w 12 -n 8
            bfast index -f contigs.fasta -A 1 -m 1111111100101111000001100011111011 -i 4 -w 12 -n 8
            
            ...
            
            bfast index -f contigs.fasta -A 1 -m 1110110001011010011100101111101111 -i 10 -w 12 -n 8
            Thanks

            • #7
              I apologize for your experience so far. The best way to move forward is to email me the reference and reads, so that I can quickly debug it myself and see whether there is a bug in the program or a configuration issue. What does the "F3_10Mchunk.qfasta" file look like?

              • #8
                Hi, I think I have a bit more information on this problem.
                I reran the smaller example with 4 threads instead of 8 (i.e. -n 4) and the segmentation fault no longer occurs.
                The box I was running bfast on has eight cores, so in theory I should be able to set -n 8, shouldn't I? Does this parameter need to be a power of 2?

                At the moment I am running the bigger example with 4 threads. I will let you know as soon as I have the result.

                Thanks
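A sketch of how the thread-count dependence could be checked systematically: run the same match command with several -n values and record the exit status of each run. Shells such as bash and dash report a process killed by signal N as status 128 + N, so a segmentation fault (SIGSEGV, signal 11 on Linux) typically shows up as 139. The function names and output file names are hypothetical; the bfast options are only the ones already used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: turn a shell exit status into a short label.
# 139 = 128 + 11, i.e. the process was killed by SIGSEGV.
classify_status() {
    case "$1" in
        0)   echo "ok" ;;
        139) echo "segfault" ;;
        *)   echo "error ($1)" ;;
    esac
}

# Hypothetical driver: run the match step once per thread count given as
# an argument, keeping stdout and stderr of each run in separate files.
try_thread_counts() {
    for n in "$@"; do
        bfast match -f contigs.fasta -r F3_10Mchunk.qfasta -A 1 -n "$n" \
            > "match_n$n" 2> "err_n$n"
        status=$?
        echo "-n $n: $(classify_status "$status")"
    done
}

# Usage: try_thread_counts 1 2 4 8
```

Trying values such as 5, 6 and 7 as well would show whether the failure is tied to 8 specifically or to anything above 4.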

                • #9
                  That is very odd. For the "index" step the number of threads needs to be a power of two, but for the rest of the program it does not. Most users run it successfully with "-n 8" on 8-core machines.
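Whether a given -n value is a power of two can be checked with a few lines of shell before launching a long job. This is only an illustrative sketch; is_power_of_two is a hypothetical helper and not a BFAST command:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the argument is a
# positive power of two. A power of two has exactly one bit set, so
# n & (n - 1) clears that bit and yields 0.
is_power_of_two() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# Usage:
#   is_power_of_two 8 && echo "8 is a power of two"
#   is_power_of_two 6 || echo "6 is not a power of two"
```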

                  • #10
                    I also tried the bigger example, and that worked with 4 threads too. Oddly, with 1 index it also works with 8 threads, but with more indexes I cannot use more than 4.
                    Last edited by blu78; 04-11-2010, 01:31 AM.

                    • #11
                      I can't think of a reasonable explanation for this beyond a hardware or OS configuration issue. What version are you currently using? Could you post the output of your "./configure" setup script?

                      • #12
                        Hi,

                        I am using 64-bit Debian Linux.
                        Here is the output of configure. Thanks for all your help.
                        Code:
                        $ ./configure
                        checking for a BSD-compatible install... /usr/bin/install -c
                        checking whether build environment is sane... yes
                        checking for a thread-safe mkdir -p... /bin/mkdir -p
                        checking for gawk... no
                        checking for mawk... mawk
                        checking whether make sets $(MAKE)... yes
                        checking build system type... x86_64-unknown-linux-gnu
                        checking for gcc... gcc
                        checking for C compiler default output file name... a.out
                        checking whether the C compiler works... yes
                        checking whether we are cross compiling... no
                        checking for suffix of executables...
                        checking for suffix of object files... o
                        checking whether we are using the GNU C compiler... yes
                        checking whether gcc accepts -g... yes
                        checking for gcc option to accept ISO C89... none needed
                        checking for style of include used by make... GNU
                        checking dependency style of gcc... gcc3
                        checking for a BSD-compatible install... /usr/bin/install -c
                        ./configure: line 3462: git: command not found
                        checking for BZ2_bzRead in -lbz2... yes
                        checking for an ANSI C-conforming const... yes
                        checking how to run the C preprocessor... gcc -E
                        checking for grep that handles long lines and -e... /bin/grep
                        checking for egrep... /bin/grep -E
                        checking for ANSI C header files... yes
                        checking for sys/types.h... yes
                        checking for sys/stat.h... yes
                        checking for stdlib.h... yes
                        checking for string.h... yes
                        checking for memory.h... yes
                        checking for strings.h... yes
                        checking for inttypes.h... yes
                        checking for stdint.h... yes
                        checking for unistd.h... yes
                        checking for stdlib.h... (cached) yes
                        checking for GNU libc compatible malloc... yes
                        checking for stdlib.h... (cached) yes
                        checking for GNU libc compatible realloc... yes
                        checking for pow in -lm... yes
                        checking for gzread in -lz... yes
                        checking for floor... yes
                        checking for pow... yes
                        checking for sqrt... yes
                        checking for strchr... yes
                        checking for strdup... yes
                        checking for strpbrk... yes
                        checking for strstr... yes
                        checking for strtok_r... yes
                        checking for int8_t... yes
                        checking for int32_t... yes
                        checking for int64_t... yes
                        checking for uint8_t... yes
                        checking for uint32_t... yes
                        checking for uint64_t... yes
                        checking for short int... yes
                        checking size of short int... 2
                        checking for int... yes
                        checking size of int... 4
                        checking for long int... yes
                        checking size of long int... 8
                        checking for ANSI C header files... (cached) yes
                        checking limits.h usability... yes
                        checking limits.h presence... yes
                        checking for limits.h... yes
                        checking for stdint.h... (cached) yes
                        checking for stdlib.h... (cached) yes
                        checking for string.h... (cached) yes
                        checking sys/time.h usability... yes
                        checking sys/time.h presence... yes
                        checking for sys/time.h... yes
                        checking for unistd.h... (cached) yes
                        checking float.h usability... yes
                        checking float.h presence... yes
                        checking for float.h... yes
                        checking zlib.h usability... yes
                        checking zlib.h presence... yes
                        checking for zlib.h... yes
                        checking bzlib.h usability... yes
                        checking bzlib.h presence... yes
                        checking for bzlib.h... yes
                        checking fcntl.h usability... yes
                        checking fcntl.h presence... yes
                        checking for fcntl.h... yes
                        checking for inline... inline
                        configure: creating ./config.status
                        config.status: creating Makefile
                        config.status: creating bfast/Makefile
                        config.status: creating butil/Makefile
                        config.status: creating scripts/Makefile
                        config.status: creating tests/Makefile
                        config.status: creating config.h
                        config.status: config.h is unchanged
                        config.status: executing depfiles commands
                        Last edited by blu78; 04-11-2010, 09:49 AM.

                        • #13
                          I just can't figure out why it would work with 4 threads but not with 8. What version of BFAST are you using?

                          Nils

                          • #14
                            Hi,

                            I am using BFAST 0.6.4a

                            Thanks for your help

                            • #15
                              I don't have any new leads. Most users I know run all parts of BFAST with 8 threads. Maybe there is a memory bus issue that causes the crash when too many threads try to access memory at once, but this seems unlikely and is difficult to prove. Stick with your four threads and we'll see if any other users experience the same thing.

                              Nils
