  • bfast index Fatal Error

    Hi guys,
    I am trying to create an index in color space with bfast v0.7.0a, using this command:

    bfast index -f hg19.fasta -i 1 -w 14 -m 1111111111111111 -n 8 -A 1;

    but I get an error I can't understand. I get a similar error for every mask that has to be built; the only difference is that the number (1032000000 here) changes from mask to mask.

    Pass 1 out of 2. Out of 2858669513, currently on:
    1032000000************************************************************
    In function "RGIndexGetHashIndex": Fatal Error[OutOfRange]. Variable/Value: aBase.
    Message: Could not understand base.
    ***** Exiting due to errors *****
    ************************************************************
    ************************************************************

    What should I do? I have no idea.

    Thank you for your time and help!

  • #2
    Can you make sure that the hg19.fasta checksum matches the source from where you received the data? It looks like a problem in your FASTA.

    • #3
      Hi Nils,
      yesterday I downloaded a human genome from UCSC and joined the chromosomes with cat. I am currently trying to index that one, but I still get an error for every mask:

      Pass 1 out of 2. Out of 2861336622, currently on:
      1396000000************************************************************
      In function "RGIndexGetHashIndex": Fatal Error[OutOfRange]. Variable/Value: aBase.
      Message: Could not understand base.
      ***** Exiting due to errors *****
      ************************************************************

      I also get these warnings:

      ************************************************************
      ************************************************************
      Reading in reference genome from human_hg19.fa.cs.brg.
      In total read 25 contigs for a total of 3095693983 bases
      ************************************************************
      Creating the index...
      ************************************************************
      Warning: startContig was less than zero.
      Defaulting to contig=1 and position=1.
      ************************************************************
      ************************************************************
      Warning: endContig was greater than the number of contigs in the reference genome.
      Defaulting to reference genome's end contig=25 and position=59373566.
      ************************************************************

      I have no idea what these mean, though. I have all the chromosomes plus M, X and Y.

      Should I remove the Ns from the genome, or any other characters that are not A, T, C or G?

      Thanks for any help!

      • #4
        By the way, I checked my genome for characters other than A, T, G, C and N and found none. I used this command:
        --> grep -ivP 'a|t|g|c|N' human_hg19.fa

        I also checked the source code of the function RGIndexGetHashIndex. It seems that the program exits when it finds a base other than A, T, G or C. Does that mean I have to change the Ns to something else?
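
A note on the check in this post: `grep -ivP 'a|t|g|c|N'` prints only lines that contain none of those letters, so a line of valid bases with a single stray character (a digit, a space, a carriage return from Windows line endings) would not be reported. The sketch below is one way to look for such characters directly. It is written in C to match the bfast source, the function names `is_expected_base` and `find_unexpected_base` are hypothetical, and it assumes a plain FASTA file in which header lines start with `>`. A clean result would only rule out the FASTA text as the cause; it says nothing about the `.brg` file that bfast builds from it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if c is A, C, G, T or N in either case, 0 otherwise. */
static int is_expected_base(int c)
{
    /* Guard against 0, because strchr would match the string terminator. */
    return c != '\0' && strchr("ACGTNacgtn", c) != NULL;
}

/*
 * Scans a FASTA stream for the first sequence character that is not
 * A/C/G/T/N. Header lines (first character '>') are skipped entirely.
 * Returns 0 if nothing unexpected is found. Otherwise returns 1 and stores
 * the 1-based line and column and the offending byte value.
 */
int find_unexpected_base(FILE *in, long *line, long *col, int *bad)
{
    long cur_line = 1;
    long cur_col = 0;
    int in_header = 0;
    int c;

    assert(in != NULL && line != NULL && col != NULL && bad != NULL);

    while ((c = getc(in)) != EOF) {
        if (c == '\n') {
            /* Start of a new line: reset column and header state. */
            cur_line++;
            cur_col = 0;
            in_header = 0;
            continue;
        }
        cur_col++;
        if (cur_col == 1 && c == '>') {
            in_header = 1;
        }
        if (in_header) {
            continue;
        }
        if (!is_expected_base(c)) {
            /* A '\r' from CRLF line endings is reported here as well. */
            *line = cur_line;
            *col = cur_col;
            *bad = c;
            return 1;
        }
    }
    return 0;
}
```

Called on a stream opened with `fopen("human_hg19.fa", "r")`, it reports the line, column and byte value of the first offending character, or returns 0 if the file contains only the expected letters.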

        • #5
          Nope, Ns should be fine. You could print out the base where the error is occurring though.

          • #6
            I see. I will have to modify the source and recompile, I suppose. I will try that and post the result.

            • #7
              By the way, which file should I modify, and how (I'm not a C programmer), so that I can see which base is breaking my index? I'm getting really frustrated. For 2 weeks now I have been unable to build the human index.

              Today I modified the file RGIndex.c, adding line 2623, and recompiled:

              2621 default:
              2622 PrintError(FnName, "aBase", "Could not understand base", Exit, OutOfRange);
              2623 printf("ERR-BASE%s \n",aBase);
              2624 break;

              The program exited with the same error, but there was no sign of my printf:

              1476000000************************************************************
              In function "RGIndexGetHashIndex": Fatal Error[OutOfRange]. Variable/Value: aBase.
              Message: Could not understand base.
              ***** Exiting due to errors *****
              ************************************************************
              ************************************************************


              I also tried another thing. Using grep, I extracted 500 lines from each chromosome of human.fa and created a small_human.fa. Then I ran bfast index on that small file and it finished without error. On the big one, however, it fails every time.

              Please help.

              • #8
                Put the printf before the error statement, and use "fprintf(stderr," instead of "printf(".

                • #9
                  Hi Nils,
                  I modified the source like this:
                  -------------
                  default:
                  fprintf(stderr,"ERR-BASE: %s :\n",aBase);
                  PrintError(FnName, "aBase", "Could not understand base", Exit, OutOfRange);
                  break;
                  -------------

                  I recompiled and ran it again, using this script for color space:

                  ---------------
                  #!/usr/bin/bash
                  # for COLOR SPACE

                  # create ref genome
                  bfast fasta2brg -f $1 -A 1;

                  # creates the indexes
                  I=1;
                  for MASK in 1111111111111111111111 111110100111110011111111111 10111111011001100011111000111111 1111111100101111000001100011111011 111111110001111110011111111 11111011010011000011000110011111111 1111111111110011101111111 111011000011111111001111011111 1110110001011010011100101111101111 111111001000110001011100110001100011111
                  do
                  bfast index -f $1 -i $I -w 14 -m $MASK -n 8 -A 1;
                  let I=I+1;
                  done
                  -------------------

                  But during the first mask it exited with a segmentation fault:

                  -------------------
                  Pass 1 out of 2. Out of 2861336547, currently on:
                  1165000000create_index_bfast.sh: line 9: 31417 Segmentation fault bfast index -f $1 -i $I -w 14 -m $MASK -n 8 -A 1
                  ************************************************************
                  -------------------

                  This time it is even worse, because there is no error message like before. Now I am trying to build the index with only 2 threads. Maybe something goes wrong when I use 8 threads.
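
A possible explanation for the segmentation fault, offered as a guess from the snippet alone: the `default:` label implies that `aBase` is the operand of a `switch`, so it must be an integer type such as `char`, while `%s` tells `fprintf` to treat its argument as a pointer to a string. Passing a character value there is undefined behaviour and can crash on its own, before `PrintError` is ever reached. A minimal sketch of a safer diagnostic follows; `format_base_diagnostic` is a hypothetical helper, not part of bfast.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Formats a diagnostic for a single base value into buf.
 * The numeric value is always shown; the character itself is shown only
 * when it is printable ASCII, so control bytes and 0 stay readable.
 * Returns the value returned by snprintf (length of the full message).
 */
int format_base_diagnostic(char *buf, size_t size, int base)
{
    assert(buf != NULL && size > 0);

    if (base >= 32 && base < 127) {
        /* %c prints the character, %d its numeric code. */
        return snprintf(buf, size, "ERR-BASE: '%c' (value %d)", base, base);
    }
    return snprintf(buf, size, "ERR-BASE: non-printable (value %d)", base);
}
```

In the modified `default:` branch this would be called with `aBase`, and the resulting string written with `fprintf(stderr, "%s\n", msg)` before the `PrintError` call, which appears to exit the program.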

                  • #10
                    I tried running it with 4 threads and still got a segmentation fault.
                    It is frustrating, and I need to use BFAST for SOLiD PE data, which is more complicated to handle with BWA.

                    • #11
                      Hi again,
                      I'm sorry for posting so many times, but I still have problems.
                      I have now decided to try the git version, 0.7.0b.

                      I ran it like this:
                      bfast index -f hg19.fa -i 1 -w 14 -m 1111111111111111111111 -A 1 -n 8

                      and I got an error like this:
                      ---------------
                      Currently on [contig,pos]:
                      [------16,---13000000]
                      ************************************************************
                      In function "RGBinaryGetBase": Fatal Error[OutOfRange]. Variable/Value: repeat.
                      Message: Could not understand repeat.
                      ***** Exiting due to errors *****
                      ************************************************************
                      --------------------

                      Then I modified the source (RGBinary.c) so that it prints what the problem is:
                      ------------
                      default:
                      fprintf(stderr,"\nERR: %s : %d :\n",curChar,curByte); /* my addition */
                      PrintError(FnName, "repeat", "Could not understand repeat", Exit, OutOfRange);
                      ------------

                      and it gave me these lines:
                      -------------------
                      Currently on [contig,pos]:
                      [------16,---13000000]
                      ERR: (null) : 15 :
                      ************************************************************
                      In function "RGBinaryGetBase": Fatal Error[OutOfRange]. Variable/Value: repeat.
                      Message: Could not understand repeat.
                      ***** Exiting due to errors *****
                      -------------------

                      So I have 4 questions:
                      Q1. Is what I did correct?
                      Q2. Does the result suggest that I have an empty spot or character in my file?
                      Q3. What can I do to improve the situation and make the program finish correctly?
                      Q4. Could you please provide me with a link to a human reference genome that was successfully indexed by bfast?

                      PS: I also checked the reference genome for spaces and empty lines with grep but found none. I used these two commands, which gave no result:
                      grep -m1 -i -P '[a|c|t|g] [a|c|t|g]' hg19.fa - search for spaces
                      grep -A 1 -B 1 '^$' hg19.fa - search for empty lines
                      Last edited by kenietz; 03-06-2012, 01:39 AM.

                      • #12
                        Hi Nils,
                        the git version managed to create one index file with this command:
                        bfast index -f hg19.fa -i 1 -w 14 -m 1111111111111111111111 -A 1 -n 1

                        With more than 1 thread, problems arise. That is a pain, because it took 8.30 h to build; multiplied by 10 masks, that is more than 80 hours of compute.

                        I don't know why such a problem arises, though. I have an i7 3930K CPU with 64 GB RAM and I would like to use them. I work on 64-bit Slackware Linux 13.

                        Should I upgrade some libraries or something?
                        Thank you for your help.

                        • #13
                          Can you try on a different machine? With the given info, I have little to add.

                          • #14
                            Yes, that's my plan too. I will write back once I have tried on another machine.

                            • #15
                              Hi Nils,
                              I have a question.

                              Which glibc do you use? I use version 2.11, and maybe there is a problem with that. I saw that the newest version is 2.14.
