  • bfast localalign looking for the wrong reference file

    Hello All,

    I'm attempting my first few runs on BFAST, and while everything seems to run fine through the index searching step, I cannot get past the localalign step; the program always dies with the following output:


    $ bfast localalign -f <file>.fa -m <file>.cs.1.bmf -A 1 -n 8 -t > <file>.cs.1.baf
    ************************************************************
    Checking input parameters supplied by the user ...
    Validating fastaFileName <file>.fa.
    Validating matchFileName<file>.cs.1.bmf.
    **** Input arguments look good! *****
    ************************************************************
    ************************************************************
    Printing Program Parameters:
    programMode: [ExecuteProgram]
    fastaFileName: <file>.fa
    matchFileName: <file>.cs.1.bmf
    scoringMatrixFileName: [Not Using]
    ungapped: [Not Using]
    unconstrained: [Not Using]
    space: [Color Space]
    startReadNum: 1
    endReadNum: 2147483647
    offsetLength: 20
    maxNumMatches: 384
    avgMismatchQuality: 10
    numThreads: 8
    queueLength: 10000
    pairedEndLength: [Not Using]
    mirroringType: [Not Using]
    forceMirroring: [Not Using]
    timing: [Using]
    ************************************************************
    ************************************************************
    Reading in reference genome from <file>.fa.nt.brg.
    ************************************************************
    In function "RGBinaryReadBinary": Fatal Error[OpenFileError]. Variable/Value: <file>.fa.nt.brg.
    Message: Could not open brgFileName for reading.
    The file stream error was:: No such file or directory
    ***** Exiting due to errors *****
    ************************************************************

    Why does it insist on finding a nucleotide space reference file? Everything up until this point has been done in color space, and it is so indicated in the command option values. Has anyone else encountered this problem?

    Regards to all...

  • #2
    Originally posted by baldeberre
    Why does it insist on finding a nucleotide space reference file? Everything up until this point has been done in color space, and it is so indicated in the command option values. Has anyone else encountered this problem?

    Regards to all...
    It is not a bug. You need to build the nucleotide space version since we want the alignments to result in bases.

    Nils
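A minimal sketch of a pre-flight check for this situation: before running localalign, confirm that both binary reference files sit next to the FASTA file. The `.nt.brg` and `.cs.brg` suffixes are the ones that appear in this thread; `check_brg` is a hypothetical helper name, not part of BFAST. The missing nucleotide-space file is presumably built with `bfast fasta2brg` run without `-A 1`, but treat that as an assumption and check the tool's own usage output.

```shell
#!/bin/bash
# Report which BFAST binary reference files exist next to a FASTA file.
# Returns 0 if both are present, 1 if either one is missing.
check_brg() {
    local fasta="$1" missing=0 suffix
    for suffix in nt.brg cs.brg; do
        if [ -e "$fasta.$suffix" ]; then
            echo "found   $fasta.$suffix"
        else
            echo "missing $fasta.$suffix"
            missing=1
        fi
    done
    return "$missing"
}

# Example (path is a placeholder):
#   check_brg /path/to/ref.fa || echo "build the missing reference first"
```

Running the check at the top of a job script fails fast, instead of after the match step has already used its walltime.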



    • #3
      Thanks Nils, and sorry for the naiveté...



      • #4
        Originally posted by baldeberre
        Thanks Nils, and sorry for the naiveté...
        Please keep posting problems; I try to log on during the day so I can answer the easy questions.



        • #5
          hi everyone,

          I also have a problem with the localalign tool. The match tool is working fine and my bmf files look OK, but when I input them into localalign it stops.
          Here is my script:

          #! /bin/bash
          #parameter for PBS
          #PBS -q smp
          #PBS -l walltime=20:00:00
          #PBS -l mem=24gb
          #PBS -M email
          #PBS -m abe
          #PBS -N whole_exome_S1

          #start of BFAST
          module load bfast-gcc
          cd $PBS_O_WORKDIR

          # use the PBS array ID to automatically launch the alignment of the 8 read files in parallel
          N=$PBS_ARRAYID

          bfast match -f /vlsci/VR0053/shared/Exome_analyses/ref/hg19.fa -A 1 -r reads.$N.fastq > bfast.matches.file.hg19.$N.bmf

          bfast localalign -f /vlsci/VR0053/shared/Exome_analyses/ref/hg19.fa -m bfast.matches.file.hg19.$N.bmf -A 1 > bfast.aligned.file.hg19.$N.baf


          and here is the error message:

          ************************************************************
          Checking input parameters supplied by the user ...
          Validating fastaFileName /vlsci/VR0053/shared/Exome_analyses/ref/hg19.fa.
          Validating matchFileNamebfast.matches.file.hg19.4.bmf.
          **** Input arguments look good! *****
          ************************************************************
          ************************************************************
          Printing Program Parameters:
          programMode: [ExecuteProgram]
          fastaFileName: /vlsci/VR0053/shared/Exome_analyses/ref/hg19.fa
          matchFileName: bfast.matches.file.hg19.4.bmf
          scoringMatrixFileName: [Not Using]
          ungapped: [Not Using]
          unconstrained: [Not Using]
          space: [Color Space]
          startReadNum: 1
          endReadNum: 2147483647
          offsetLength: 20
          maxNumMatches: 384
          avgMismatchQuality: 10
          numThreads: 1
          queueLength: 10000
          timing: [Not Using]
          ************************************************************
          ************************************************************
          Reading in reference genome from /vlsci/VR0053/shared/Exome_analyses/ref/hg19.fa.nt.brg.
          In total read 25 contigs for a total of 3095693983 bases
          ************************************************************
          ************************************************************
          Reading match file from bfast.matches.file.hg19.4.bmf.
          ************************************************************
          Performing alignment...
          Reads processed: 0************************************************************
          In function "NormalizeColorSpaceRead": Fatal Error[OutOfRange]. Message: Could not convert base and color.
          ***** Exiting due to errors *****
          ************************************************************


          I don't understand why it can't convert base and color. I have created both CS and nt versions of my ref genome... perhaps the answer is obvious, but I can't find it.

          Thanks in advance for your help!



          • #6
            You need to find and post the offending read. Can you try to figure out which read has a weird character?
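One way to look for such a read, as a minimal sketch: the function below prints the line number and name of every read whose sequence line is not a color-space string. It assumes 4-line FASTQ records and sequences made of one leading base (A, C, G or T) followed by colors 0-3, with `.` for missing calls; `find_bad_reads` is a hypothetical helper name, not part of BFAST.

```shell
#!/bin/bash
# Print "<line number>: <read name>" for every read whose sequence line
# does not look like color space. Assumes strict 4-line FASTQ records.
find_bad_reads() {
    awk 'NR % 4 == 1 { name = $0 }
         NR % 4 == 2 && $0 !~ /^[ACGT][0-3.]+$/ { print NR ": " name }' "$1"
}

# Example (file name is a placeholder):
#   find_bad_reads reads.1.fastq | head
```

If every read is reported, the file as a whole is probably malformed rather than a single read being corrupt.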



            • #7
              Thanks for your answer, Nils. Are you talking about the reads.fastq? I have a total of 8 read files and they all give me this error message when run with localalign. Here is the beginning of my reads.1.fastq file:


              @1_46_39
              13 15 25 -1 10 -1 4 -1 -1 -1 28 4 29 20 4 13 14 -1 -1 -1 25 5 -1 -1 -1 -1 23 -1 -1 20 4 4 -1 -1 -1 14 4 -1 4 19 -1 4 20 -1 22 -1 -1 -1 -1 -1
              +
              !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
              @1_46_46
              14 4 19 -1 15 -1 21 -1 -1 -1 21 30 18 27 15 4 29 -1 -1 -1 28 27 -1 -1 -1 -1 29 -1 -1 4 4 4 -1 -1 -1 4 4 -1 4 4 -1 4 4 -1 4 -1 -1 -1 -1 -1
              +
              !BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB!!!
              @1_46_65
              13 4 20 -1 12 -1 23 -1 -1 -1 22 7 27 28 4 13 20 23 -1 -1 29 20 -1 -1 -1 -1 25 -1 7 23 16 26 -1 -1 -1 17 6 -1 12 15 -1 4 29 -1 21 -1 -1 -1 -1 -1
              +
              !cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccBBB!!!
              @1_46_95
              21 4 22 -1 22 -1 19 -1 -1 -1 31 28 29 27 27 18 30 14 -1 -1 26 30 -1 -1 -1 -1 31 -1 26 26 4 27 -1 -1 -1 26 26 -1 6 12 -1 28 25 -1 28 -1 -1 -1 -1 -1
              +
              !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~cccBBB!!!
              @1_46_170
              18 21 5 -1 20 -1 24 16 -1 -1 21 11 7 4 15 12 25 9 -1 -1 6 11 -1 -1 -1 -1 31 -1 7 17 4 23 -1 -1 -1 4 28 -1 9 15 -1 22 4 -1 28 -1 -1 -1 -1 -1
              +
              !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~cBBB!!!
              @1_46_224
              33 28 32 -1 29 -1 10 -1 -1 -1 4 4 7 20 14 4 4 4 -1 -1 12 4 -1 -1 -1 -1 4 -1 9 4 4 4 -1 -1 -1 9 5 -1 6 18 -1 4 10 -1 5 -1 -1 -1 -1 -1
              +
              !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~cBBB!!!
              @1_46_256
              23 4 22 -1 23 -1 8 -1 -1 -1 18 12 19 16 4 26 27 -1 -1 -1 9 12 -1 -1 -1 -1 4 -1 -1 4 4 6 -1 -1 -1 5 18 -1 16 8 -1 4 4 -1 5 -1 -1 -1 -1 -1
              +
              !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~cBBB!!!
              @1_46_261
              27 4 8 -1 14 -1 26 4 -1 -1 27 22 17 23 12 31 29 6 -1 -1 6 25 4 -1 -1 -1 25 -1 5 11 9 15 -1 -1 -1 16 22 -1 21 6 -1 15 4 -1 13 -1 -1 -1 -1 -1

              Or are you referring to another type of file?

              update1:
              I did have a look at the fastq format for SOLiD and mine doesn't look correct... instead of having a color-coded sequence below the @header, it looks more like quality scores...

              Here is my convert-read script:


              #!/bin/bash
              #parameter for PBS/ BFAST is a multi-threaded app (SMP parallel)
              #PBS -q smp
              #PBS -l walltime=10:00:00
              #PBS -l mem=24gb
              #PBS -M email
              #PBS -m abe
              #PBS -N sample1_convread

              #start of BFAST
              module load bfast-gcc

              #create a dir for pbs output into the work dir
              cd $PBS_O_WORKDIR

              # converts the read

              solid2fastq -n 10000000 -o reads /vlsci/VR0053/shared/Exome_analyses/data/sample1/*.qual /vlsci/VR0053/shared/Exome_analyses/data/sample1/*.csfasta


              The input files are inverted in my script, with the qual file before the csfasta... could that be the reason?

              update2:

              I reran the read conversion with the qual and csfasta files swapped, and now the fastq file looks correct... I will rerun the match and localalign steps next...
              Last edited by Fabrice ODEFREY; 08-28-2010, 04:11 AM.



              • #8
                Originally posted by Fabrice ODEFREY
                solid2fastq -n 10000000 -o reads /vlsci/VR0053/shared/Exome_analyses/data/sample1/*.qual /vlsci/VR0053/shared/Exome_analyses/data/sample1/*.csfasta
                The csfasta files come before the qual files as arguments to solid2fastq. See the description of the options to solid2fastq by running it with no arguments.
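A minimal sketch of one way to guard against this mix-up in a job script: reorder the input file list so that every `.csfasta` file precedes every `.qual` file, whatever order the globs were written in. `csfasta_first` is a hypothetical helper name, and the sketch assumes file paths without spaces.

```shell
#!/bin/bash
# Print the given files with all .csfasta files ahead of all .qual files.
# Arguments with any other extension are ignored.
csfasta_first() {
    local f
    for f in "$@"; do
        case "$f" in *.csfasta) printf '%s\n' "$f" ;; esac
    done
    for f in "$@"; do
        case "$f" in *.qual) printf '%s\n' "$f" ;; esac
    done
}

# Example (directory is a placeholder; relies on word splitting of the output):
#   solid2fastq -n 10000000 -o reads $(csfasta_first data/*.qual data/*.csfasta)
```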



                • #9
                  Yep, my rerun worked properly this time and I was able to create my SAM files. I will now look into concatenating my SAM files, if someone has some suggestions (samtools?)...



                  • #10
                    Use the "samtools merge" or "java -jar MergeSamFiles.jar" command. The former is from SAMtools (samtools.sf.net), and the latter is from Picard (picard.sf.net).
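To illustrate what concatenating SAM files involves, here is a minimal plain-text sketch. It is a stand-in for illustration, not the `samtools merge` implementation: it keeps the header lines (those starting with `@`) from the first file only, then appends the alignment records of every file in the order given. It assumes all inputs were aligned against the same reference, so their headers are identical, and it does no sorting. `concat_sam` is a hypothetical helper name.

```shell
#!/bin/bash
# Concatenate SAM files: header from the first file, records from all files.
# Assumes a non-empty first file and identical headers across the inputs.
concat_sam() {
    awk 'FNR == NR && /^@/ { print; next }   # header lines of the first file
         !/^@/             { print }         # alignment records of every file
        ' "$@"
}

# Example (file names are placeholders):
#   concat_sam aligned.1.sam aligned.2.sam > aligned.all.sam
```

For real data, the dedicated tools are the safer choice, since they also handle sorted input and differing headers.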



                    • #11
                      thanks a lot Nils, samtools is already installed on the cluster so I will go with that!



                      • #12
                        Similar problem....

                        I am actually getting the same error msg:

                        Performing alignment...
                        Reads processed: 0************************************************************
                        In function "AlignColorSpaceGappedConstrained": Fatal Error[OutOfRange]. Message: read and reference did not match.
                        ***** Exiting due to errors *****
                        ************************************************************

                        So I am using data from solid v3(4?) and while my fastq files are not reversed like the previous poster, the qq scores are in ASCII fmt:

                        @427_31_60
                        T32032100303032122000031200012321321000221203233032
                        +
                        @A;??9?5<;<<,&&792&51?%(-83%(-060-&.,-**0.)-(:85&-

                        Is it a matter of simply converting the qq scores to int? Also, do I need to lose the starting T's on the reads?

                        I am also using indexes that were not generated on the same computer doing the aligning...they were coincidentally provided to me by the nelson lab =). Would this make a difference?

                        Thanks.

                        alden



                        • #13
                          Hi Aldino,

                          I'm not an expert yet, but I don't think it is a problem to have the indexes created on another computer. Did you also transfer the *.nt.brg and *.cs.brg files of your reference? Were they created from the same *.fa file as the indexes?
                          You do need the T at the start of the read.
                          Last edited by Fabrice ODEFREY; 09-03-2010, 06:08 PM.



                          • #14
                            Originally posted by aldino
                            I am actually getting the same error msg:

                            Performing alignment...
                            Reads processed: 0************************************************************
                            In function "AlignColorSpaceGappedConstrained": Fatal Error[OutOfRange]. Message: read and reference did not match.
                            ***** Exiting due to errors *****
                            ************************************************************
                            Be careful about endianness when transferring files as I removed endian support since it was slowing down reading in the indexes.

                            The above looks like a bug in BFAST (that is an internal consistency check). Could you report this to [email protected]. Make sure you include the exact command, and data to reproduce (you can try limiting the range of reads to find the offending read). Also, make sure you are using the latest version.
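On the endianness point, a minimal sketch of how to compare two machines: run the function below on the computer that built the indexes and on the one doing the alignment, and check that both print the same word. It relies only on `printf` and `od`; `host_endianness` is a hypothetical helper name, not part of BFAST.

```shell
#!/bin/bash
# Report the byte order of the machine this runs on.
# The two bytes 0x01 0x00 are read back as one unsigned 16-bit integer:
# a little-endian host sees 1, a big-endian host sees 256.
host_endianness() {
    local value
    value=$(printf '\001\000' | od -An -tu2 | tr -d ' \n')
    case "$value" in
        1)   echo little ;;
        256) echo big ;;
        *)   echo unknown ;;
    esac
}

host_endianness
```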



                            • #15
                              Hello,
                              I am actually getting a similar error message as aldino except I am using solexa reads. The error message is as follows:

                              My input command:
                              /filepath/bfast localalign -f /filepath/TAIR9_cdna_20090619.txt -m /filepath/bfast.matches.splitaa1_ZT0.bmf > /filepath/bfast.aligned.splitaa1_ZT0.baf


                              Checking input parameters supplied by the user ...
                              Validating fastaFileName /filepath/TAIR9_cdna_20090619.txt.
                              Validating matchFileName/filepath/bfast.matches.splitaa1_ZT0.bmf.
                              **** Input arguments look good! *****
                              ************************************************************
                              ************************************************************
                              Printing Program Parameters:
                              programMode: [ExecuteProgram]
                              fastaFileName: /filepath/TAIR9_cdna_20090619.txt
                              matchFileName: /filepath/bfast.matches.splitaa1_ZT0.bmf
                              scoringMatrixFileName: [Not Using]
                              ungapped: [Not Using]
                              unconstrained: [Not Using]
                              space: [NT Space]
                              startReadNum: 1
                              endReadNum: 2147483647
                              offsetLength: 20
                              maxNumMatches: 384
                              avgMismatchQuality: 10
                              numThreads: 1
                              queueLength: 10000
                              pairedEndLength: [Not Using]
                              mirroringType: [Not Using]
                              forceMirroring: [Not Using]
                              timing: [Not Using]
                              ************************************************************
                              ************************************************************
                              Reading in reference genome from /filepath/TAIR9_cdna_20090619.txt.nt.brg.
                              In total read 39640 contigs for a total of 60723639 bases
                              ************************************************************
                              ************************************************************
                              Reading match file from /filepath/bfast.matches.splitaa1_ZT0.bmf.
                              ************************************************************
                              Performing alignment...
                              Currently on:
                              thread:0 [4000]************************************************************
                              In function "AlignNTSpaceGappedConstrained": Fatal Error[OutOfRange]. Message: read and reference did not match.
                              ***** Exiting due to errors *****
                              ************************************************************


                              I am currently using version 0.6.4d, and will try using version e, but I just wanted to see if anyone has figured out why I am receiving this error.

