  • bwa aln - fail to locate index

    Hi gurus,

    I'm new to NGS. I have rice data in .fastq format and am using Oryza sativa as the reference genome. When I run bwa aln, I get the error "fail to locate the index", and the .sai file it produces is empty.

    [bwa_aln] 17bp reads: max_diff = 2
    [bwa_aln] 38bp reads: max_diff = 3
    [bwa_aln] 64bp reads: max_diff = 4
    [bwa_aln] 93bp reads: max_diff = 5
    [bwa_aln] 124bp reads: max_diff = 6
    [bwa_aln] 157bp reads: max_diff = 7
    [bwa_aln] 190bp reads: max_diff = 8
    [bwa_aln] 225bp reads: max_diff = 9
    [bwa_aln] fail to locate the index

    Regards

  • #2
    What was the command you used to run bwa?

    How did you specify the path to the index file?

    • #3
      Did you create the required index files for the rice genome (unless you have downloaded pre-created index files along with the fasta genome)?

      See "index" option: http://bio-bwa.sourceforge.net/bwa.shtml

      • #4
        Where can I get the required reference genome file (oryza.fa)? My input sequence data looks like this:
        50_13_index12_CTTGTA_L008_R1_001.fastq

        @HWI-D00239:37:C2AVVACXX:8:1101:1254:1997 1:N:0:CTTGTA
        GGACCCACATGTCAGTTTCACACAGAAATTATAAAAAAGGTGGAGCCCACATGGGCCCCACATGTCAGTTTCACATAAGATTTATAAAAAAAGGTGGGGCC
        +
        @CCFFFFFHHHHHJJHIIHJIJJIGEGIIIIGIEGIJJHH?GHIIIIIIHJFGIJIJJJHHHDEDEFF;BCDEEECCDDDDDDDDEEECDDD5<+><<@##
        @HWI-D00239:37:C2AVVACXX:8:1101:1903:1999 1:N:0:CTTGTA
        TTTGCTCAATTGGCCTACAACCAAGGGATAAATAGAAATTACCACAGGATCTTCTTACAATGATGAAAATTTCATTTGTAATAATCATTTTACTACCTCAA
        +
        CCCFFFFFHHHHHJJJJJJJIJJJIJJHIJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJGIJIJJJHIHEHHHHHHFFFFFFFEEFEEDEDDD@@
        @HWI-D00239:37:C2AVVACXX:8:1101:1988:2000 1:N:0:CTTGTA
        GTTCAAAATTTTCTGAAATTCATTCAAAATTAATGGAAAATTGATGTCTTGAAGGGGCTCGGGAATTCGCTAGCCAATCAAGATCATAAACCCTTAGCCTA
        +

        • #5
          If you work with rice, you should know which genome you need. A search brings up this version at Ensembl: ftp://ftp.ensemblgenomes.org/pub/pla...za_sativa/dna/ Since there are so many rice varieties, this may or may not be what you need. Look at the "README" file. You can download the genome as a single file. For a large genome like this, indexing can take several hours and requires a server with a good amount of RAM.

          BTW: the sequence data you have is in "fastq" format, which is the standard format for NGS data: http://en.wikipedia.org/wiki/FASTQ_format
          Last edited by GenoMax; 01-07-2014, 04:32 AM.

          • #6
            bwa is so named because it uses a Burrows-Wheeler transform algorithm. That allows for very fast alignments of short sequences to a large reference sequence, by making a specially transformed version of the reference sequence. That transformed version is the reference index.

            It sounds like you don't have that index, or haven't made it yet. BWA has a couple of commands that will make it for you if you can't download it from anywhere. (Indexing a plant genome will take a few hours.)
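To confirm the index is really there before running bwa aln, a quick check along these lines might help. This is only a sketch: it assumes a recent BWA release whose `bwa index` writes files with the suffixes .amb, .ann, .bwt, .pac and .sa next to the reference FASTA, and the helper name `missing_index_files` is just an illustration, not part of BWA.

```python
from pathlib import Path

# Assumption: suffixes written by `bwa index` in recent BWA releases.
# Older releases may write additional files.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(prefix):
    """Return the expected index files that do not exist for this reference prefix.

    `prefix` is the path you would pass to `bwa aln` as the reference,
    for example "oryza.fa".
    """
    return [
        prefix + suffix
        for suffix in BWA_INDEX_SUFFIXES
        if not Path(prefix + suffix).is_file()
    ]
```

Calling `missing_index_files("oryza.fa")` returns an empty list when all five files are present. If anything is listed, running `bwa index oryza.fa` first and then passing the same path to bwa aln is the likely fix.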

            • #7
              FYI, bwa aln can generate the same "fail to locate the index" error message if it can't figure out how to parse the command line.

              In my case, this happened because, while trying to reproduce the methods in a manuscript, I copied some command-line options from the manuscript into my command line. Unfortunately, the hyphens were Unicode characters rather than plain ASCII hyphens.

              So, for the command "bwa aln -n 0.01 index.fa reads.fastq" (with a Unicode dash in place of the hyphen before the n), bwa's parser stopped when it got to the -n. It then treated -n as the index filename and failed.

              The error message would be much improved if it told you where it had tried to find the index file. That would have helped me figure out it was a parsing error. As it was, I had to dig into the source code and sprinkle in a bunch of printfs before I was able to figure it out.

              Even nicer would be if it checked the total number of arguments and let me know there was a problem. It seems that anything beyond what bwa presumes is the fastq filename is simply ignored silently.
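For anyone pasting commands out of a PDF or manuscript, a small pre-flight check like the one below can catch this before bwa ever sees the arguments. It is a rough sketch, not anything bwa itself does: the function names are made up for illustration, and it only looks at the first character of each argument for a Unicode dash (category "Pd") or the minus sign U+2212.

```python
import unicodedata


def looks_like_pasted_dash(ch):
    """True for dash-like characters other than the plain ASCII hyphen."""
    if ch == "-":
        return False
    # "Pd" is the Unicode "dash punctuation" category (en dash, em dash, ...).
    # U+2212 MINUS SIGN is a math symbol, so it is checked separately.
    return unicodedata.category(ch) == "Pd" or ch == "\u2212"


def find_bad_options(args):
    """Return (position, argument) pairs that start with a non-ASCII dash."""
    return [
        (i, arg)
        for i, arg in enumerate(args)
        if arg and looks_like_pasted_dash(arg[0])
    ]
```

For example, `find_bad_options(["bwa", "aln", "\u2013n", "0.01", "index.fa", "reads.fastq"])` flags the third argument, while the same list with a plain "-n" comes back empty.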
