  • bwa_sai2sam_pe fail to locate the index

    Hi,

    I'm trying to run BWA 0.6.2. I notice that when I index my reference file, some index files are missing. This seems to be fine when I align single-end sequences.

    However, when I try to align paired-end sequences, I get an error.
    The two alignments (bwa aln, etc.) run fine,
    but when I run bwa sampe I get this error:

    [bwa_sai2sam_pe] fail to locate the index

    I'm using the same path as when I ran bwa aln,

    so I think it's complaining about the missing index files: .rbwt, .rpac and .rsa.

    Why does BWA 0.6.2 not create these when you index the reference?

    Any help greatly appreciated,

    thanks alig

  • #2
    The BWA index format was changed in 0.6.0 to integrate the BWT and
    reverse BWT into the same file.

    Ref: https://github.com/lh3/bwa/blob/master/NEWS

    Release 0.5.10 and 0.6.0 (12 November, 2011)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    The 0.6.0 release comes with two major changes. Firstly, the index data
    structure has been changed to support genomes longer than 4GB. The forward
    and reverse backward genome is now integrated in one index.
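
One way to see what this means in practice is to check which index files sit next to the reference. The sketch below assumes that BWA 0.6.0 and later write five files per reference (.amb, .ann, .bwt, .pac, .sa) and no longer write .rbwt, .rpac or .rsa. The function name check_bwa_index is hypothetical and not part of BWA.

```shell
# check_bwa_index PREFIX
# Assumption: BWA >= 0.6.0 writes PREFIX.amb, .ann, .bwt, .pac and .sa.
# Prints every index file that is missing or empty and returns non-zero
# if at least one is; returns zero when all five are present and non-empty.
check_bwa_index() {
    prefix=$1
    status=0
    for ext in amb ann bwt pac sa; do
        # -s is true only if the file exists and is larger than zero bytes
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext"
            status=1
        fi
    done
    return $status
}
```

Called as `check_bwa_index S_lycopersicum_chromosomes.fa`, it prints nothing when the index looks complete. The absence of .rbwt, .rpac and .rsa is expected under this assumption and is not reported.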

    • #3
      BWA0.6.2 indexes

      Thank you for the information about the new BWA indexes.

      Everything seems to be running now; I think my problem was an error in my script.

      alig

      • #4
        Alig,

        I'm getting the same error:

        [bwa_sai2sam_pe] fail to locate the index

        What was the error in your script? I might be making the same mistake.

        Thanks!

        Nowlan

        • #5
          Can you provide the command you are using?

          • #6
            bwa index S_lycopersicum_chromosomes.fa

            gunzip 141217HiSeq_Run_Sample_Sample_3_TTAGGC_L002_R1_001.fastq.gz
            gunzip 141217HiSeq_Run_Sample_Sample_3_TTAGGC_L002_R2_001.fastq.gz

            bwa aln S_lycopersicum_chromosomes.fa 141217HiSeq_Run_Sample_Sample_3_TTAGGC_L002_R1_001.fastq > L002_R1_001.sai

            bwa aln S_lycopersicum_chromosomes.fa 141217HiSeq_Run_Sample_Sample_3_TTAGGC_L002_R2_001.fastq > L002_R2_001.sai


            bwa sampe S_lycopersicum_chromosomes.fa L002_R1_001.sai L002_R2_001.sai 141217HiSeq_Run_Sample_Sample_3_TTAGGC_L002_R1_001.fastq 141217HiSeq_Run_Sample_Sample_3_TTAGGC_L002_R2_001.fastq > mutant002.sam

            [bwa_sai2sam_pe] fail to locate the index

            • #7
              Which version of bwa are you using? Are all these files (and the index files) in the same directory, and are they non-empty (i.e. larger than zero bytes)? For future reference, you do not need to uncompress the sequence files.
              Last edited by GenoMax; 03-13-2015, 09:37 AM.

              • #8
                The index and .sai files all have data, and we're using the most recent version of BWA (0.7.12).

                We're going to try and rerun with -a bwtsw as an option while indexing, and we'll make sure all of the files are in the same directory. I'll let you know how it turns out.

                Thanks!

                Nowlan

                • #9
                  We were able to get BWA to run correctly.

                  The only thing we had to change was to use -a bwtsw when indexing the genome.

                  This was a little unexpected, as we were using the tomato genome, which is moderately sized at just under 1 Gb.
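
For anyone landing here with the same error, the working sequence from this thread can be sketched as a dry run that only prints the commands so they can be inspected first. It combines the -a bwtsw indexing option reported above with the earlier note that the FASTQ files can stay gzipped. The function name print_bwa_pe_commands and the sample_R1/sample_R2 file names are hypothetical placeholders. The thread does not establish why -a bwtsw made the difference, so treat it as something to try, not a confirmed cause.

```shell
# print_bwa_pe_commands REF R1_FASTQ_GZ R2_FASTQ_GZ OUT_SAM
# Prints (does not run) the index / aln / sampe commands for one read pair.
# Assumes read files end in .fastq.gz and that paths contain no spaces.
print_bwa_pe_commands() {
    ref=$1; r1=$2; r2=$3; out=$4
    # Derive the .sai names by stripping the .fastq.gz suffix
    sai1=${r1%.fastq.gz}.sai
    sai2=${r2%.fastq.gz}.sai
    echo "bwa index -a bwtsw $ref"
    echo "bwa aln $ref $r1 > $sai1"
    echo "bwa aln $ref $r2 > $sai2"
    echo "bwa sampe $ref $sai1 $sai2 $r1 $r2 > $out"
}

print_bwa_pe_commands S_lycopersicum_chromosomes.fa \
    sample_R1.fastq.gz sample_R2.fastq.gz mutant002.sam
```

Once the printed commands look right, the output can be piped to `sh` to run them, using the same reference path in every step.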
