
  • bwa sai to bam conversion and indexfile.nt.ann??

    Hi
    I am trying to test BWA on 454 reads longer than 200 nt, using the bwasw option with hg19 as the indexed reference.

    BWA generates an output.sai, which I try to convert to SAM format, and this is where the problem occurs.

    bwa gives the following message:

    [bns_restore_core] fail to open file 'hg19.nt.ann'. Abort!
    Aborted

    The point is that I have no idea why bwa asks me for the file hg19.nt.ann, or what the hg19.nt.ann file is. This file is not generated along with the other index files when I run the index function, so I am confused.

    I searched the forum for similar messages and, surprisingly, found very little (almost nothing that clarifies my doubt) about this.

    Can anyone clarify whether this xxx.nt.ann file is a normal output of bwa, and how I can create it in order to convert a sai file to bam?

    Thank you in advance.

  • #2
    Please provide the commands you used for indexing, generating the sai, and generating the sam with bwa.
    The "nt" points to color-space indexing. I don't think 454 is color space. You might be using somebody's color-space script as a template for your work, in which case you need to modify it.
    Last edited by Richard Finney; 01-18-2012, 07:59 AM.

    • #3
      Hi Richard, thank you for your answer.

      Indeed, 454 is not in color space. Maybe I am doing something wrong; I do not know.

      I used the extract_sff script to convert sff to fastq and then prinseq to process the fastq:

      @GCFF90V02JNZWW
      CATTTGTTCACTCATAATAAGAAAGTAGGGAGAGGAGAATGTTAACATACCTATAGATAATACATGCACTGTTCCTGCATGT
      +GCFF90V02JNZWW
      AB===B>>:::<<<=<<311/,,,242,,,/.89<?=889::ADA===AADFDDAAADDD??????ABBBABB==9:::=BB
      @GCFF90V02G5MHK
      ATATATGCTTTCATGAGAATGAGAGAGTCCTTCGAGCTGTAG
      +GCFF90V02G5MHK
      IIIIIIIIHHHIIIIIIFFFFFFFFFFFFFFF===@FFFFDD

      Then I used BWA to create the hg19 index with

      ./bwa index -a bwtsw -p hg19 hg19.fa (so I did not use -c)

      For the alignment:

      I first used ./bwa aln, and bwa worked, although it only aligned the shortest reads, as might be expected. Then I converted this sai output to bam and had no problems doing so.

      Next, and here come my troubles, I used bwasw to test bwa with longer reads, using the following:

      ./bwa bwasw -t 4 -f out.sai hg19 454reads7.fastq

      Bwa generated the out.sai, and then I went back to samse to convert this sai, as I had previously done with that of the shortest reads.

      ./bwa samse -f out.sam hg19 input.sai input.fastq

      That is exactly what I did with the short reads.

      Any suggestions?

      Carlos

      • #4
        Where's the aln step for the .sai generation before the bwasw command? The fastq must be the same.

        • #5
          And it was.

          That is what I used:

          ./bwa aln -t 4 -f out.sai hg19 454reads7.fastq


          In fact, I have just repeated the steps and I get the same result.

          I copy the commands here.

          Using aln

          cllorens@biotechvana:~/assembling/tools/bwa/bwa-0.5.9> ./bwa aln -t 4 -f destruye.sai hg19 454reads7.fastq
          [bwa_aln] 17bp reads: max_diff = 2
          [bwa_aln] 38bp reads: max_diff = 3
          [bwa_aln] 64bp reads: max_diff = 4
          [bwa_aln] 93bp reads: max_diff = 5
          [bwa_aln] 124bp reads: max_diff = 6
          [bwa_aln] 157bp reads: max_diff = 7
          [bwa_aln] 190bp reads: max_diff = 8
          [bwa_aln] 225bp reads: max_diff = 9
          [bwa_aln_core] calculate SA coordinate... 1048.13 sec
          [bwa_aln_core] write to the disk... 1039.00 sec
          [bwa_aln_core] 218634 sequences have been processed.

          then sai to bam conversion...

          cllorens@biotechvana:~/assembling/tools/bwa/bwa-0.5.9> ./bwa samse -f destruyeme.sam hg19 destruye.sai 454reads7.fastq
          [bwa_aln_core] convert to sequence coordinate... 4.05 sec
          [bwa_aln_core] refine gapped alignments... 17.34 sec
          [bwa_aln_core] print alignments... 1.11 sec
          [bwa_aln_core] 218634 sequences have been processed.


          Now if I use bwasw with the same fastq:

          cllorens@biotechvana:~/assembling/tools/bwa/bwa-0.5.9> ./bwa bwasw -t 4 -f destruye2.sai hg19 454reads7.fastq
          [bsw2_aln] read 29176 sequences (10000406 bp)...
          [bsw2_aln] read 28182 sequences (10000061 bp)...
          [bsw2_aln] read 29264 sequences (10000170 bp)...
          [bsw2_aln] read 30374 sequences (10000003 bp)...
          [bsw2_aln] read 31893 sequences (10000054 bp)...
          [bsw2_aln] read 33994 sequences (10000276 bp)...
          [bsw2_aln] read 35751 sequences (9642318 bp)...
          cllorens@biotechvana:~/assembling/tools/bwa/bwa-0.5.9> ls

          and now using the sai generated in this case:


          cllorens@biotechvana:~/assembling/tools/bwa/bwa-0.5.9> ./bwa samse -f destruye2.sam hg19 destruye2.sai 454reads7.fastq
          [bns_restore_core] fail to open file 'hg19.nt.ann'. Abort!
          Aborted

          Any idea?
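
One thing worth checking at this point: the BWA manual describes bwasw as writing its alignments in SAM format directly, whereas aln writes a binary .sai file. If that applies here, destruye2.sai would already be SAM text, and samse would be misreading it as a .sai file. That explanation is an assumption and is not confirmed anywhere in this thread. A minimal shell sketch (the function name `looks_like_sam` is hypothetical, and the test is only a heuristic) for checking whether a file looks like SAM text:

```shell
# Heuristic: a SAM file starts either with a header line (@HD, @SQ, @RG, @PG, @CO)
# or with an alignment line that has at least 11 tab-separated fields.
looks_like_sam() {
    head -n 1 "$1" | awk -F '\t' '
        /^@(HD|SQ|RG|PG|CO)\t/ { ok = 1 }
        NF >= 11               { ok = 1 }
        END { exit (ok ? 0 : 1) }
    '
}
```

Possible usage: `looks_like_sam destruye2.sai && echo "already SAM text"`.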

          • #6
            Check your read lengths.

            This doesn't explain the error message, but please check the notes by Heng Li (author of BWA) here: http://bio-bwa.sourceforge.net/

            Does BWA align 454 reads?
            Yes and no. The BWA-SW component of BWA works well on 454 reads about 200bp or longer. It achieves similar alignment accuracy to SSAHA2 while much faster. BWA-SW also works for shorter reads, but the sensitivity is lower. In addition, BWA-SW does not support paired-end alignment.

            What is maximum query sequence length in alignment?
            It is recommended to only use bwa-short on reads shorter than 200bp.
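
To act on the 200 bp guidance quoted above, it helps to know how many reads fall on each side of the cutoff. The following is a minimal sketch that assumes a FASTQ with exactly four lines per record (sequences not wrapped); the function name `fastq_length_summary` and its default cutoff of 200 are illustrative choices, not part of BWA.

```shell
# Count reads below and at/above a length cutoff, and report the longest read.
# Assumes 4 lines per FASTQ record, with the sequence on the second line.
fastq_length_summary() {
    fastq="$1"
    cutoff="${2:-200}"
    awk -v cutoff="$cutoff" '
        NR % 4 == 2 {
            n = length($0)
            if (n < cutoff) n_short++; else n_long++
            if (n > max) max = n
        }
        END { printf "short=%d long=%d max=%d\n", n_short, n_long, max }
    ' "$fastq"
}
```

Possible usage: `fastq_length_summary 454reads7.fastq 200`.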

            • #7
              There are several sizes, Richard, including 500 nucleotides or even longer (750).
              Perhaps the problem is due to the fact that reads both shorter and longer than 200 are collected in the same input file. I think I am going to try separating them into two independent files (shorter and longer than 200) to see what happens. It is just an idea, but let me see if anything changes.

              Carlos
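
The separation described above can be sketched with awk. This assumes a FASTQ with exactly four lines per record; the function name `split_fastq_by_length` and the output names `<prefix>.short.fastq` and `<prefix>.long.fastq` are hypothetical.

```shell
# Write each 4-line FASTQ record to <prefix>.long.fastq if its sequence is at
# least <cutoff> bases long, otherwise to <prefix>.short.fastq.
split_fastq_by_length() {
    fastq="$1"
    cutoff="${2:-200}"
    prefix="${3:-reads}"
    awk -v cutoff="$cutoff" -v prefix="$prefix" '
        { rec[(NR - 1) % 4] = $0 }          # buffer the current record
        NR % 4 == 0 {
            if (length(rec[1]) >= cutoff) out = prefix ".long.fastq"
            else                          out = prefix ".short.fastq"
            printf "%s\n%s\n%s\n%s\n", rec[0], rec[1], rec[2], rec[3] > out
        }
    ' "$fastq"
}
```

Possible usage: `split_fastq_by_length 454reads7.fastq 200 454reads7`. An output file is only created if at least one read falls into its class.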

              • #8
                Hi
                I ran the test, separating reads longer and shorter than 200 nt into two different fastq files, and then tried bwasw with the fastq containing seqs > 200. After doing this, I again attempted to convert the format from sai to bam, and again bwa aborted the process, asking me for the indexfile.nt.ann index file.

                So in my humble opinion this might be a bug in the bwasw algorithm. In fact, the aln option for short reads gives a message like this at the end of the alignment process:

                [bwa_aln_core] calculate SA coordinate... 1048.13 sec
                [bwa_aln_core] write to the disk... 1039.00 sec
                [bwa_aln_core] 218634 sequences have been processed.

                The bwasw option, however, does not give such output.
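
The missing [bwa_aln_core] messages are consistent with bwasw being a separate code path that, according to the BWA manual, writes SAM directly instead of a .sai file. Under that assumption (not confirmed in this thread), the samse step would simply be skipped for bwasw output. A minimal sketch follows; the function name `bwasw_to_bam` and the `DRY_RUN` switch are hypothetical, the `-t 4` and `-f` options are taken from the commands posted above, and `samtools view -bS -o` is assumed to be available on the system.

```shell
# Align long reads with bwasw (SAM written via -f) and convert the SAM to BAM.
# With DRY_RUN set to a non-empty value, the commands are printed, not executed.
bwasw_to_bam() {
    index_prefix="$1"   # e.g. hg19, as passed to "bwa index -p"
    reads="$2"          # FASTQ with the long reads
    out="$3"            # output name without extension
    if [ -n "${DRY_RUN:-}" ]; then run=echo; else run=; fi
    $run bwa bwasw -t 4 -f "$out.sam" "$index_prefix" "$reads" &&
    $run samtools view -bS -o "$out.bam" "$out.sam"
}
```

Possible usage: `DRY_RUN=1; bwasw_to_bam hg19 454reads7.fastq destruye2` prints the two commands so they can be reviewed before running them for real.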

                • #9
                  I had that problem and solved it with this method.
                  When making the index, use -p and -c.
                  E.g. your fasta file: seq.fa
                  Your fasta file and the bwa program are located in ~/Desktop/BWA
                  Make sure you use the full path for everything:

                  ~/Desktop/BWA/bwa index -a bwtsw -p ~/Desktop/BWA/seq.fa -c ~/Desktop/BWA/seq.fa
                  Last edited by mitochy; 01-23-2012, 02:38 AM.
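
Before rebuilding an index, it may be useful to see which index files already exist for a given prefix. The sketch below checks for the usual nucleotide index suffixes (.amb, .ann, .bwt, .pac, .sa) plus the .nt.ann file named in the error message; the function name `check_bwa_index` is hypothetical, and the suffix list is an assumption that may differ between BWA versions.

```shell
# Report which index files exist for a given "bwa index -p" prefix.
check_bwa_index() {
    prefix="$1"
    for ext in amb ann bwt pac sa nt.ann; do
        if [ -e "$prefix.$ext" ]; then
            echo "$ext present"
        else
            echo "$ext missing"
        fi
    done
}
```

Possible usage: `check_bwa_index hg19`, run from the directory that holds the index.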

                  • #10
                    Hi Mitochy,
                    thank you for your comment. Perhaps I am wrong, but I think it is not the same problem.

                    -c is for creating color-space indexes,
                    and certainly the indexfile.nt.ann file is for color space. The point is that I am using
                    fastq files generated by 454 (i.e. not color space), and when I use the bwasw option to create the sai file, it creates it but later fails when I try to convert from sai to sam. In my last post I wrote from sai to bam, but I meant sam.

                    • #11
                      Could you check the definition lines in your reference fasta file (i.e. the one that you are aligning your reads to), and remove any descriptions in these lines?

                      E.g. if you have lines that look like:

                      >contig3223 hg19.ann

                      Change it to:

                      >contig3223

                      I had the same problem and doing so should fix it.

                      Hao
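
The header cleanup suggested above can be sketched with awk. The function name `strip_fasta_descriptions` is hypothetical; it keeps only the first whitespace-delimited token of each definition line and assumes there is no space between ">" and the sequence name.

```shell
# Print a FASTA file with everything after the first whitespace removed
# from each ">" definition line; sequence lines are passed through unchanged.
strip_fasta_descriptions() {
    awk '/^>/ { print $1; next } { print }' "$1"
}
```

Possible usage: `strip_fasta_descriptions ref.fa > ref.clean.fa`, followed by re-indexing the cleaned file.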

                      • #12
                        Hi Hao

                        The reference is the human genome, and the sequences are the individual chromosome sequences organized in karyotypic order (i.e. 1, 2, ... 22, X, Y, M) and labeled as >chr1 etc. only. That is not the problem. Thank you anyway.

                        • #13
                          Same problem

                          Hi cllorens,

                          Were you ever able to resolve this? I am seeing the same behavior with bwasw. I am using simulated 454 reads. The alignment works properly, but the conversion from sai to sam tries to load a colorspace index.

                          Thanks.
                          Last edited by 9taylors; 12-31-2012, 05:43 AM. Reason: removed name

                          • #14
                            Hi,

                            I had the same problem.
                            Reason:
                            The fasta file was indexed with bwa version 0.6.2, while I tried to run aln and sampe with bwa version 0.5.8.

                            After using the same version for both, the problem disappeared.

                            Cheers,

                            David
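
A version mismatch like this is easier to spot when every copy of the program on the search path is listed. The sketch below is a generic helper (the name `list_on_path` is hypothetical) that assumes a POSIX-style shell such as sh or bash.

```shell
# List every executable file called <name> found in the directories of $PATH,
# in search order. Runs in a subshell so the IFS change does not leak out.
list_on_path() (
    name="$1"
    IFS=:
    for dir in $PATH; do
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
)
```

Possible usage: `list_on_path bwa` shows all installed copies; running each one without arguments should print a usage message that includes a version line, which can be compared with the version used to build the index.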

                            • #15
                              Originally posted by dries
                              Hi,

                              I had the same problem.
                              Reason:
                              The fasta file was indexed with bwa version 0.6.2, while I tried to run aln and sampe with bwa version 0.5.8.

                              After using the same version for both, the problem disappeared.

                              Cheers,

                              David

                              I encountered the same problem, and it turned out that there were two versions of bwa on the server and I had used the lower version to generate the index.
