  • BWA questions

    Hi,

    I have a few questions about bwa:

    1. Is it necessary to generate so many files during indexing? I know the .bwt file is used for alignment, but files like .pac are not used anywhere, right?

    2. I know the .amb and .ann files hold the n_holes and n_seqs elements, but how do I read them?

    3. If I want to compare the results with Maq, which output file contains the error rate or other stats?

    thanks

  • #2
    I got the SAM file by aligning against the human genome and generated the BAM file with samtools, but nothing is displayed when I run:

    samtools tview aln.bam chr1.fasta

    Is it because the bam file is too large?

    thanks

    • #3
      1. All files are necessary.

      3. bwa does alignment but does not give stats. You need to parse the result.
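
As a minimal sketch of what "parse the result" could look like, the function below walks over SAM text records and counts mapped reads per reference, plus an edits-per-base figure taken from the NM:i tag. The name sam_stats and the choice of statistics are made up for this example; it is not a bwa or Maq feature, and NM divided by read length is only a rough stand-in for an error rate.

```python
from collections import Counter


def sam_stats(lines):
    """Summarise SAM text records; header lines starting with '@' are skipped."""
    total = 0
    mapped = 0
    per_ref = Counter()   # mapped reads per reference name (RNAME column)
    nm_sum = 0            # summed NM:i edit distances
    nm_bases = 0          # summed read lengths of records that carry NM
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue      # not a complete alignment record
        total += 1
        flag = int(fields[1])
        if flag & 0x4:    # FLAG bit 0x4 means the read is unmapped
            continue
        mapped += 1
        per_ref[fields[2]] += 1
        seq = fields[9]
        for tag in fields[11:]:
            if tag.startswith("NM:i:") and seq != "*":
                nm_sum += int(tag[5:])
                nm_bases += len(seq)
                break
    rate = nm_sum / nm_bases if nm_bases else None
    return {
        "total": total,
        "mapped": mapped,
        "per_ref": dict(per_ref),
        "edits_per_base": rate,
    }
```

For a real file, pass an open file handle, for example sam_stats(open("aln.sam")). The per-reference counts also show quickly whether a given chromosome has any mapped reads at all.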

      About samtools, in the manual page, you will find this sentence "Note that if the region showed on the screen contains no mapped reads, a blank screen will be seen." Sorry for the confusion.
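
Question 2 is not picked up in the reply above. As a rough sketch only, assuming both files are plain text with the following layout (an assumption, not something stated in this thread): .ann starts with a line "l_pac n_seqs seed" followed by two lines per sequence ("gi name [annotation]", then "offset length n_ambs"), and .amb starts with "l_pac n_seqs n_holes" followed by one "offset length base" line per run of ambiguous bases. The function names parse_ann and parse_amb are made up for this example; check the layout against your own index files before relying on it.

```python
def parse_ann(text):
    """Parse the assumed .ann layout into a dict (layout is an assumption)."""
    lines = text.splitlines()
    l_pac, n_seqs, _seed = lines[0].split()[:3]
    seqs = []
    for i in range(int(n_seqs)):
        # First line per sequence: "gi name [free-text annotation]"
        head = lines[1 + 2 * i].split(" ", 2)
        # Second line per sequence: "offset length n_ambs"
        offset, length, n_ambs = (int(v) for v in lines[2 + 2 * i].split())
        seqs.append({
            "name": head[1],
            "anno": head[2] if len(head) > 2 else "",
            "offset": offset,
            "length": length,
            "n_ambs": n_ambs,
        })
    return {"l_pac": int(l_pac), "n_seqs": int(n_seqs), "seqs": seqs}


def parse_amb(text):
    """Parse the assumed .amb layout: one (offset, length, base) per hole."""
    lines = text.splitlines()
    l_pac, n_seqs, n_holes = (int(v) for v in lines[0].split()[:3])
    holes = []
    for line in lines[1:1 + n_holes]:
        offset, length, base = line.split()
        holes.append((int(offset), int(length), base))
    return {"l_pac": l_pac, "n_seqs": n_seqs, "n_holes": n_holes, "holes": holes}
```

Typical use would be parse_ann(open("ref.fasta.ann").read()), where ref.fasta.ann is a placeholder file name.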

      • #4
        Hi,
        How could I use option "div" for constructing BWT index? I've installed libdivsufsort, however, when I ran
        bwa index -a div
        I got an error message "[bwt_pac2bwt] libdivsufsort is not compiled in. Abort!"

        thanks

        • #5
          Hi, EIMichael

          I have never used the div option. I guess you are using the right command syntax: bwa index -a div xxx.fasta, right?

          You may need to make sure you have enough memory, since the divsufsort library needs a lot of memory to build the BWT, and -a div doesn't work for long genomes. On the other hand, you probably need to put the divsufsort.h file in the directory where you do the indexing. This is just a guess: _DIVBWT should be defined somewhere in the source code, but I could not find where, so I have no idea whether div really works.

          • #6
            totalnew, thanks for your reply!
            Originally posted by totalnew
            I guess you are using the right command syntax: bwa index -a div xxx.fasta, right?
            Yes, sorry, I simply left off the end of the command in my previous post.

            You may need to make sure you have enough memory, since the divsufsort library needs a lot of memory to build the BWT, and -a div doesn't work for long genomes.
            I have 16 GB of RAM. Btw, I thought that only "-a is" has this restriction for long genomes?

            On the other hand, you probably need to put the divsufsort.h file in the directory where you do the indexing.
            I put it there, but it didn't help. Anyway, thanks for your advice.

            • #7
              Actually, I have the same question about div. I checked the bwa source code, and -a div definitely doesn't work for long genomes. As I mentioned earlier, the source code contains:

              #ifdef _DIVBWT
              divbwt...
              #else
              "libdivsufsort is not compiled in."
              #endif


              But I failed to find where _DIVBWT is defined, so I am not sure how div works. If you figure that out, let me know, thanks.

              • #8
                I have a question about mapping quality. I have a SAM file in which the MAPQ field is always 255 (255 is applied on the assumption that the alignment is highly accurate). When I calculate MAPQ, should I do it as bwa does:

                23 - (4.343* log(255) +0.5 )

                Or how can I accurately get MAPQ during alignment?

                • #9
                  Originally posted by totalnew
                  I have a question about mapping quality. I have a SAM file in which the MAPQ field is always 255 (255 is applied on the assumption that the alignment is highly accurate). When I calculate MAPQ, should I do it as bwa does:

                  23 - (4.343* log(255) +0.5 )

                  Or how can I accurately get MAPQ during alignment?
                  The mapping quality should depend on the sensitivity of your aligner settings (how many mismatches are tolerated?), the quality of the reads, etc. MAQ's supplemental materials have a thorough discussion.
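
To make the quoted expression concrete: 4.343 is roughly 10/ln(10), so 4.343 * log(n) is 10 * log10(n), and the + 0.5 followed by integer truncation rounds it to the nearest integer. Note also that in the SAM specification a MAPQ of 255 means the mapping quality is not available. The sketch below only evaluates the expression as quoted; treating n as a count (for example of alternative hits) rather than as a MAPQ value, and clamping the result at 0, are assumptions of this example, and the name approx_mapq is hypothetical.

```python
import math


def approx_mapq(n):
    """Evaluate 23 - int(4.343 * ln(n) + 0.5), clamped at 0 (clamp is an assumption)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    penalty = int(4.343 * math.log(n) + 0.5)  # about 10 * log10(n), rounded
    return max(0, 23 - penalty)


if __name__ == "__main__":
    for n in (1, 2, 10, 255):
        print(n, approx_mapq(n))
```

With n = 255 the unclamped expression is already negative, which suggests that plugging the value 255 from the MAPQ column into the formula is not what it was meant for.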

                  • #10
                    I have another question about bwa output. I am processing C. elegans libraries with a read length of 42. When bwa aln runs, the maximum edit distance is chosen automatically for each read length unless max_diff is specified with option -n INT.

                    [bwa_aln] 17bp reads: max_diff = 2
                    [bwa_aln] 38bp reads: max_diff = 3
                    [bwa_aln] 64bp reads: max_diff = 4
                    [bwa_aln] 93bp reads: max_diff = 5
                    [bwa_aln] 124bp reads: max_diff = 6
                    [bwa_aln] 157bp reads: max_diff = 7
                    [bwa_aln] 190bp reads: max_diff = 8
                    [bwa_aln] 225bp reads: max_diff = 9

                    bwa aln prints the list above, so I assume that 42bp reads get max_diff = 3 or 4. But the SAM file I got contains a read with NM:i:5 in the TAG field, which means its edit distance is 5. If max_diff is fixed at 3 or 4, why did that happen? (This is a PE library.)

                    thanks
                    Last edited by totalnew; 07-09-2009, 03:57 PM.
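
One way to see where a 42bp read falls in a list like the one above is to model the number of differences per read as Poisson-distributed and pick the smallest k for which the chance of exceeding k drops below a threshold. This is a sketch under assumptions: the 2% per-base error rate, the 0.04 threshold, and the function names are chosen for this example and are not taken from this thread. With these values the first breakpoints (17bp, 38bp, 64bp) match the printed list.

```python
import math


def cal_maxdiff(read_len, err=0.02, thres=0.04):
    """Smallest k with P(differences > k) < thres under Poisson(read_len * err)."""
    elambda = math.exp(-read_len * err)
    total = elambda   # running Poisson CDF, starting with P(0)
    y = 1.0           # (read_len * err) ** k
    x = 1             # k!
    for k in range(1, 1000):
        y *= read_len * err
        x *= k
        total += elambda * y / x
        if 1.0 - total < thres:
            return k
    return 2          # fallback if the loop never converges


def maxdiff_breakpoints(lo=17, hi=250):
    """Return (read_len, max_diff) pairs where max_diff changes."""
    table = []
    prev = None
    for read_len in range(lo, hi + 1):
        k = cal_maxdiff(read_len)
        if k != prev:
            table.append((read_len, k))
        prev = k
    return table


if __name__ == "__main__":
    print(cal_maxdiff(42))
    print(maxdiff_breakpoints()[:3])
```

Under these assumptions a 42bp read sits in the 38bp bracket, so the automatic limit would be 3 rather than 4. The sketch does not explain the NM:i:5 read; it only pins down the expected limit.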

                    • #11
                      Any answers to the question above? Thank you.

                      • #12
                        SOLiD reads support?

                        Is there a plan for SOLiD reads support?

                        • #13
                          Originally posted by ech
                          Is there a plan for SOLiD reads support?
                          SOLiD reads are supported by BWA as well as aligners including BFAST, Corona-lite, MAQ, and SHRiMP.

                          • #14
                            I am having some problems using BWA ... I really need to index short reads (i.e. 16S RNA).

                            I understand I need to install the additional package. So I installed it, but I still get an error.

                            Code:
                            [root@athos libdivsufsort-2.0.0]# make install
                            [ 44%] Built target divsufsort
                            [ 55%] Built target bwt
                            [ 66%] Built target mksary
                            [ 77%] Built target sasearch
                            [ 88%] Built target suftest
                            [100%] Built target unbwt
                            Install the project...
                            -- Install configuration: "Release"
                            -- Installing /usr/local/lib/pkgconfig/libdivsufsort.pc
                            -- Installing /usr/local/include/divsufsort.h
                            -- Installing /usr/local/lib/libdivsufsort.so.3.0.0
                            [root@athos libdivsufsort-2.0.0]# exit
                            exit
                            [joachim@athos antarctica]$ cd bwa/bwa-0.5.8a
                            [joachim@athos bwa-0.5.8a]$ ./bwa index -a div ../../16S_references.fasta
                            [bwa_index] Pack FASTA... 0.04 sec
                            [bwa_index] Reverse the packed sequence... 0.01 sec
                            [bwa_index] Construct BWT for the packed sequence...
                            [bwt_pac2bwt] libdivsufsort is not compiled in. Abort!
                            Aborted
                            Which file needs to be in which directory?

                            • #15
                              Just use "-a is".
