  • BWA gives no alignment lines in SAM file

    Good morning,

    I am trying to map simulated reads generated from GemSIM to an index generated from a multi-fasta of the CLJU (bacteria) coding sequence from NCBI. I used these two commands and discovered there were no alignment lines in the SAM file, only header lines. Commands head and tail only returned lines starting with @, and grep -vnm1 '^@' returned nothing.

    bwa-0.7.3a/bwa index -a is -p CLJU ./CLJUcoding.fasta

    bwa-0.7.3a/bwa mem CLJU CLJUsimreadsCodeTrial.single.fastq > CLJUsimreadsCodeTrial.single.sam

    Any suggestions?

    Thanks and God bless,
    Jason
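The symptom described above (only header lines in the SAM file) can be confirmed by counting the two kinds of lines separately. This is a minimal sketch; `count_sam_lines` is a hypothetical helper name, not part of bwa, and it only assumes that SAM header lines start with `@`.

```shell
# Count header lines (starting with '@') and all other lines in a SAM file.
# count_sam_lines is a hypothetical helper, not a bwa or samtools command.
count_sam_lines() {
    sam="$1"
    # grep -c exits non-zero when the count is 0, so ignore its exit status.
    headers=$(grep -c '^@' "$sam" || true)
    alignments=$(grep -vc '^@' "$sam" || true)
    echo "headers=$headers alignments=$alignments"
}
```

For example, `count_sam_lines CLJUsimreadsCodeTrial.single.sam` would report `alignments=0` for a header-only file.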

  • #2

    Did you get any error messages or other output when you ran BWA?

    I think your syntax is not quite right. According to the bwa manual page, it should be:
    bwa-0.7.3a/bwa mem ./CLJUcoding.fasta CLJUsimreadsCodeTrial.single.fastq > CLJUsimreadsCodeTrial.single.sam

    • #3
      Hello mastel,

      I did not get any error messages. In addition to my CLJUsimreadsCodeTrial_single.sam, I got a CLJU.amb, CLJU.ann, CLJU.bwt, CLJU.pac, and CLJU.sa. My CLJU.amb contains just one problem character according to head command ("3909174 4184 0"). The ann file looks like it contains appropriate text, and the rest of the files are not human readable.

      I tried your syntax suggestion and got back: "[E::bwa_idx_load] fail to locate the index files". This message did not occur when I used the syntax in the post above.

      Any other suggestions? Thank you very much.
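The two commands in this thread differ in the first argument to `bwa mem`, which is the index prefix. Because the index was built with `-p CLJU`, the index files are named `CLJU.amb`, `CLJU.ann`, and so on, which would explain why `./CLJUcoding.fasta` cannot be located as a prefix. A minimal sketch for checking a prefix before mapping; `missing_index_files` is a hypothetical helper, not a bwa subcommand.

```shell
# List which of the five bwa index files are missing for a given prefix.
# missing_index_files is a hypothetical helper, not a bwa subcommand.
missing_index_files() {
    prefix="$1"
    missing=""
    for ext in amb ann bwt pac sa; do
        [ -e "$prefix.$ext" ] || missing="$missing $prefix.$ext"
    done
    # Empty output means all five files were found.
    echo "${missing# }"
}
```

Running `missing_index_files CLJU` and `missing_index_files ./CLJUcoding.fasta` in the working directory shows which prefix `bwa mem` can actually load.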

      • #4
        Originally posted by jmwhitha
        Hello mastel,

        My CLJU.amb contains just one problem character according to head command ("3909174 4184 0").

        What do you mean by "contains one problem character"?

        Are there spaces in identifier names in your genome reference file? How large is your reference file? What OS are you running this on?

        Can you post a few example lines from your sequence and reference files?

        • #5
          According to http://seqanswers.com/forums/showthread.php?t=20556, the .amb (ambiguous file) contains illegal characters. xied75 intentionally put an R and an S into his fasta file and got back two problem characters: 249240584 1 R and 249240585 1 S. The numbers I assume are coordinates of some sort, and my illegal character is "0".

          Yes, there are spaces in the multi-fasta file I am using as a reference. "grep -c \ CLJUcoding.fasta" reports "4184" lines containing a space.

          Genome Reference File
          "stat -c %s CLJUcoding.fasta" gives "4609956" (bytes) for the file size.
          Grabbing a sample from the head using "head -n30 CLJUcoding.fasta" I get:
          >lcl|NC_014328.1_cdsid_YP_003778217.1 [gene=dnaA] [protein=chromosomal replication initiator protein] [protein_id=YP_003778217.1] [location=101..1453]
          ATGAATGCCCATCCAAAAGAAATATGGGAACAATCTTTAAACATAATAAATGGTGAAATTACTGAAGTAAGCTTTAACACATGGATTAAAAGTATTACTCCTGTATCTATTGAAAATGACACCTTCATATTAAGTGTACCAAATGACCTTACCAAAGGCATATTAACTAGTAAATATAAAAATTTAATAGCTAATGCTCTAAAATTAATTACTTCAAAAAAATACAACATTAAATTTTTAATTGCCTCTGAATCAGAAGAAGCTTTAACATTAGACAATACTAATAAAAGACACAATAAAAATTCCGTATTGGTAAATGATGAAATGTCAACCATGCTAAATCCAAAATATACTTTTGATTCCTTCGTTATAGGTAATAGTAATAGATTCGCTCATGCAGCTTCACTTGCTGTAGCTGAATCACCTTCAAAAGCATATAATCCCCTATTCATATACGGAGGCGTAGGACTAGGCAAAACTCATTTAATGCATGCTATAGGACACTACATATTAAACAATAATAGTAAATCTAAAGTAGTATACGTTTCATCTGAAAAATTTACAAACGAACTTATAAATTCAATAAAAGATGATAAAAATGTAGAATTCAGAAATAAATATAGAAATATAGATGTACTCTTAATAGATGATATACAATTTATTGCAGGTAAAGAAAGAACCCAAGAGGAATTTTTTCATACCTTTAATGCATTATACGAGGCTAATAAACAAATAATTCTATCTAGTGATAGACCACCAAAAGAAATCCCTACATTAGAAGATAGACTTAGATCTAGGTTTGAATGGGGACTTATAGCAGACATTCAACCACCAGACTTTGAAACTAGAATGGCTATATTAAAAAAGAAGGCAGACGTCGAAAATTTAAATATTCCTAATGAAGTAATGGTGTATATAGCTACTAAAATTAAATCCAACATTAGAGAACTTGAAGGTGCGCTAATAAGAATAGTCGCTTTCTCCTCACTTACAAATAAAGAAATAAGTATAGATTTGGCAGTAGAAGCTTTAAAAGATATAATTTCAAGCAAACAATCAAAACAAGTTACTATAGACTTAATACAAGATGTAGTTGCCAACTATTATAACTTAAAAGTAGATGATTTAAAATCTGCAAGAAGAACAAGAAATGTAGCTTTTCCAAGGCAAATAGCTATGTACTTGTGTAGAAAACTTACAGATATGTCTTTGCCAAAAATCGGAGAAGAATTTGGCGGAAGAGATCATACTACCGTAATACATGCTTATGAAAAAATATCAACTAATTTAAAACAAGATGAAAGTCTTCAAAATGCTATAGGCGATTTAACAAAACGACTAAATCAAAATTAA
          >lcl|NC_014328.1_cdsid_YP_003778218.1 [gene=dnaN] [protein=DNA polymerase III subunit beta] [protein_id=YP_003778218.1] [location=1710..2813]
          ATGAACTTTATATGTACAAAAACAGAATTACAAGAAGCTATTTCAATAGCACAAAAAGCTATCACAGGGAAATCCAGCATGCCAATATTAAATGGTCTACTTATTACAACCTGTAAAAACCAAATTAAATTAACTGGATCAGATATAGACCTCAGTATAGAAACAAAAATAAATGCAGAAATAAAAGAAGAGGGATCCGTAGTAGTTGATTCTAGACTATTTGGAGAAATTATAAGAAAATTACCTAATGACAATATAAATATTTCTACTACAGAAAATAATTCAATAGAAATAATATGTCAAAAATCTAAATTTAATCTAATTCATATGAATGCAGAAGATTTTCCTGAAATACCTAATATAAATGAAAATATTATTTTCTTAATACCTCAAAAAATATTAAAAGATATGATAAAAAGTACTATTTTTGCTGCAGCTCAAGATGAAACTAGACCTATACTTACAGGAATTTTATTTGAAATCAAGGACAAAAAATTAAATTTAGTAGCATTAGACGGATATAGATTAGCTTTAAAATCAGAATATCTTAATACAGAAAA

          And for my fastq sequence (just first 10 lines):
          jmwhitha@Linux-OptiPlex-755:~$ head CLJUsimreadsCodeTrial_single.fastq
          @r1_from_gi|300853232|ref|NC_014328.1| Clostridium ljungdah_#0/1
          GACTCCTTGCAGCTGGGGAGGTAGAATACCCGATTATTATATTAGTCTAAAAATTGATAATGGTAAGCATTTGTTTTTAATAAATGAATTTCATAATGGG
          +
          IIIIGDIHIIIIDIIIIHIIIIDIHIIIIEIDFBIIF>HIIEEIIIIIFDEHEGH(IHF#IIHG#@HDGG@BGGBHGHBGD<HBBHIH@D>=BBGG@E>B
          @r2_from_gi|300853232|ref|NC_014328.1| Clostridium ljungdah_#0/1
          TATGGACTTGGTCTAATTTCATGTTGAACTTTATATTCCAGGACTTATTGGTTTCCACATCATTTCCTATAGTAATTCCGCTGTAATTAATTGCATGTAC
          +
          IDIIIIIIGIHHIIGIGIIIGIIIIIGIIIGDIIIHII@IIIICIIIGI=FIGIIGIGIBEIGHHHIHIE2GHHGHHHH?HIHGB=GIB?#6H>>EG@EC
          @r3_from_gi|300853232|ref|NC_014328.1| Clostridium ljungdah_#0/1
          TAATGGAAAAAAACTACTTATAGATTGTGGTGAGGGAACTCAAGTTAGCTTAAAAATACTTGGATGTAAAATAAAAAATATAGATGTAATTTTATTTACA

          Sorry, can't help the line wrapping.

          What do you think?

          Thank you again respondents.
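An alternative reading of the .amb line is worth checking. As an assumption based on how bwa's bntseq.c appears to write the file, the first line holds three totals (reference length in bases, number of sequences, number of ambiguous regions), and only the following lines describe individual ambiguous regions. Under that reading, "3909174 4184 0" would mean 4184 sequences and zero ambiguous regions, which agrees with the 4184 header lines counted above. A minimal sketch that labels the fields; `describe_amb_header` is a hypothetical helper.

```shell
# Label the three fields on the first line of a bwa .amb file.
# describe_amb_header is a hypothetical helper; the field meanings are an
# assumption based on how bwa's bntseq.c writes the file.
describe_amb_header() {
    head -n 1 "$1" |
        awk '{ printf "total_bases=%s sequences=%s ambiguous_regions=%s\n", $1, $2, $3 }'
}
```

For example, `describe_amb_header CLJU.amb` prints the three values with their assumed labels.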

          • #6
            Or actually, lack of wrapping.

            • #7
              I was able to generate a sam file (attached) using the test sequences you included in your previous post.

              Is the bwa you are trying to use known to work correctly? Did you download and compile it yourself?

              • #8
                Oh, great!

                Yes, I downloaded and compiled it, but had some issues. See http://seqanswers.com/forums/showthread.php?t=29498

                What version of bwa are you using? I am using 0.7.3a

                Thank you so much!

                • #9
                  Originally posted by jmwhitha
                  Oh, great!

                  Yes, I downloaded and compiled it, but had some issues. See http://seqanswers.com/forums/showthread.php?t=29498

                  What version of bwa are you using? I am using 0.7.3a

                  Thank you so much!

                  I did use 0.7.3a.

                  I think there is something wrong with your copy of bwa. You are not going to go far until that is fixed.

                  I assume you are the "sys admin" for this machine (since it appears to be a desktop). What flavor of linux are you running (is it running natively or as a virtual machine)?

                  I just noticed that bwa v. 0.7.4 is out. Give that a shot to see if you fare better with the compilation.

                  • #10
                    I used the following command to get the newest version of bwa from sourceforge:
                    wget http://sourceforge.net/projects/bio-...d?source=files -O bwa.tar.bz2

                    Then I opened the tarball and went into the directory with:
                    tar -xjf bwa.tar.bz2
                    cd bwa-0.7.4

                    This is what happens when I execute "make":

                    gcc -c -g -Wall -O2 -DHAVE_PTHREAD utils.c -o utils.o
                    gcc -c -g -Wall -O2 -DHAVE_PTHREAD kstring.c -o kstring.o
                    gcc -c -g -Wall -O2 -DHAVE_PTHREAD ksw.c -o ksw.o
                    In file included from ksw.c:28:0:
                    /usr/lib/gcc/i686-linux-gnu/4.7/include/emmintrin.h:32:3: error: #error "SSE2 instruction set not enabled"
                    ksw.c:44:2: error: unknown type name ‘__m128i’
                    ksw.c: In function ‘ksw_qinit’:
                    ksw.c:67:11: error: ‘__m128i’ undeclared (first use in this function)
                    ksw.c:67:11: note: each undeclared identifier is reported only once for each function it appears in
                    ksw.c:67:19: error: expected expression before ‘)’ token
                    ksw.c: In function ‘ksw_u8’:
                    ksw.c:110:2: error: unknown type name ‘__m128i’
                    ksw.c:126:2: warning: implicit declaration of function ‘_mm_set1_epi32’ [-Wimplicit-function-declaration]
                    ksw.c:127:2: warning: implicit declaration of function ‘_mm_set1_epi8’ [-Wimplicit-function-declaration]
                    ksw.c:133:3: warning: implicit declaration of function ‘_mm_store_si128’ [-Wimplicit-function-declaration]
                    ksw.c:140:3: error: unknown type name ‘__m128i’
                    ksw.c:141:3: warning: implicit declaration of function ‘_mm_load_si128’ [-Wimplicit-function-declaration]
                    ksw.c:142:3: warning: implicit declaration of function ‘_mm_slli_si128’ [-Wimplicit-function-declaration]
                    ksw.c:150:4: warning: implicit declaration of function ‘_mm_adds_epu8’ [-Wimplicit-function-declaration]
                    ksw.c:151:4: warning: implicit declaration of function ‘_mm_subs_epu8’ [-Wimplicit-function-declaration]
                    ksw.c:153:4: warning: implicit declaration of function ‘_mm_max_epu8’ [-Wimplicit-function-declaration]
                    ksw.c:177:5: warning: implicit declaration of function ‘_mm_movemask_epi8’ [-Wimplicit-function-declaration]
                    ksw.c:177:5: warning: implicit declaration of function ‘_mm_cmpeq_epi8’ [-Wimplicit-function-declaration]
                    ksw.c:183:3: warning: implicit declaration of function ‘_mm_srli_si128’ [-Wimplicit-function-declaration]
                    ksw.c:183:3: warning: implicit declaration of function ‘_mm_extract_epi16’ [-Wimplicit-function-declaration]
                    ksw.c: In function ‘ksw_i16’:
                    ksw.c:228:2: error: unknown type name ‘__m128i’
                    ksw.c:244:2: warning: implicit declaration of function ‘_mm_set1_epi16’ [-Wimplicit-function-declaration]
                    ksw.c:256:3: error: unknown type name ‘__m128i’
                    ksw.c:260:4: warning: implicit declaration of function ‘_mm_adds_epi16’ [-Wimplicit-function-declaration]
                    ksw.c:262:4: warning: implicit declaration of function ‘_mm_max_epi16’ [-Wimplicit-function-declaration]
                    ksw.c:266:4: warning: implicit declaration of function ‘_mm_subs_epu16’ [-Wimplicit-function-declaration]
                    ksw.c:282:5: warning: implicit declaration of function ‘_mm_cmpgt_epi16’ [-Wimplicit-function-declaration]
                    make: *** [ksw.o] Error 1

                    Do you know why this is happening?
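The first error in the log, `#error "SSE2 instruction set not enabled"`, typically appears when gcc targets a 32-bit x86 baseline (here i686) where SSE2 is not switched on by default. A common workaround, offered as an assumption rather than a confirmed fix for this machine, is to add gcc's `-msse2` option to the compiler flags, provided the CPU supports SSE2. A minimal sketch; `cpu_has_sse2` is a hypothetical helper, and the flags in the suggested command are copied from the log above plus `-msse2`.

```shell
# Report whether a cpuinfo-style file lists the sse2 CPU flag.
# cpu_has_sse2 is a hypothetical helper; it defaults to /proc/cpuinfo (Linux).
cpu_has_sse2() {
    cpuinfo="${1:-/proc/cpuinfo}"
    grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw 'sse2'
}

if cpu_has_sse2; then
    # Override CFLAGS on the make command line (assumes the Makefile uses CFLAGS).
    echo "CPU reports SSE2; try: make CFLAGS='-g -Wall -O2 -msse2'"
else
    echo "CPU does not report SSE2 (or /proc/cpuinfo is unavailable)"
fi
```

Passing the flag on the command line leaves the Makefile untouched, so the change is easy to undo if it does not help.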

                    • #11
                      Sorry, I forgot to answer your other questions.

                      Yes, I am. Desktop. Running natively.

                      • #12
                        Originally posted by jmwhitha
                        Yes, I am. Desktop. Running natively.

                        Can you post the output of:

                        Code:
                        uname -a

                        I am wondering which recent Linux distribution does not come with a functional compiler. What exact distro of Linux are you using?

                        • #13
                          Linux Linux-OptiPlex-755 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25 20:00:05 UTC 2013 i686 i686 i686 GNU/Linux

                          I hope that is the case. If it is, what compiler should I get? I have the latest version of protobuf-compiler according to apt-get.

                          • #14
                            I also installed build-essential.

                            • #15
                              I wonder if you have some kind of software conflict with compilers.

                              Code:
                              sudo apt-get install gcc

                              should have been all that was needed.
