SEQanswers

04-23-2013, 05:51 AM   #1
jmwhitha
Senior Member
Location: NC State, Raleigh, NC
Join Date: Mar 2013
Posts: 105
BWA gives no alignment lines in SAM file

Good morning,

I am trying to map simulated reads generated with GemSIM to an index built from a multi-FASTA of the CLJU (Clostridium ljungdahlii) coding sequences from NCBI. I used the two commands below and found that the SAM file contained only header lines and no alignment lines: head and tail only returned lines starting with @, and grep -vnm1 '^@' (print the first line that does not start with @) returned nothing.

bwa-0.7.3a/bwa index -a is -p CLJU ./CLJUcoding.fasta

bwa-0.7.3a/bwa mem CLJU CLJUsimreadsCodeTrial.single.fastq > CLJUsimreadsCodeTrial.single.sam
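The head/tail/grep check above can also be done in one pass that counts header lines, alignment records and unmapped records. This is a minimal Python sketch (the script name sam_summary.py is hypothetical); it relies only on the SAM rules that header lines start with @ and that column 2 of an alignment record is the bitwise FLAG, where bit 0x4 means "unmapped". Since bwa mem normally writes unmapped reads too (with flag 4), a records count of 0 suggests that no reads were processed at all, rather than that none of them aligned.

```python
import sys

def summarize_sam(lines):
    """Count header lines, alignment records, and unmapped records in SAM text."""
    header = records = unmapped = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith("@"):
            header += 1  # @HD, @SQ, @PG, ... header lines
            continue
        records += 1
        flag = int(line.split("\t")[1])  # column 2 is the bitwise FLAG
        if flag & 0x4:  # 0x4 = segment unmapped
            unmapped += 1
    return {"header": header, "records": records, "unmapped": unmapped}

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sam_summary.py CLJUsimreadsCodeTrial.single.sam
    with open(sys.argv[1]) as handle:
        print(summarize_sam(handle))
```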

Any suggestions?

Thanks and God bless,
Jason

04-23-2013, 07:02 AM   #2
mastal
Senior Member
Location: uk
Join Date: Mar 2009
Posts: 659
BWA gives no alignment lines in SAM file

Did you get any error messages or other output when you ran BWA?

I think your syntax is not quite right, according to the bwa manual page
http://bio-bwa.sourceforge.net/bwa.shtml

it should be
bwa-0.7.3a/bwa mem ./CLJUcoding.fasta CLJUsimreadsCodeTrial.single.fastq > CLJUsimreadsCodeTrial.single.sam

04-23-2013, 08:57 AM   #3
jmwhitha
Senior Member
Location: NC State, Raleigh, NC
Join Date: Mar 2013
Posts: 105

Hello mastal,

I did not get any error messages. In addition to my CLJUsimreadsCodeTrial_single.sam, I got CLJU.amb, CLJU.ann, CLJU.bwt, CLJU.pac, and CLJU.sa. According to head, my CLJU.amb contains just one problem character ("3909174 4184 0"). The .ann file looks like it contains appropriate text, and the rest of the files are not human-readable.

I tried your syntax suggestion and got back: "[E::bwa_idx_load] fail to locate the index files". This message did not occur when I used the syntax in my first post.
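For what it's worth, that error is consistent with how bwa mem finds its index: the manual's synopsis is bwa mem [options] db.prefix reads.fq, so the first argument is the index prefix. bwa index names the files after the FASTA by default, which is why the manual's examples pass ref.fa, but with -p CLJU the files are CLJU.amb, CLJU.ann, CLJU.bwt, CLJU.pac and CLJU.sa, so CLJU was the right argument and there is no index behind ./CLJUcoding.fasta. A minimal Python sketch that lists which of those five files are missing for a given prefix (it only checks that the files exist, not that they are valid):

```python
from pathlib import Path

# The five files `bwa index` wrote in this thread (CLJU.amb, .ann, .bwt, .pac, .sa).
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")

def missing_index_files(prefix):
    """Return the expected index files that are absent for this prefix."""
    return [prefix + suffix for suffix in BWA_INDEX_SUFFIXES
            if not Path(prefix + suffix).is_file()]

if __name__ == "__main__":
    # Compare the prefix from the original command with the FASTA path.
    for prefix in ("CLJU", "./CLJUcoding.fasta"):
        missing = missing_index_files(prefix)
        print(prefix, "->", "all present" if not missing else "missing " + ", ".join(missing))
```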

Any other suggestions? Thank you very much.

04-23-2013, 11:31 AM   #4
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,495

Quote:
Originally Posted by jmwhitha
Hello mastal,

My CLJU.amb contains just one problem character according to head command ("3909174 4184 0").
What do you mean by "contains one problem character"?

Are there spaces in identifier names in your genome reference file? How large is your reference file? What OS are you running this on?

Can you post a few example lines from your sequence and reference files?

04-23-2013, 12:59 PM   #5
jmwhitha
Senior Member
Location: NC State, Raleigh, NC
Join Date: Mar 2013
Posts: 105

According to http://seqanswers.com/forums/showthread.php?t=20556, the .amb (ambiguity) file records illegal characters. xied75 intentionally put an R and an S into his FASTA file and got back two entries: "249240584 1 R" and "249240585 1 S". I assume the numbers are coordinates of some sort, and that my illegal character is "0".
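Based on my reading of BWA's bntseq.c (an assumption worth checking against your copy of the source), the first line of a .amb file is a summary rather than an ambiguous character: total reference length, number of sequences, and number of ambiguous runs ("holes"). Each following line is one run: offset, run length, character, which fits xied75's "249240584 1 R". Read that way, "3909174 4184 0" would mean 3,909,174 bases in 4184 sequences with no ambiguous bases at all. A minimal Python sketch of a parser for that assumed layout:

```python
def parse_amb(text):
    """Parse BWA .amb text, assuming the layout described above.

    Line 1: <total reference length> <number of sequences> <number of holes>
    Then one line per ambiguous run: <offset> <run length> <character>
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    total_length, n_sequences, n_holes = (int(x) for x in rows[0])
    holes = [(int(offset), int(length), char) for offset, length, char in rows[1:]]
    if len(holes) != n_holes:
        raise ValueError(f"summary line says {n_holes} holes, found {len(holes)}")
    return {"total_length": total_length, "n_sequences": n_sequences, "holes": holes}

if __name__ == "__main__":
    # The first line of CLJU.amb as posted in this thread.
    print(parse_amb("3909174 4184 0\n"))
    # {'total_length': 3909174, 'n_sequences': 4184, 'holes': []}
```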

Yes, there are spaces in the multi-FASTA file I am using as a reference. "grep -c \ CLJUcoding.fasta" reports 4184 lines containing a space.
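Note that grep -c counts matching lines, not spaces, so 4184 is the number of lines with at least one space; since sequence lines contain no spaces, these should be the header lines. The same 4184 is the middle number of "3909174 4184 0" in CLJU.amb, which fits that number being the sequence count. As far as I know, BWA keeps the header text up to the first whitespace as the reference name and stores the rest as a comment, so spaces alone should not prevent alignment. A small Python sketch (fasta_headers.py is a hypothetical name) that counts headers, headers containing whitespace, and distinct first-word names, which would also reveal names that collide after truncation:

```python
import sys

def fasta_header_summary(lines):
    """Return (headers, headers containing whitespace, first-word names) for FASTA lines."""
    n_headers = n_with_space = 0
    names = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line
        header = line[1:].rstrip("\n")
        n_headers += 1
        if any(ch.isspace() for ch in header):
            n_with_space += 1
        # Assumption: the reference name BWA keeps is the text before the first whitespace.
        words = header.split(None, 1)
        names.append(words[0] if words else "")
    return n_headers, n_with_space, names

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fasta_headers.py CLJUcoding.fasta
    with open(sys.argv[1]) as handle:
        n_headers, n_with_space, names = fasta_header_summary(handle)
    print(n_headers, "headers,", n_with_space, "with whitespace,",
          len(set(names)), "distinct names")
```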

Genome Reference File
"stat -c %s CLJUcoding.fasta" gives "4609956" (bytes) for the file size.
Grabbing a sample from the head using "head -n30 CLJUcoding.fasta" I get:
>lcl|NC_014328.1_cdsid_YP_003778217.1 [gene=dnaA] [protein=chromosomal replication initiator protein] [protein_id=YP_003778217.1] [location=101..1453]
ATGAATGCCCATCCAAAAGAAATATGGGAACAATCTTTAAACATAATAAATGGTGAAATTACTGAAGTAAGCTTTAACACATGGATTAAAAGTATTACTCCTGTATCTATTGAAAATGACACCTTCATATTAAGTGTACCAAATGACCTTACCAAAGGCATATTAACTAGTAAATATAAAAATTTAATAGCTAATGCTCTAAAATTAATTACTTCAAAAAAATACAACATTAAATTTTTAATTGCCTCTGAATCAGAAGAAGCTTTAACATTAGACAATACTAATAAAAGACACAATAAAAATTCCGTATTGGTAAATGATGAAATGTCAACCATGCTAAATCCAAAATATACTTTTGATTCCTTCGTTATAGGTAATAGTAATAGATTCGCTCATGCAGCTTCACTTGCTGTAGCTGAATCACCTTCAAAAGCATATAATCCCCTATTCATATACGGAGGCGTAGGACTAGGCAAAACTCATTTAATGCATGCTATAGGACACTACATATTAAACAATAATAGTAAATCTAAAGTAGTATACGTTTCATCTGAAAAATTTACAAACGAACTTATAAATTCAATAAAAGATGATAAAAATGTAGAATTCAGAAATAAATATAGAAATATAGATGTACTCTTAATAGATGATATACAATTTATTGCAGGTAAAGAAAGAACCCAAGAGGAATTTTTTCATACCTTTAATGCATTATACGAGGCTAATAAACAAATAATTCTATCTAGTGATAGACCACCAAAAGAAATCCCTACATTAGAAGATAGACTTAGATCTAGGTTTGAATGGGGACTTATAGCAGACATTCAACCACCAGACTTTGAAACTAGAATGGCTATATTAAAAAAGAAGGCAGACGTCGAAAATTTAAATATTCCTAATGAAGTAATGGTGTATATAGCTACTAAAATTAAATCCAACATTAGAGAACTTGAAGGTGCGCTAATAAGAATAGTCGCTTTCTCCTCACTTACAAATAAAGAAATAAGTATAGATTTGGCAGTAGAAGCTTTAAAAGATATAATTTCAAGCAAACAATCAAAACAAGTTACTATAGACTTAATACAAGATGTAGTTGCCAACTATTATAACTTAAAAGTAGATGATTTAAAATCTGCAAGAAGAACAAGAAATGTAGCTTTTCCAAGGCAAATAGCTATGTACTTGTGTAGAAAACTTACAGATATGTCTTTGCCAAAAATCGGAGAAGAATTTGGCGGAAGAGATCATACTACCGTAATACATGCTTATGAAAAAATATCAACTAATTTAAAACAAGATGAAAGTCTTCAAAATGCTATAGGCGATTTAACAAAACGACTAAATCAAAATTAA
>lcl|NC_014328.1_cdsid_YP_003778218.1 [gene=dnaN] [protein=DNA polymerase III subunit beta] [protein_id=YP_003778218.1] [location=1710..2813]
ATGAACTTTATATGTACAAAAACAGAATTACAAGAAGCTATTTCAATAGCACAAAAAGCTATCACAGGGAAATCCAGCATGCCAATATTAAATGGTCTACTTATTACAACCTGTAAAAACCAAATTAAATTAACTGGATCAGATATAGACCTCAGTATAGAAACAAAAATAAATGCAGAAATAAAAGAAGAGGGATCCGTAGTAGTTGATTCTAGACTATTTGGAGAAATTATAAGAAAATTACCTAATGACAATATAAATATTTCTACTACAGAAAATAATTCAATAGAAATAATATGTCAAAAATCTAAATTTAATCTAATTCATATGAATGCAGAAGATTTTCCTGAAATACCTAATATAAATGAAAATATTATTTTCTTAATACCTCAAAAAATATTAAAAGATATGATAAAAAGTACTATTTTTGCTGCAGCTCAAGATGAAACTAGACCTATACTTACAGGAATTTTATTTGAAATCAAGGACAAAAAATTAAATTTAGTAGCATTAGACGGATATAGATTAGCTTTAAAATCAGAATATCTTAATACAGAAAA

And for my FASTQ file (just the first 10 lines):
$ head CLJUsimreadsCodeTrial_single.fastq
@r1_from_gi|300853232|ref|NC_014328.1| Clostridium ljungdah_#0/1
GACTCCTTGCAGCTGGGGAGGTAGAATACCCGATTATTATATTAGTCTAAAAATTGATAATGGTAAGCATTTGTTTTTAATAAATGAATTTCATAATGGG
+
IIIIGDIHIIIIDIIIIHIIIIDIHIIIIEIDFBIIF>HIIEEIIIIIFDEHEGH(IHF#IIHG#@[email protected]<[email protected]>[email protected]>B
@r2_from_gi|300853232|ref|NC_014328.1| Clostridium ljungdah_#0/1
TATGGACTTGGTCTAATTTCATGTTGAACTTTATATTCCAGGACTTATTGGTTTCCACATCATTTCCTATAGTAATTCCGCTGTAATTAATTGCATGTAC
+
[email protected]=FIGIIGIGIBEIGHHHIHIE2GHHGHHHH?HIHGB=GIB?#6H>>[email protected]
@r3_from_gi|300853232|ref|NC_014328.1| Clostridium ljungdah_#0/1
TAATGGAAAAAAACTACTTATAGATTGTGGTGAGGGAACTCAAGTTAGCTTAAAAATACTTGGATGTAAAATAAAAAATATAGATGTAATTTTATTTACA

Sorry, can't help the line wrapping.

What do you think?

Thank you again to everyone who responded.

04-23-2013, 01:00 PM   #6
jmwhitha
Senior Member
Location: NC State, Raleigh, NC
Join Date: Mar 2013
Posts: 105

Or actually, lack of wrapping.

04-24-2013, 03:27 AM   #7
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,495

I was able to generate a SAM file (attached) using the test sequences you included in your previous post.

Is the bwa you are trying to use known to work correctly? Did you download and compile it yourself?
Attached file: test.sam.txt (814 bytes)

04-24-2013, 04:35 AM   #8
jmwhitha
Senior Member
Location: NC State, Raleigh, NC
Join Date: Mar 2013
Posts: 105

Oh, great!

Yes, I downloaded and compiled it, but had some issues. See http://seqanswers.com/forums/showthread.php?t=29498

What version of bwa are you using? I am using 0.7.3a.

Thank you so much!

04-24-2013, 04:49 AM   #9
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,495

Quote:
Originally Posted by jmwhitha
Oh, great!

Yes, I downloaded and compiled it, but had some issues. See http://seqanswers.com/forums/showthread.php?t=29498

What version of bwa are you using? I am using 0.7.3a

Thank you so much!
I did use 0.7.3a.

I think there is something wrong with your copy of bwa. You are not going to go far until that is fixed.

I assume you are the "sys admin" for this machine (since it appears to be a desktop). What flavor of Linux are you running, and is it running natively or as a virtual machine?

I just noticed that bwa 0.7.4 is out. Give that a shot to see if you fare better with the compilation.

04-24-2013, 05:07 AM   #10
jmwhitha
Senior Member
Location: NC State, Raleigh, NC
Join Date: Mar 2013
Posts: 105

I used the following command to get the newest version of bwa from sourceforge:
wget http://sourceforge.net/projects/bio-...d?source=files -O bwa.tar.bz2

Then I opened the tarball and went into the directory with:
tar -xjf bwa.tar.bz2
cd bwa-0.7.4

This is what happens when I execute "make":

gcc -c -g -Wall -O2 -DHAVE_PTHREAD utils.c -o utils.o
gcc -c -g -Wall -O2 -DHAVE_PTHREAD kstring.c -o kstring.o
gcc -c -g -Wall -O2 -DHAVE_PTHREAD ksw.c -o ksw.o
In file included from ksw.c:28:0:
/usr/lib/gcc/i686-linux-gnu/4.7/include/emmintrin.h:32:3: error: #error "SSE2 instruction set not enabled"
ksw.c:44:2: error: unknown type name ‘__m128i’
ksw.c: In function ‘ksw_qinit’:
ksw.c:67:11: error: ‘__m128i’ undeclared (first use in this function)
ksw.c:67:11: note: each undeclared identifier is reported only once for each function it appears in
ksw.c:67:19: error: expected expression before ‘)’ token
ksw.c: In function ‘ksw_u8’:
ksw.c:110:2: error: unknown type name ‘__m128i’
ksw.c:126:2: warning: implicit declaration of function ‘_mm_set1_epi32’ [-Wimplicit-function-declaration]
ksw.c:127:2: warning: implicit declaration of function ‘_mm_set1_epi8’ [-Wimplicit-function-declaration]
ksw.c:133:3: warning: implicit declaration of function ‘_mm_store_si128’ [-Wimplicit-function-declaration]
ksw.c:140:3: error: unknown type name ‘__m128i’
ksw.c:141:3: warning: implicit declaration of function ‘_mm_load_si128’ [-Wimplicit-function-declaration]
ksw.c:142:3: warning: implicit declaration of function ‘_mm_slli_si128’ [-Wimplicit-function-declaration]
ksw.c:150:4: warning: implicit declaration of function ‘_mm_adds_epu8’ [-Wimplicit-function-declaration]
ksw.c:151:4: warning: implicit declaration of function ‘_mm_subs_epu8’ [-Wimplicit-function-declaration]
ksw.c:153:4: warning: implicit declaration of function ‘_mm_max_epu8’ [-Wimplicit-function-declaration]
ksw.c:177:5: warning: implicit declaration of function ‘_mm_movemask_epi8’ [-Wimplicit-function-declaration]
ksw.c:177:5: warning: implicit declaration of function ‘_mm_cmpeq_epi8’ [-Wimplicit-function-declaration]
ksw.c:183:3: warning: implicit declaration of function ‘_mm_srli_si128’ [-Wimplicit-function-declaration]
ksw.c:183:3: warning: implicit declaration of function ‘_mm_extract_epi16’ [-Wimplicit-function-declaration]
ksw.c: In function ‘ksw_i16’:
ksw.c:228:2: error: unknown type name ‘__m128i’
ksw.c:244:2: warning: implicit declaration of function ‘_mm_set1_epi16’ [-Wimplicit-function-declaration]
ksw.c:256:3: error: unknown type name ‘__m128i’
ksw.c:260:4: warning: implicit declaration of function ‘_mm_adds_epi16’ [-Wimplicit-function-declaration]
ksw.c:262:4: warning: implicit declaration of function ‘_mm_max_epi16’ [-Wimplicit-function-declaration]
ksw.c:266:4: warning: implicit declaration of function ‘_mm_subs_epu16’ [-Wimplicit-function-declaration]
ksw.c:282:5: warning: implicit declaration of function ‘_mm_cmpgt_epi16’ [-Wimplicit-function-declaration]
make: *** [ksw.o] Error 1

Do you know why this is happening?
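The first real error is the #error in emmintrin.h: "SSE2 instruction set not enabled". ksw.c includes that header and uses SSE2 types and intrinsics (__m128i, _mm_*), and the include path /usr/lib/gcc/i686-linux-gnu/4.7/ shows a 32-bit gcc, which evidently does not enable SSE2 by default here; the remaining errors and warnings follow from that. One workaround I would try (untested in this thread) is to pass -msse2, for example make CFLAGS="-g -Wall -O2 -msse2", which overrides the Makefile's CFLAGS while keeping the flags shown in the log. The Python sketch below checks whether gcc predefines the __SSE2__ macro with its default flags and with -msse2, using gcc -dM -E to dump the predefined macros for an empty C file.

```python
import shutil
import subprocess

def sse2_enabled(macro_dump):
    """True if a `gcc -dM -E` macro dump defines __SSE2__."""
    return any(line.split()[:2] == ["#define", "__SSE2__"]
               for line in macro_dump.splitlines())

def gcc_predefined_macros(extra_flags=()):
    """Return gcc's predefined macros for an empty C file (empty string if gcc fails)."""
    cmd = ["gcc", *extra_flags, "-dM", "-E", "-x", "c", "/dev/null"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""

if __name__ == "__main__" and shutil.which("gcc"):
    for flags in ((), ("-msse2",)):
        label = " ".join(flags) or "default flags"
        print(f"gcc {label}: __SSE2__ defined = {sse2_enabled(gcc_predefined_macros(flags))}")
```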

04-24-2013, 05:10 AM   #11
jmwhitha
Senior Member
Location: NC State, Raleigh, NC
Join Date: Mar 2013
Posts: 105

Sorry, I forgot to answer your other questions.

Yes, I am the sys admin. It's a desktop, running Linux natively.

04-24-2013, 05:17 AM   #12
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,495

Quote:
Originally Posted by jmwhitha

Yes, I am. Desktop. Running natively.
Can you post the output of:

Code:
uname -a
I am wondering which recent Linux distribution would not come with a functional compiler. What exact distro of Linux are you using?

04-24-2013, 06:17 AM   #13
jmwhitha
Senior Member
Location: NC State, Raleigh, NC
Join Date: Mar 2013
Posts: 105

Linux Linux-OptiPlex-755 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25 20:00:05 UTC 2013 i686 i686 i686 GNU/Linux
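The i686 fields mean this is a 32-bit kernel, which fits the i686-linux-gnu gcc in the build log. That does not by itself say whether the processor supports SSE2; the kernel lists the CPU's features on the flags lines of /proc/cpuinfo. A minimal Python sketch that reads those lines and reports whether sse2 is among them (if it is, asking gcc for SSE2 code with its -msse2 flag should be safe on this machine):

```python
import platform
from pathlib import Path

def cpu_flags(cpuinfo_text):
    """Collect the feature flags from the 'flags' lines of /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags

if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    print("machine:", platform.machine())  # same as `uname -m`, e.g. i686
    if cpuinfo.exists():
        print("CPU advertises sse2:", "sse2" in cpu_flags(cpuinfo.read_text()))
```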

I hope that is the case. If it is, what compiler should I get? I have the latest version of protobuf-compiler according to apt-get.

04-24-2013, 06:20 AM   #14
jmwhitha
Senior Member
Location: NC State, Raleigh, NC
Join Date: Mar 2013
Posts: 105

I also installed build-essential.

04-24-2013, 06:54 AM   #15
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,495

I wonder if you have some kind of software conflict with compilers.

Code:
sudo apt-get install gcc
should have been all that was needed.

04-24-2013, 07:10 AM   #16
jmwhitha
Senior Member
Location: NC State, Raleigh, NC
Join Date: Mar 2013
Posts: 105

Okay, so what can I do now? My gcc was already the newest version.

04-24-2013, 09:05 AM   #17
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,495

Quote:
Originally Posted by jmwhitha
Okay, so what can I do now? My gcc was already the newest version.
This kind of debugging may have to be done sitting in front of the computer. Do you have access to a local expert who can help?

I am surprised that you have run into so many problems with this.

06-05-2016, 02:06 PM   #18
jaredroach
Junior Member
Location: Seattle
Join Date: May 2016
Posts: 1
I suspect the problem is in the read simulator, not bwa

We have had the same problem, getting the same results with bwa when using simulated reads. I suspect the problem you are having is with your read simulator, not bwa. Something in your simulated fastq files is not in the format that bwa expects.
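If the simulator output is the suspect, a basic structural check of the FASTQ is cheap to run. A minimal Python sketch, assuming plain four-line records (no wrapped sequence or quality lines): it checks that each header starts with @, the third line starts with +, and the sequence and quality lines have equal length. Records are read four lines at a time rather than by searching for @, because @ is also a valid quality character. The script name fastq_check.py is hypothetical, and passing this check does not prove the file is acceptable to bwa.

```python
import sys
from itertools import islice

def check_fastq(lines, max_problems=10):
    """Return (records seen, list of problems) for four-line FASTQ records."""
    problems = []
    it = iter(lines)
    n = 0
    while len(problems) < max_problems:
        record = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not record:
            break  # clean end of input
        n += 1
        if len(record) < 4:
            problems.append(f"record {n}: truncated ({len(record)} of 4 lines)")
            break
        header, seq, plus, qual = record
        if not header.startswith("@"):
            problems.append(f"record {n}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {n}: third line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {n}: {len(seq)} bases but {len(qual)} quality values")
    return n, problems

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fastq_check.py CLJUsimreadsCodeTrial_single.fastq
    with open(sys.argv[1]) as handle:
        n_records, problems = check_fastq(handle)
    print(n_records, "records checked")
    for problem in problems:
        print(problem)
```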

Quote:
Originally Posted by jmwhitha
Good morning,

I am trying to map simulated reads generated from GemSIM to an index generated from a multi-fasta of the CLJU (bacteria) coding sequence from NCBI. I used these two commands and discovered there were no alignment lines in the SAM file, only header lines. Commands head and tail only returned lines starting with @, and grep -vnm1 '^@' returned nothing.

bwa-0.7.3a/bwa index -a is -p CLJU ./CLJUcoding.fasta

bwa-0.7.3a/bwa mem CLJU CLJUsimreadsCodeTrial.single.fastq > CLJUsimreadsCodeTrial.single.sam

Any suggestions?

Thanks and God bless,
Jason

All times are GMT -8.