SEQanswers

Old 01-18-2012, 07:40 AM   #1
cllorens
Member
 
Location: Valencia

Join Date: Nov 2011
Posts: 44
Default bwa sai to bam conversion and indexfile.nt.ann??

Hi
I am trying to test BWA on 454 reads longer than 200 nt, using the bwasw option with hg19 as the indexed reference.

BWA generates an output.sai, which I then try to convert to SAM format, and this is where the problem appears.

bwa gives the following message:

[bns_restore_core] fail to open file 'hg19.nt.ann'. Abort!
Aborted

I have no idea why bwa is asking for hg19.nt.ann, or what that file is. It is not generated along with the other index files when I run the index function, so I am confused.

I searched the forum for similar messages and, surprisingly, found very little (almost nothing that clears up my doubt).

Can anyone tell me whether this xxx.nt.ann file is a normal output of bwa, and how I can create it so that I can convert a .sai file to BAM?

Thank you in advance.
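
To see which index files actually sit next to a given prefix, a small shell helper can report each suffix as present or missing. This is only a sketch: `check_index` is a made-up name, and the suffixes to test are supplied by the caller rather than taken from any bwa version's documented list.

```shell
# check_index PREFIX SUFFIX...
# Hypothetical helper: for each suffix, report whether PREFIX.SUFFIX
# exists as a readable file.
check_index() {
    prefix=$1
    shift
    for suffix in "$@"; do
        if [ -r "$prefix.$suffix" ]; then
            echo "present: $prefix.$suffix"
        else
            echo "missing: $prefix.$suffix"
        fi
    done
}

# Example (run in the directory where the index was built):
#   check_index hg19 ann nt.ann
```

Passing the suffixes explicitly keeps the helper independent of which files a particular bwa release writes.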
Old 01-18-2012, 07:57 AM   #2
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

Please post the commands you used for indexing, generating the .sai, and generating the SAM with bwa.
The "nt" refers to color-space indexing. I don't think 454 data is color space. You might be using somebody's color-space script as a template for your work, in which case you need to modify it.

Last edited by Richard Finney; 01-18-2012 at 07:59 AM.
Old 01-18-2012, 08:39 AM   #3
cllorens
Member
 
Location: Valencia

Join Date: Nov 2011
Posts: 44
Default

Hi Richard, thank you for your answer.

Indeed, 454 is not color space. Maybe I am doing something wrong; I do not know.

I used the extract_sff script to convert the sff to fastq and then prinseq to process the fastq:

@GCFF90V02JNZWW
CATTTGTTCACTCATAATAAGAAAGTAGGGAGAGGAGAATGTTAACATACCTATAGATAATACATGCACTGTTCCTGCATGT
+GCFF90V02JNZWW
AB===B>>:::<<<=<<311/,,,242,,,/.89<?=889::ADA===AADFDDAAADDD??????ABBBABB==9:::=BB
@GCFF90V02G5MHK
ATATATGCTTTCATGAGAATGAGAGAGTCCTTCGAGCTGTAG
+GCFF90V02G5MHK
IIIIIIIIHHHIIIIIIFFFFFFFFFFFFFFF===@FFFFDD

Then I used BWA to create the hg19 index with

./bwa index -a bwtsw -p hg19 hg19.fa (so I did not use -c)

For the alignment,

I first used ./bwa aln, and bwa worked, although it only aligned the shortest reads, as might be expected. I then converted this sai output to bam without any problems.

Next, and here the trouble starts, I used bwasw to test bwa with longer reads, as follows:

./bwa bwasw -t 4 -f out.sai hg19 454reads7.fastq

bwa generated out.sai, and I then ran samse again to convert this sai, as I had done with the one from the shortest reads:

./bwa samse -f out.sam hg19 input.sai input.fastq

That is exactly what I did with the short reads.

Any suggestions?

Carlos
Old 01-18-2012, 09:05 AM   #4
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

Where's the aln step for the .sai generation before the bwasw command? The fastq must be the same.
Old 01-18-2012, 09:20 AM   #5
cllorens
Member
 
Location: Valencia

Join Date: Nov 2011
Posts: 44
Default

It was the same.

This is what I used:

./bwa aln -t 4 -f out.sai hg19 454reads7.fastq


In fact, I have just repeated the steps and got the same result.

Here are the commands.

Using aln

cllorens@biotechvana:~/assembling/tools/bwa/bwa-0.5.9> ./bwa aln -t 4 -f destruye.sai hg19 454reads7.fastq
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln_core] calculate SA coordinate... 1048.13 sec
[bwa_aln_core] write to the disk... 1039.00 sec
[bwa_aln_core] 218634 sequences have been processed.

then sai to bam conversion...

cllorens@biotechvana:~/assembling/tools/bwa/bwa-0.5.9> ./bwa samse -f destruyeme.sam hg19 destruye.sai 454reads7.fastq
[bwa_aln_core] convert to sequence coordinate... 4.05 sec
[bwa_aln_core] refine gapped alignments... 17.34 sec
[bwa_aln_core] print alignments... 1.11 sec
[bwa_aln_core] 218634 sequences have been processed.


Now if i use bwasw with the same fastq

cllorens@biotechvana:~/assembling/tools/bwa/bwa-0.5.9> ./bwa bwasw -t 4 -f destruye2.sai hg19 454reads7.fastq
[bsw2_aln] read 29176 sequences (10000406 bp)...
[bsw2_aln] read 28182 sequences (10000061 bp)...
[bsw2_aln] read 29264 sequences (10000170 bp)...
[bsw2_aln] read 30374 sequences (10000003 bp)...
[bsw2_aln] read 31893 sequences (10000054 bp)...
[bsw2_aln] read 33994 sequences (10000276 bp)...
[bsw2_aln] read 35751 sequences (9642318 bp)...
cllorens@biotechvana:~/assembling/tools/bwa/bwa-0.5.9> ls

and now using the sai generated in this case:


cllorens@biotechvana:~/assembling/tools/bwa/bwa-0.5.9> ./bwa samse -f destruye2.sam hg19 destruye2.sai 454reads7.fastq
[bns_restore_core] fail to open file 'hg19.nt.ann'. Abort!
Aborted

Any idea?
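
One thing that may be worth ruling out here: the bwa manual describes `bwasw` as writing its alignments directly in SAM format, whereas `aln` writes the binary .sai that `samse` expects. If that applies to this version, destruye2.sai would already be SAM text despite its name. The sketch below (the function name `sniff_sam_header` is made up for illustration) only checks whether a file starts with a SAM header line such as `@SQ`; it does not validate the file.

```shell
# sniff_sam_header FILE
# Hypothetical helper: print "sam-header" if the first 64 bytes of FILE
# contain a line starting with a SAM header tag ("@", two capital letters,
# then whitespace); otherwise print "no-sam-header".
sniff_sam_header() {
    if head -c 64 "$1" | LC_ALL=C grep -q '^@[A-Z][A-Z][[:space:]]'; then
        echo "sam-header"
    else
        echo "no-sam-header"
    fi
}

# Example, comparing the aln output with the bwasw output:
#   sniff_sam_header destruye.sai
#   sniff_sam_header destruye2.sai
```

If the second file reports a SAM header, it can be inspected as plain text (for example with `head`) instead of being passed to `samse`.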
Old 01-18-2012, 09:36 AM   #6
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700
Default

Check your read lengths.

This doesn't explain the error message, but please see the notes by Heng Li (author of BWA) here: http://bio-bwa.sourceforge.net/

Does BWA align 454 reads?
Yes and no. The BWA-SW component of BWA works well on 454 reads about 200bp or longer. It achieves similar alignment accuracy to SSAHA2 while much faster. BWA-SW also works for shorter reads, but the sensitivity is lower. In addition, BWA-SW does not support paired-end alignment.

What is maximum query sequence length in alignment?
It is recommended to only use bwa-short on reads shorter than 200bp.
Old 01-18-2012, 09:45 AM   #7
cllorens
Member
 
Location: Valencia

Join Date: Nov 2011
Posts: 44
Default

There are several sizes, Richard, including reads of 500 nucleotides or even longer (750).
Perhaps the problem is that reads both shorter and longer than 200 nt are collected in the same input file. I am going to try separating them into two independent files (shorter and longer than 200) to see what happens. It is just an idea, but let me see whether doing so changes anything.

Carlos
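
The split described above could be sketched with awk as follows. It assumes strict four-line FASTQ records (no wrapped sequence lines); the function name and the output file names are placeholders, not part of any tool mentioned in the thread.

```shell
# split_fastq_by_length IN.fq MINLEN SHORT.fq LONG.fq
# Hypothetical helper: reads whose sequence length is >= MINLEN go to
# LONG.fq, all others to SHORT.fq. Assumes four lines per FASTQ record.
split_fastq_by_length() {
    : > "$3"   # create/empty both outputs so they exist even if unused
    : > "$4"
    awk -v minlen="$2" -v short_out="$3" -v long_out="$4" '
        { rec[(NR - 1) % 4] = $0 }      # buffer the lines of the current record
        NR % 4 == 0 {                   # record complete: pick its destination
            out = (length(rec[1]) >= minlen) ? long_out : short_out
            printf "%s\n%s\n%s\n%s\n", rec[0], rec[1], rec[2], rec[3] > out
        }
    ' "$1"
}

# Example:
#   split_fastq_by_length 454reads7.fastq 200 short.fastq long.fastq
```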
Old 01-23-2012, 02:30 AM   #8
cllorens
Member
 
Location: Valencia

Join Date: Nov 2011
Posts: 44
Default

Hi
I ran the test: I separated reads longer and shorter than 200 nt into two different fastq files and then used bwasw on the fastq with sequences > 200. Again, when I attempted to convert the format from sai to bam, bwa aborted, asking for the indexfile.nt.ann index file.

So in my humble opinion this might be a bug in the bwasw algorithm. In fact, the aln option for short reads prints a message like this at the end of the alignment process:

[bwa_aln_core] calculate SA coordinate... 1048.13 sec
[bwa_aln_core] write to the disk... 1039.00 sec
[bwa_aln_core] 218634 sequences have been processed.

whereas the bwasw option does not give any such output.
Old 01-23-2012, 02:35 AM   #9
mitochy
Member
 
Location: one does not simply approximate location

Join Date: Dec 2011
Posts: 10
Default

I had that problem, and this is how I solved it:
when making the index, use -p and -c.
E.g. your fasta file is seq.fa,
and your fasta file and the bwa program are located in ~/Desktop/BWA.
Make sure you use the full path for everything:

~/Desktop/BWA/bwa index -a bwtsw -p ~/Desktop/BWA/seq.fa -c ~/Desktop/BWA/seq.fa

Last edited by mitochy; 01-23-2012 at 02:38 AM.
Old 01-23-2012, 02:54 AM   #10
cllorens
Member
 
Location: Valencia

Join Date: Nov 2011
Posts: 44
Default

Hi Mitochy,
thank you for your comment. Perhaps I am wrong, but I think it is not the same problem.

-c is for creating color-space indexes,
and the indexfile.nt.ann file is certainly for color space. The point is that I am using
fastq files generated by 454 (i.e. not color space); when I use the bwasw option to create the sai file, it creates it, but the later conversion from sai to sam fails. In my last post I wrote "from sai to bam", but I meant sam.
Old 01-26-2012, 10:01 AM   #11
foryvonne
Junior Member
 
Location: US

Join Date: Apr 2011
Posts: 6
Default

Could you check the definition lines in your reference fasta file (i.e. the one that you are aligning your reads to), and remove any descriptions in these lines?

E.g. if you have lines that look like:

>contig3223 hg19.ann

Change it to:

>contig3223

I had the same problem and doing so should fix it.

Hao
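
The header cleanup suggested above can be sketched with sed. The function name is made up, and the sketch assumes that anything after the first whitespace on a ">" line is description text that is safe to drop.

```shell
# strip_fasta_descriptions IN.fa
# Hypothetical helper: on header lines (starting with ">"), delete everything
# from the first whitespace onward; sequence lines pass through unchanged.
strip_fasta_descriptions() {
    sed '/^>/ s/[[:space:]].*$//' "$1"
}

# Example (file names are placeholders):
#   strip_fasta_descriptions reference.fa > reference.clean.fa
```

The index would presumably have to be rebuilt from the cleaned file before realigning.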
Old 01-27-2012, 07:01 AM   #12
cllorens
Member
 
Location: Valencia

Join Date: Nov 2011
Posts: 44
Default

Hi Hao

The reference is the human genome, and the sequences are the individual chromosomes in karyotypic order (i.e. 1, 2, ... 22, X, Y, M), labeled only as >chr1 etc. So that is not the problem. Thank you anyway.
Old 12-31-2012, 05:42 AM   #13
9taylors
Junior Member
 
Location: San Francisco

Join Date: Dec 2012
Posts: 1
Default Same problem

Hi cllorens,

Were you ever able to resolve this? I am seeing the same behavior with bwasw. I am using simulated 454 reads. The alignment works properly, but the conversion from sai to sam tries to load a colorspace index.

Thanks.

Last edited by 9taylors; 12-31-2012 at 05:43 AM. Reason: removed name
Old 04-12-2013, 02:25 AM   #14
dries
Junior Member
 
Location: Bielefeld, Germany

Join Date: Nov 2010
Posts: 3
Default

Hi,

I had the same problem.
Reason:
The fasta file was indexed with bwa version 0.6.2, while I tried to run aln and sampe with bwa version 0.5.8.

After using the same version for both, the problem disappeared.

Cheers,

David
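
A version mismatch like this can be checked by asking each bwa executable for its version. The sketch below assumes the binary prints a line beginning with "Version:" in its usage text when run without arguments; the function name is made up for illustration.

```shell
# bwa_version_of COMMAND [ARGS...]
# Hypothetical helper: run the given command, merge stderr into stdout, and
# print the second field of the first line that starts with "Version:".
# Assumes the program prints such a line when run without a subcommand.
bwa_version_of() {
    "$@" 2>&1 | awk '/^Version:/ { print $2; exit }'
}

# Example (bash): list every bwa found on PATH with its reported version.
#   for exe in $(type -ap bwa); do
#       echo "$exe $(bwa_version_of "$exe")"
#   done
```

Running the same check on the binary that built the index and on the one used for aln/samse shows quickly whether the two differ.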
Old 05-28-2013, 04:57 PM   #15
finfin
Junior Member
 
Location: Dallas

Join Date: Dec 2011
Posts: 8
Default

Quote:
Originally Posted by dries View Post
Hi,

I had the same problem.
Reason:
The fasta file was indexed with bwa version 0.6.2, while I tried to run aln and sampe with bwa version 0.5.8.

After using the same version for both, the problem disappeared.

Cheers,

David

I encountered the same problem; it turned out that there were two versions of bwa on the server and I had used the older one to generate the index.
Old 05-29-2013, 07:54 AM   #16
Volklor
Member
 
Location: Pacific Northwest

Join Date: Sep 2010
Posts: 13
Default same problem

I'm having the same problem with samse using bwa version 0.5.7:

[bns_restore_core] fail to open file 'FRA_genome/Fvesca.ann'. Abort!

I'm using Illumina reads, so the length should not be an issue. And the .ann file is actually present.

Any ideas?
Old 05-29-2013, 09:27 AM   #17
ppoudel
Junior Member
 
Location: UK

Join Date: Feb 2012
Posts: 5
Default

Hi,

How does BWA handle coordinate-sorted fastq files (obtained from a coordinate-sorted bam file)? Do I need to shuffle the bam files before converting them to fastq?