SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
bwa sampe very slow natpokah Bioinformatics 25 08-13-2013 10:18 AM
bwa sampe hanging krobison Bioinformatics 6 02-13-2013 12:57 PM
BWA sampe problem patel Bioinformatics 9 10-24-2011 05:11 AM
bwa sampe problem sszelinger Bioinformatics 3 06-15-2011 05:34 AM
bwa sampe 0.5.7 error? rcorbett Bioinformatics 2 04-22-2010 07:13 AM

Reply
 
Thread Tools
Old 04-05-2012, 07:37 AM   #1
cshowell
Junior Member
 
Location: North Carolina

Join Date: Apr 2012
Posts: 9
Default Problem running BWA sampe

Maybe someone can see what I'm doing wrong here. I'm using BWA to align a dataset and get a SAM file that I can use for variant calling. I've indexed the reference fasta file with 'bwa index' and run 'bwa aln' to generate my .sai files. These steps seem to have worked, but when I try to use 'bwa sampe' to produce a .sam file, the process aborts and I get the following error message:

'[bns_restore_core] fail to open file 'human_g1k_v37.fasta.ann'. Abort!'

The command I'm using is:

'bwa sampe -a 500 -o 100000 -n 3 -N 10 -P human_g1k_v37.fasta ERR035484_1.sai ERR035484_2.sai ERR035484_1.fastq ERR035484_2.fastq > PEA1.sam'

I have a .ann file ('human_g1k_v37.ann') from the indexing step, and I thought it might be a filename problem, i.e. the program trying to open a .fasta.ann file instead of just .ann, but omitting the .fasta in the command still gave a similar error message.

The first few lines of my .ann file look like this:

3101804739 84 11
0 1 dna:chromosome chromosome:GRCh37:1:1:249250621:1
0 249250621 39
0 2 dna:chromosome chromosome:GRCh37:2:1:243199373:1
249250621 243199373 24
0 3 dna:chromosome chromosome:GRCh37:3:1:198022430:1
492449994 198022430 9
0 4 dna:chromosome chromosome:GRCh37:4:1:191154276:1
690472424 191154276 12
0 5 dna:chromosome chromosome:GRCh37:5:1:180915260:1

I've searched the forum but not found a solution to this that works for me. Can anyone help me out?
cshowell is offline   Reply With Quote
Old 04-05-2012, 07:55 AM   #2
Rocketknight
Member
 
Location: Ireland

Join Date: Sep 2011
Posts: 86
Default

The indexing step should have generated the file human_g1k_v37.fasta.ann instead of human_g1k_v37.ann . Renaming it to human_g1k_v37.fasta.ann (and doing the same for any of the other index files that are mysteriously misnamed) should fix the problem.
Rocketknight is offline   Reply With Quote
Old 04-05-2012, 08:40 AM   #3
cshowell
Junior Member
 
Location: North Carolina

Join Date: Apr 2012
Posts: 9
Default

Thanks Rocketnight, but I still get the error:

[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bns_restore_core] fail to open file 'human_g1k_v37.fasta.nt.ann'. Abort!

The reads are from an Illumina GAIIx, so everything should be base space, and from what I've read, the '.nt' refers to colour space data.

My alignment command was 'bwa aln -n 3 -o 1 -e -1 -d 16 -l 5 -k 2 - l 14 -t1 -M 3 -O 11 -E 4 -q 0' if that's any help. I'm using bwa/0.6.1.

Any more suggestions would be greatly appreciated.
cshowell is offline   Reply With Quote
Old 04-05-2012, 09:29 AM   #4
Rocketknight
Member
 
Location: Ireland

Join Date: Sep 2011
Posts: 86
Default

Most of those command-line options are unneccessary - you're just setting values to their default values, which doesn't change much. Those options are only there to be tinkered with if you really know what you're doing, so the only thing you really need to run bwa aln is:

$ bwa aln [genome.fasta] [reads.fq] > [output.sai]

Secondly, bwa seems to be freaking out attempting to read a BAM file. Are you supplying reads in FASTQ format [.fastq(.gz) or .fq(.gz)] or .BAM format?
Rocketknight is offline   Reply With Quote
Old 04-05-2012, 11:14 AM   #5
cshowell
Junior Member
 
Location: North Carolina

Join Date: Apr 2012
Posts: 9
Default

The reads going into BWA are in fastq format, converted from the Sequence Read Archive's .sra format using the SRA Toolkit. I was puzzled by the BAM error too, but I can't see anything in what I've done so far that would lead it to expect a BAM file.
cshowell is offline   Reply With Quote
Old 04-06-2012, 04:13 AM   #6
cshowell
Junior Member
 
Location: North Carolina

Join Date: Apr 2012
Posts: 9
Default

Bump for any more input on what might be stopping bwa sampe from running. Any ideas?

Why would BWA suddenly expect to see a BAM file when everything was supplied in fastq?
cshowell is offline   Reply With Quote
Old 04-06-2012, 09:54 AM   #7
Rocketknight
Member
 
Location: Ireland

Join Date: Sep 2011
Posts: 86
Default

Can you give me the first few lines of the FASTQ input file? (If it's gzipped, use gunzip -c [file] | head -n 8)
Rocketknight is offline   Reply With Quote
Old 04-06-2012, 01:32 PM   #8
cshowell
Junior Member
 
Location: North Carolina

Join Date: Apr 2012
Posts: 9
Default

Quote:
Originally Posted by Rocketknight View Post
Can you give me the first few lines of the FASTQ input file? (If it's gzipped, use gunzip -c [file] | head -n 8)
The first 8 lines are:

@ERR035484.1 CRIRUN_408:3:9:10903:5069 length=72
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTACCCCTAACCCTAACCGCACCC
+ERR035484.1 CRIRUN_408:3:9:10903:5069 length=72
;B;CD2@F??=BBCCFEFFFC?E4FF<FBFFG>AFDA9D>F::E@E9B2;>0?###################
@ERR035484.2 CRIRUN_408:3:40:4508:13433 length=72
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
+ERR035484.2 CRIRUN_408:3:40:4508:13433 length=72
BB@GHIIHFHIIIIDHIIIIEGIIIICIIIIIGIIIIIFIIIIIHIIIGFEEIIIBEGIEEGEBEAD@>?;A

Thanks for continuing to try to help, Rocketknight, it's appreciated.
cshowell is offline   Reply With Quote
Old 04-06-2012, 01:44 PM   #9
cshowell
Junior Member
 
Location: North Carolina

Join Date: Apr 2012
Posts: 9
Default

Also, the first few lines of the fasta reference file are:

>1 dna:chromosome chromosome:GRCh37:1:1:249250621:1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

(The N's give way to actual sequence soon after this.)
cshowell is offline   Reply With Quote
Old 04-06-2012, 01:47 PM   #10
Rocketknight
Member
 
Location: Ireland

Join Date: Sep 2011
Posts: 86
Default

That looks fine, I can't see BWA mistaking it for a BAM file or anything. BWA is a relatively straightforward program, so I've no idea what could be causing this. All I can think of at this point is a complete clean run with the absolute minimum of command line options, in case some upstream step went wrong somehow.

Remove all the indexes and .sai files you've created. The only files you should have left are your two FASTQ read files and the lone FASTA genome file (they can be gzipped, it makes no difference). Index the genome by cding to the directory it's in and just typing "bwa index [genome file]". Align the two FASTQ files with "bwa aln [genome file] [fastq file] > [output file]", then do "bwa sampe [genome file] [sai file 1] [sai file 2] [fastq file 1] [fastq file 2] > [output.sam]". Don't add any other command line options. Paste the output from each step into a text file and post it up here when you're done (there shouldn't be any sensitive information in it).

With any luck it should either work outright, or at least create an error that'll give me some idea of what's going wrong.
Rocketknight is offline   Reply With Quote
Old 04-06-2012, 03:36 PM   #11
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

Is that a space in the chromosome name of the fasta? That might be causing problems.
swbarnes2 is offline   Reply With Quote
Old 04-10-2012, 03:16 AM   #12
cshowell
Junior Member
 
Location: North Carolina

Join Date: Apr 2012
Posts: 9
Default

I'm still getting the same problem after re-running the index and align steps. The error message (exit code 134) gave the following as the output:

[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bns_restore_core] fail to open file 'human_g1k_v37.fasta.nt.ann'. Abort!
/netscr/lsf/killdevil/lsbatch/1334055583.424139: line 8: 15693 Aborted (core dumped) bwa sampe human_g1k_v37.fasta ERR035484_1.sai ERR035484_2.sai ERR035484_1.fastq ERR035484_2.fastq

The alignment files were produced with the proper filenames/extensions (e.g. human_g1k_v37.fasta.ann). It still seems to think it's looking for a BAM file.

Any thoughts? I'm thinking of maybe trying a different reference sequence, but I don't really have a reason to suspect a problem with the one from 1000 Genomes.

swbarnes2 - I only see two spaces in the description line. Which one do you think might cause a problem? I can get rid of it with a Perl script if necessary.
cshowell is offline   Reply With Quote
Old 04-10-2012, 03:41 AM   #13
ulz_peter
Senior Member
 
Location: Graz, Austria

Join Date: Feb 2010
Posts: 219
Default

I was able to reproduce your error: I just put in a wrong prefix index and wrong sai files. It seems bwa uses sam bam_reader functions to read the native .sai format (that's why the error appears twice). So: Do the indexing again using the -p (for Prefix) option and use this prefix (exactly this prefix) for the aln and sampe commands.

And probably there is something wrong with the .sai files (that may be because of the prefix issue). Did you change bwa versions between alignment and sampe step?

Hope that helps,
Peter
ulz_peter is offline   Reply With Quote
Old 04-13-2012, 04:01 AM   #14
cshowell
Junior Member
 
Location: North Carolina

Join Date: Apr 2012
Posts: 9
Default

Ok. I've tried repeating the index and alignment steps and I think Peter is right about there being a problem with the .sai files. The alignment step reports that it runs successfully but, when I check the contents of the .sai files, they seem to contain only details about the lsf job queueing e.g. 'Job <533216> is submitted to queue <day>'. What could be causing this? I've been over everything I can think of, and repeated the steps multiple times (even trying earlier versions of BWA, but not mixing versions). Any more ideas after seeing the .sai output?
cshowell is offline   Reply With Quote
Old 04-13-2012, 04:35 AM   #15
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

use option -f
lh3 is offline   Reply With Quote
Old 04-13-2012, 05:51 AM   #16
cshowell
Junior Member
 
Location: North Carolina

Join Date: Apr 2012
Posts: 9
Default

Quote:
Originally Posted by lh3 View Post
use option -f
At what step? I don't see an option -f in the BWA manual. What is it?
cshowell is offline   Reply With Quote
Reply

Tags
bwa, error, error message, sampe

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:24 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO