**Rocketknight** · 04-05-2012, 07:55 AM

The indexing step should have generated the file human_g1k_v37.fasta.ann instead of human_g1k_v37.ann. Renaming it to human_g1k_v37.fasta.ann (and doing the same for any other index files that are mysteriously misnamed) should fix the problem.

**cshowell** · 04-05-2012, 08:40 AM

Thanks Rocketknight, but I still get the error:

[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bns_restore_core] fail to open file 'human_g1k_v37.fasta.nt.ann'. Abort!

The reads are from an Illumina GAIIx, so everything should be base space, and from what I've read, the '.nt' refers to colour space data.

My alignment command was 'bwa aln -n 3 -o 1 -e -1 -d 16 -l 5 -k 2 - l 14 -t1 -M 3 -O 11 -E 4 -q 0' if that's any help. I'm using bwa/0.6.1.

Any more suggestions would be greatly appreciated.

**Rocketknight** · 04-05-2012, 09:29 AM

Most of those command-line options are unnecessary: you're just setting parameters to their default values, which changes nothing. Those options are only there to be tinkered with if you really know what you're doing, so the only thing you really need to run bwa aln is:

$ bwa aln [genome.fasta] [reads.fq] > [output.sai]

Secondly, bwa seems to be freaking out attempting to read a BAM file. Are you supplying reads in FASTQ format [.fastq(.gz) or .fq(.gz)] or .BAM format?

**cshowell** · 04-05-2012, 11:14 AM

The reads going into BWA are in fastq format, converted from the Sequence Read Archive's .sra format using the SRA Toolkit. I was puzzled by the BAM error too, but I can't see anything in what I've done so far that would lead it to expect a BAM file.

**cshowell** · 04-06-2012, 04:13 AM

Bump for any more input on what might be stopping bwa sampe from running. Any ideas?

Why would BWA suddenly expect to see a BAM file when everything was supplied in fastq?

**Rocketknight** · 04-06-2012, 09:54 AM

Can you give me the first few lines of the FASTQ input file? (If it's gzipped, use gunzip -c [file] | head -n 8)

**cshowell** · 04-06-2012, 01:32 PM

Originally posted by Rocketknight:

Can you give me the first few lines of the FASTQ input file? (If it's gzipped, use gunzip -c [file] | head -n 8)

The first 8 lines are:

@ERR035484.1 CRIRUN_408:3:9:10903:5069 length=72
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTACCCCTAACCCTAACCGCACCC
+ERR035484.1 CRIRUN_408:3:9:10903:5069 length=72
;B;CD2@F??=BBCCFEFFFC?E4FF<FBFFG>AFDA9D>F::E@E9B2;>0?###################
@ERR035484.2 CRIRUN_408:3:40:4508:13433 length=72
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
+ERR035484.2 CRIRUN_408:3:40:4508:13433 length=72
BB@GHIIHFHIIIIDHIIIIEGIIIICIIIIIGIIIIIFIIIIIHIIIGFEEIIIBEGIEEGEBEAD@>?;A

Thanks for continuing to try to help, Rocketknight, it's appreciated.

**cshowell** · 04-06-2012, 01:44 PM

Also, the first few lines of the fasta reference file are:

>1 dna:chromosome chromosome:GRCh37:1:1:249250621:1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

(The N's give way to actual sequence soon after this.)

**Rocketknight** · 04-06-2012, 01:47 PM

That looks fine, I can't see BWA mistaking it for a BAM file or anything. BWA is a relatively straightforward program, so I've no idea what could be causing this. All I can think of at this point is a complete clean run with the absolute minimum of command line options, in case some upstream step went wrong somehow.

Remove all the indexes and .sai files you've created. The only files you should have left are your two FASTQ read files and the lone FASTA genome file (they can be gzipped, it makes no difference). Index the genome by cding to the directory it's in and just typing "bwa index [genome file]". Align the two FASTQ files with "bwa aln [genome file] [fastq file] > [output file]", then do "bwa sampe [genome file] [sai file 1] [sai file 2] [fastq file 1] [fastq file 2] > [output.sam]". Don't add any other command line options. Paste the output from each step into a text file and post it up here when you're done (there shouldn't be any sensitive information in it).

With any luck it should either work outright, or at least create an error that'll give me some idea of what's going wrong.
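For anyone following along, the three steps above can be sketched as a small shell function. This is only an illustration: the function name `clean_run`, the `BWA` override variable, and the way the .sai names are derived from the FASTQ names are assumptions of the sketch, not anything bwa requires.

```shell
#!/bin/sh
# Sketch of the minimal three-step run described above.
# BWA can be overridden (e.g. BWA=echo) so the function can be
# exercised on a machine without bwa installed.
BWA=${BWA:-bwa}

# usage: clean_run GENOME.fasta READS_1.fastq READS_2.fastq OUT.sam
clean_run() {
    ref=$1; fq1=$2; fq2=$3; sam=$4
    sai1=${fq1%.fastq}.sai   # output names are an assumption of this sketch
    sai2=${fq2%.fastq}.sai

    "$BWA" index "$ref" || return 1                # build the index next to the FASTA
    "$BWA" aln "$ref" "$fq1" > "$sai1" || return 1 # align each read file separately
    "$BWA" aln "$ref" "$fq2" > "$sai2" || return 1
    "$BWA" sampe "$ref" "$sai1" "$sai2" "$fq1" "$fq2" > "$sam"
}
```

It would be called as `clean_run human_g1k_v37.fasta ERR035484_1.fastq ERR035484_2.fastq ERR035484.sam`, with no other options, from the directory that holds the files.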

**swbarnes2** · 04-06-2012, 03:36 PM

Is that a space in the chromosome name of the fasta? That might be causing problems.

**cshowell** · 04-10-2012, 03:16 AM

I'm still getting the same problem after re-running the index and align steps. The job failed (exit code 134) with the following output:

[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bns_restore_core] fail to open file 'human_g1k_v37.fasta.nt.ann'. Abort!
/netscr/lsf/killdevil/lsbatch/1334055583.424139: line 8: 15693 Aborted (core dumped) bwa sampe human_g1k_v37.fasta ERR035484_1.sai ERR035484_2.sai ERR035484_1.fastq ERR035484_2.fastq

The index files were produced with the proper filenames/extensions (e.g. human_g1k_v37.fasta.ann). It still seems to think it's looking for a BAM file.

Any thoughts? I'm thinking of maybe trying a different reference sequence, but I don't really have a reason to suspect a problem with the one from 1000 Genomes.

swbarnes2 - I only see two spaces in the description line. Which one do you think might cause a problem? I can get rid of it with a Perl script if necessary.

**ulz_peter** · 04-10-2012, 03:41 AM

I was able to reproduce your error: I just put in a wrong prefix index and wrong sai files. It seems bwa uses sam bam_reader functions to read the native .sai format (that's why the error appears twice). So: Do the indexing again using the -p (for Prefix) option and use this prefix (exactly this prefix) for the aln and sampe commands.

And probably there is something wrong with the .sai files (that may be because of the prefix issue). Did you change bwa versions between alignment and sampe step?

Hope that helps,
Peter
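A quick way to check the prefix advice above is to confirm that every index file exists under exactly the prefix passed to aln and sampe. The helper below is a sketch: the name `missing_index_files` is made up, and the list of five suffixes is an assumption about what bwa 0.6.x writes (older 0.5.x releases wrote additional files).

```shell
#!/bin/sh
# Print the name of every index file that is missing for a given prefix.
# The suffix list is an assumption for bwa 0.6.x; adjust it for other versions.
missing_index_files() {   # usage: missing_index_files PREFIX
    for ext in amb ann bwt pac sa; do
        [ -f "$1.$ext" ] || echo "$1.$ext"
    done
}
```

For example, after `bwa index -p human_g1k_v37 human_g1k_v37.fasta`, running `missing_index_files human_g1k_v37` should print nothing, and the same `human_g1k_v37` prefix would then be the first argument to `bwa aln` and `bwa sampe`.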

**cshowell** · 04-13-2012, 04:01 AM

Ok. I've tried repeating the index and alignment steps and I think Peter is right about there being a problem with the .sai files. The alignment step reports that it ran successfully but, when I check the contents of the .sai files, they seem to contain only details about the LSF job queueing, e.g. 'Job <533216> is submitted to queue <day>'. What could be causing this? I've been over everything I can think of, and repeated the steps multiple times (even trying earlier versions of BWA, but not mixing versions). Any more ideas after seeing the .sai output?
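The symptom described here can be tested for directly. The sketch below flags a .sai file that contains the scheduler's submission notice; the function name `sai_has_job_notice` is hypothetical, and the pattern is taken from the message quoted in this post, so it may need adjusting for other schedulers.

```shell
#!/bin/sh
# Succeed (exit status 0) if the file contains the LSF submission notice,
# i.e. it holds scheduler text rather than bwa's binary output.
sai_has_job_notice() {   # usage: sai_has_job_notice FILE.sai
    grep -q 'is submitted to queue' "$1"
}
```

A loop such as `for f in *.sai; do sai_has_job_notice "$f" && echo "$f looks wrong"; done` would list the affected files.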

**lh3** · 04-13-2012, 04:35 AM

Use option -f to name the output file, rather than redirecting stdout.
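For later readers, a hedged sketch of what this suggestion presumably amounts to: `bwa aln` and `bwa sampe` both accept `-f FILE` to write results to a named file instead of stdout. With a `>` redirect on a submitted job, the file can end up holding the scheduler's own stdout, which would explain the 'Job <...> is submitted to queue' text in the .sai files. The helper names and output file names below are illustrative, and the commands are only printed, not run.

```shell
#!/bin/sh
# Print bwa command lines that name their output with -f instead of
# relying on shell redirection. File names are placeholders.
ref=human_g1k_v37.fasta

aln_cmd() {     # usage: aln_cmd READS.fastq OUT.sai
    printf 'bwa aln -f %s %s %s\n' "$2" "$ref" "$1"
}

sampe_cmd() {   # usage: sampe_cmd OUT.sam SAI1 SAI2 FQ1 FQ2
    printf 'bwa sampe -f %s %s %s %s %s %s\n' "$1" "$ref" "$2" "$3" "$4" "$5"
}

aln_cmd ERR035484_1.fastq ERR035484_1.sai
aln_cmd ERR035484_2.fastq ERR035484_2.sai
sampe_cmd ERR035484.sam ERR035484_1.sai ERR035484_2.sai ERR035484_1.fastq ERR035484_2.fastq
```

Each printed line can then be handed to the scheduler as the job command, with no `>` left for the submitting shell to interpret.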

Problem running BWA sampe