SEQanswers

09-23-2011, 02:19 PM #1
kursuni
Member
Location: Hoboken, NJ
Join Date: May 2011
Posts: 15
BWA & FASTQ or FASTA

Hi, I'm very new to bioinformatics and I have two FASTQ files to align. I'm trying to align these sequences with BWA, but it doesn't let me do it.
It says:
[bwa_index] fail to open file '/datasets/SRR035022_1.filt.fastq'. Abort!
Aborted

Do I have to use FASTA format with BWA, or is something wrong elsewhere?

Thanks in advance.

09-23-2011, 04:38 PM #2
Bukowski
Senior Member
Location: UK
Join Date: Jan 2010
Posts: 390

The [bwa_index] in that error is the clue: bwa index is for the reference genome. You index the reference first, then align your FASTQ files to it.

The reference you index is a FASTA file.

Your reads are in FASTQ format.

You posted the error but not the command you ran, so it's hard to tell how you produced it. Failing to open the file could also mean you're simply not pointing bwa to the right location, but either way it looks like you may have skipped a step in your workflow.
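
Since the question is which file should be FASTA and which FASTQ, here is a minimal sketch (seq_format is a hypothetical helper, not part of BWA) that guesses the format from the first character: a FASTA file starts with a '>' header line, a FASTQ file with an '@' header line. It assumes gzip is installed so that gzip-compressed files are read transparently, and it is only a heuristic, not a validator.

```bash
#!/usr/bin/env bash
# seq_format FILE: print "fasta", "fastq" or "unknown".
# Hypothetical helper: it only inspects the first byte of the
# (decompressed, if gzipped) file.
seq_format() {
  local first
  # gzip -dcf decompresses .gz input and passes plain files through unchanged.
  first=$(gzip -dcf "$1" 2>/dev/null | head -c 1)
  case "$first" in
    '>') echo fasta ;;
    '@') echo fastq ;;
    *)   echo unknown ;;
  esac
}
```

For example, seq_format ~/fasta/chr1.fa should print fasta for the reference, while the read files should print fastq.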

09-24-2011, 02:15 AM #3
haojam
Member
Location: seoul, korea
Join Date: Jun 2010
Posts: 12

Dear SEQanswers group,

I have downloaded and compiled (make) the BWA aligner, and the bwa-0.5.9 folder now contains:

COPYING bamlite.c bwa bwase.o bwt_gen bwtaln.o bwtio.c bwtsw2_aux.o bwtsw2_main.o kseq.h main.c solid2fastq.pl utils.o
ChangeLog bamlite.h bwa.1 bwaseqio.c bwt_lite.c bwtgap.c bwtio.o bwtsw2_chain.c cs2nt.c ksort.h main.h stdaln.c
Makefile bamlite.o bwape.c bwaseqio.o bwt_lite.h bwtgap.h bwtmisc.c bwtsw2_chain.o cs2nt.o kstring.c main.o stdaln.h
NEWS bntseq.c bwape.o bwt.c bwt_lite.o bwtgap.o bwtmisc.o bwtsw2_core.c is.c kstring.h qualfa2fq.pl stdaln.o
README bntseq.h bwase.c bwt.h bwtaln.c bwtindex.c bwtsw2.h bwtsw2_core.o is.o kstring.o simple_dp.c utils.c
SRR038263_1.sai bntseq.o bwase.h bwt.o bwtaln.h bwtindex.o bwtsw2_aux.c bwtsw2_main.c khash.h kvec.h simple_dp.o utils.h

How should I start aligning SRR038263.fastq to the reference sequence (hg18, FASTA format)? I would be grateful for your support.

Regards,
Momo
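
For the single-end case, a minimal sketch of the order of operations in the bwa aln workflow of this BWA generation: index the reference once, run bwa aln to produce a .sai file, then convert that to SAM with bwa samse. The function name align_single_end and the BWA variable are hypothetical conveniences (BWA=echo turns it into a dry run). The sketch uses -a bwtsw because the default 'is' algorithm does not work for references larger than 2 GB, such as a whole human genome.

```bash
#!/usr/bin/env bash
# Hypothetical helper: single-end BWA-backtrack alignment in three steps.
# BWA defaults to "bwa" on the PATH; set BWA=./bwa, or BWA=echo for a dry run.
BWA="${BWA:-bwa}"

align_single_end() {
  local ref=$1 reads=$2 out=$3
  # Step 1: index the reference once (writes $ref.bwt, $ref.sa, ... next to it).
  if [ ! -e "$ref.bwt" ]; then
    "$BWA" index -a bwtsw "$ref" || return 1
  fi
  # Step 2: look up each read in the index; bwa writes a binary .sai file.
  "$BWA" aln "$ref" "$reads" > "$out.sai" || return 1
  # Step 3: convert the .sai hits into SAM alignments.
  "$BWA" samse "$ref" "$out.sai" "$reads" > "$out.sam"
}
```

Usage, with placeholder paths: BWA=./bwa align_single_end /home/DATA/hg18.fa SRR038263_1.fastq SRR038263_1. For a read pair such as SRR038263_1 and SRR038263_2, the usual pattern is to run bwa aln on each file and then combine them with bwa sampe ref.fa 1.sai 2.sai 1.fastq 2.fastq > out.sam instead of samse.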

09-24-2011, 10:05 AM #4
kursuni
Member
Location: Hoboken, NJ
Join Date: May 2011
Posts: 15

Thank you for your reply.

Since I'm just learning, I only now realized that I need a FASTA file as the reference to index and FASTQ files as my reads.

I think I'm pointing bwa to the right location; the problem is my lack of knowledge about DNA sequencing and about using these programs in a Linux environment.

However, is there a source I can download a FASTA file from?

09-24-2011, 09:30 PM #5
haojam
Member
Location: seoul, korea
Join Date: Jun 2010
Posts: 12

Hi,

When I run BWA to align the SRR FASTQ reads to the chr1 FASTA sequence, I get the error below. Could you please help me out?

Regards,

[bwa-0.5.9]$ ./bwa aln /home/DATA/chr1.fa /home/DATA/SRA/SRA012240/SRX017837/SRR038263_1.fastq > SRR038263_1.sai
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwt_restore_bwt] fail to open file '/home/DATA/chr1.fa.bwt'. Abort!

09-25-2011, 09:06 AM #6
nilshomer
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285

Index your reference first!
./bwa index -a bwtsw /home/DATA/chr1.fa
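
The .bwt file in the error above is one of several files bwa index writes under an index prefix. The prefix is the FASTA path itself unless bwa index was given -p NAME, in which case the files are NAME.bwt and so on, and NAME is what bwa aln needs. A minimal sketch, with the hypothetical name bwa_index_ready, that checks for the five files shared by 0.5.x and 0.7.x indexes (.amb, .ann, .bwt, .pac, .sa) before bwa aln is started; the extra reverse-strand files that 0.5.x also writes are not checked.

```bash
#!/usr/bin/env bash
# bwa_index_ready PREFIX: exit status 0 if the core BWA index files exist
# and are non-empty. PREFIX is what you pass to "bwa aln": the FASTA path,
# unless "bwa index -p NAME" was used. Hypothetical helper.
bwa_index_ready() {
  local prefix=$1 ext missing=0
  for ext in amb ann bwt pac sa; do
    if [ ! -s "$prefix.$ext" ]; then
      echo "missing or empty: $prefix.$ext" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: bwa_index_ready /home/DATA/chr1.fa || ./bwa index -a bwtsw /home/DATA/chr1.fa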

09-26-2011, 02:53 PM #7
kursuni
Member
Location: Hoboken, NJ
Join Date: May 2011
Posts: 15

I indexed my reference first:

$bwa index -p indexed_chr1 -a is -c ~/fasta/chr1.fa
[bwa_index] Pack nucleotide FASTA... 4.97 sec
[bwa_index] Convert nucleotide PAC to color PAC... 1.48 sec
[bwa_index] Reverse the packed sequence... 1.57 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 110.54 seconds elapse.
[bwa_index] Construct BWT for the reverse packed sequence...
[bwa_index] 110.08 seconds elapse.
[bwa_index] Update BWT... 1.03 sec
[bwa_index] Update reverse BWT... 1.06 sec
[bwa_index] Construct SA from BWT and Occ... 49.80 sec
[bwa_index] Construct SA from reverse BWT and Occ... 50.04 sec


Then I tried to align with the command below, and it gives an error while opening the FASTQ file:

$ bwa aln ~/fasta/chr1.fa ~/datasets/SRR035022_1.filt.fastq > aln_sa.sai
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_seq_open] fail to open file '/home/ukursuncu/datasets/SRR035022_1.filt.fastq'. Abort!
Aborted

What could I be doing wrong?

Last edited by kursuni; 09-26-2011 at 02:56 PM.

09-27-2011, 12:54 AM #8
sdvie
Member
Location: Spain
Join Date: Jul 2010
Posts: 68

kursuni, just a general remark: I would always give the complete path to your reference file, read file and output file, just to be sure, like:

./bwa aln /path/to/folder/with/fasta/chr1.fa /path/to/folder/with/datasets/SRR035022_1.filt.fastq > /path/to/output/aln_sa.sai

If you are not yet familiar with the command line, you can also look up the file names in your graphical file browser and copy and paste them into your command; that usually copies the whole path.

cheers,
Sophia
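
Along the same lines, a minimal sketch that turns a path into an absolute one and fails early if it is not a readable file. abs_readable is a hypothetical helper that uses only cd and pwd, so it does not depend on realpath being installed. Note that the shell expands ~ before bwa ever sees the path, which is why the error in post #7 mentions /home/ukursuncu/datasets/... while the error in post #1 mentions /datasets/..., a different directory at the filesystem root.

```bash
#!/usr/bin/env bash
# abs_readable PATH: print the absolute path of PATH, or fail if it is not
# an existing, readable regular file. Hypothetical helper.
abs_readable() {
  local path=$1 dir
  if [ ! -f "$path" ] || [ ! -r "$path" ]; then
    echo "not a readable file: $path" >&2
    return 1
  fi
  # cd runs inside the command substitution (a subshell), so the caller's
  # working directory is left unchanged.
  dir=$(cd "$(dirname "$path")" && pwd) || return 1
  printf '%s/%s\n' "$dir" "$(basename "$path")"
}
```

Usage, with placeholder paths: reads=$(abs_readable /path/to/reads.fastq) && ./bwa aln /path/to/ref.fa "$reads" > /path/to/output/out.sai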

09-28-2011, 09:20 PM #9
haojam
Member
Location: seoul, korea
Join Date: Jun 2010
Posts: 12

Hello,

I would like to index the reference so that I can align FASTQ sequences to it:

[haojamrocky@melon bwa-0.5.9]$ ./bwa index -a bwtsw -c /home/haojamrocky/DATA/hg18chr/hg18.fasta
[bwa_index] Pack nucleotide FASTA... [bns_fasta2bntseq] zero length sequence. Abort!

09-28-2011, 09:24 PM #10
haojam
Member
Location: seoul, korea
Join Date: Jun 2010
Posts: 12

Hello,

I would like to index the hg18 FASTA reference sequence so that I can align FASTQ sequences to it. The FASTQ sample sequences are SOLiD reads. Could you please assist me? Here is the error message I get when running BWA to index the reference hg18.fasta:

[haojamrocky@melon bwa-0.5.9]$ ./bwa index -a bwtsw -c /home/haojamrocky/DATA/hg18chr/hg18.fasta
[bwa_index] Pack nucleotide FASTA... [bns_fasta2bntseq] zero length sequence. Abort!

Regards,
HR

09-28-2011, 10:16 PM #11
haojam
Member
Location: seoul, korea
Join Date: Jun 2010
Posts: 12

Hello,

Before indexing hg18.fa, I put all chromosomes, chr1 to chrY including chrM, into hg18.fa using the code below. When I index the single chr1 FASTA file instead, it runs properly. Is this error caused by this code?

To concatenate all sequences into one single FASTA record:
$ cat chr*.fa | sed -e "/^>/d" >> hg18.fa

Regards,
HR
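
The sed -e "/^>/d" step deletes every '>' header line, so the merged file has no FASTA header at all, and >> appends to any hg18.fa left over from an earlier attempt instead of replacing it; either could plausibly explain why a single chromosome indexes fine while the merged file does not. A minimal sketch of a merge that keeps one header per chromosome (merge_fasta is a hypothetical name; the inputs are whatever chr*.fa matches):

```bash
#!/usr/bin/env bash
# merge_fasta OUT IN...: concatenate FASTA files into OUT, keeping every
# ">" header so each chromosome stays a separate, named record.
# Hypothetical helper.
merge_fasta() {
  local out=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: merge_fasta OUT IN..." >&2
    return 1
  fi
  # awk 1 copies every line unchanged, and also ends each input's last line
  # with a newline, so the next file's ">" header starts on its own line.
  # ">" (not ">>") replaces a leftover OUT from an earlier attempt.
  awk 1 "$@" > "$out" || return 1
  # A FASTA file must start with a ">" header line.
  [ "$(head -c 1 "$out")" = ">" ]
}
```

Usage: merge_fasta hg18.fa chr*.fa. The glob orders files alphabetically rather than numerically, which should not matter to bwa index.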

09-29-2011, 12:26 AM #12
sdvie
Member
Location: Spain
Join Date: Jul 2010
Posts: 68

Quote:
Originally Posted by haojam
$ cat chr*.fa | sed -e "/^>/d" >> hg18.fa

Did you check what hg18.fa looks like after this, and what size it is?
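
One way to check: a minimal awk sketch (fasta_summary is a hypothetical name) that prints each record's name and sequence length, and returns a non-zero status if the file has no header, has sequence before the first header, or has a record with no sequence, problems that may lead to the 'zero length sequence' abort. ls -lh hg18.fa gives the size.

```bash
#!/usr/bin/env bash
# fasta_summary FILE: print "name<TAB>length" per record; exit non-zero if
# there is no header, sequence before the first header, or an empty record.
fasta_summary() {
  awk '
    function flush() {
      print name "\t" len
      if (len == 0) empty++
    }
    /^[ \t\r]*$/ { next }              # skip blank lines
    /^>/ {
      if (seen) flush()
      seen = 1
      name = substr($1, 2)             # record name = first word after ">"
      len = 0
      next
    }
    {
      if (!seen) { orphan++; next }    # sequence before any header
      gsub(/[ \t\r]/, "")              # ignore stray spaces and DOS line endings
      len += length($0)
    }
    END {
      if (seen) { flush() } else { print "no header lines found" > "/dev/stderr" }
      if (orphan) print orphan " sequence line(s) before the first header" > "/dev/stderr"
      if (empty) print empty " record(s) with no sequence" > "/dev/stderr"
      exit (!seen || orphan || empty)
    }
  ' "$1"
}
```

For a correctly merged genome, fasta_summary hg18.fa should print one line per chromosome.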

09-30-2011, 02:43 AM #13
haojam
Member
Location: seoul, korea
Join Date: Jun 2010
Posts: 12

Hello,

Does BWA support SRR.....fastq.bz2 files for aligning against the human genome reference sequence?

Regards,
HR
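
BWA reads its input through zlib, which handles plain and gzip-compressed files but not bzip2, so a .bz2 file would need to be decompressed or recompressed first, e.g. bzcat reads.fastq.bz2 | gzip > reads.fastq.gz (placeholder file names). Because an extension does not always match the contents, here is a minimal sketch (compression_of is a hypothetical helper) that checks the leading magic bytes instead: gzip files start with the bytes 1f 8b, bzip2 files with the ASCII characters 'BZh'.

```bash
#!/usr/bin/env bash
# compression_of FILE: print "gzip", "bzip2" or "plain" based on the file's
# leading magic bytes, not its extension. Hypothetical helper.
compression_of() {
  local magic
  if [ ! -r "$1" ]; then
    echo "cannot read: $1" >&2
    return 1
  fi
  # First three bytes as lowercase hex, e.g. "1f8b08" for gzip.
  magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    1f8b*)  echo gzip ;;
    425a68) echo bzip2 ;;   # ASCII "BZh"
    *)      echo plain ;;
  esac
}
```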

09-30-2011, 02:57 AM #14
ulz_peter
Senior Member
Location: Graz, Austria
Join Date: Feb 2010
Posts: 219

Hi kursuni,

you may have a look here:
http://seqanswers.com/forums/showthread.php?t=14038

That might help you.

10-03-2011, 12:21 PM #15
kursuni
Member
Location: Hoboken, NJ
Join Date: May 2011
Posts: 15

Dear ulz_peter, it was very helpful. Thank you very much!
Since I'm very new to bioinformatics, I need a lot of help and am trying to learn from books, papers and internet resources such as this forum. Learning bioinformatics takes time, but since my thesis is on bioinformatics, I need to do this as fast as I can because of the time limit. Therefore, I really appreciate any help on this.
If you can suggest any other papers or documents, I would appreciate it as well.

Thanks again.
Best regards,
Ugur

12-22-2011, 11:46 AM #16
niti217
Member
Location: USA
Join Date: Dec 2011
Posts: 10

I am having a similar problem: I have spent the past 5 hours debugging this error, in vain.
I would really appreciate any help in this regard.

I am trying to index the Homo_sapiens.GRCh37 ...fa file using the command

bwa index -a bwtsw /directory/filename.fa

but it keeps giving me the following error:

[bwa_index] Pack FASTA... 56.76 sec
[bwa_index] Reverse the packed sequence... Segmentation fault

Can someone please suggest a possible fix? Thanks.

04-06-2013, 12:25 AM #17
slengyel
Junior Member
Location: Philadelphia, Pa
Join Date: Dec 2012
Posts: 7
BWA aln error: can't locate index

Greetings,

I'm trying to align my raw paired-end Illumina reads to my best ABySS contigs.fa.

The commands I used to index and align are as follows:

/home/stephen/Programs/BWA/bwa-0.7.3a/bwa index -p contigs.fa -a bwtsw /DATA/ANALYSIS/stephen/k62/contigs.fa

/home/stephen/Programs/BWA/bwa-0.7.3a/bwa aln /DATA/ANALYSIS/stephen/k62/contigs.fa /DATA/RAW_DATA/$1.read1.gz -t 4 >/DATA/ANALYSIS/stephen/$1.read1.sai

I repeat the second command for the read2.gz file.

The indexing appears to go smoothly, i.e. the expected output files are there. However, when I run the bwa aln command, the following occurs:

[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln] fail to locate the index
[main] Version: 0.7.3a-r367

I'm running the commands in the same directory as the index outputs. Why would bwa not be able to find the indices? There seems to be no parameter to tell bwa aln where the index files are located.

The end goal is to obtain the metrics from Picard's CollectInsertSizeMetrics after further Picard tools conversions, in order to verify the insert size and standard deviation values required in the input files for ALLPATHS-LG.

All help is appreciated, and thanks in advance.

04-06-2013, 03:44 AM #18
mastal
Senior Member
Location: uk
Join Date: Mar 2009
Posts: 667

Hi,

You don't say how long your Illumina reads are.

bwa aln (BWA-backtrack) is designed for reads up to about 100 bp, so this could be the problem.
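
To check this against your own data, a minimal sketch (read_length_range is a hypothetical name) that prints the shortest and longest read in a FASTQ file. It assumes the common four-lines-per-record layout without wrapped sequence lines, and reads gzip-compressed files such as the $1.read1.gz inputs above through gzip -dcf.

```bash
#!/usr/bin/env bash
# read_length_range FILE: print "min max" read length for a 4-line FASTQ
# file (plain or gzipped). Hypothetical helper.
read_length_range() {
  gzip -dcf "$1" | awk '
    NR % 4 == 2 {                  # line 2 of every record is the sequence
      sub(/\r$/, "")               # tolerate DOS line endings
      n = length($0)
      if (min == "" || n < min) min = n
      if (n > max) max = n
    }
    END {
      if (min == "") exit 1        # no reads found
      print min, max
    }
  '
}
```

If the second number printed is well above 100, bwa aln may not be the best choice for that data set.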

04-06-2013, 09:34 AM #19
slengyel
Junior Member
Location: Philadelphia, Pa
Join Date: Dec 2012
Posts: 7

This particular data set has a read length of 90 bp; two others I was going to try later have read lengths of 140 bp.

04-06-2013, 11:06 AM #20
mastal
Senior Member
Location: uk
Join Date: Mar 2009
Posts: 667

OK, so it looks like the read length shouldn't be the problem.

Is the bwa index in the same directory as the contigs.fa file? That may be why you get the 'fail to locate the index' error message.
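
bwa aln takes the index prefix as its first argument. With bwa index -p contigs.fa, the index files are written as contigs.fa.bwt and so on relative to the directory where bwa index was run, not next to /DATA/ANALYSIS/stephen/k62/contigs.fa, so passing the full FASTA path to bwa aln only works if that is where the index ended up. A minimal sketch (find_bwa_prefixes is a hypothetical helper) that lists the index prefixes found in a directory so the right one can be passed to bwa aln; it also strips a .64 infix that newer BWA versions can use in index file names.

```bash
#!/usr/bin/env bash
# find_bwa_prefixes [DIR]: list BWA index prefixes (paths without ".bwt")
# found directly in DIR, default the current directory. Hypothetical helper.
find_bwa_prefixes() {
  local dir=${1:-.} f prefix
  for f in "$dir"/*.bwt; do
    [ -e "$f" ] || continue        # the glob matched nothing
    prefix=${f%.bwt}
    prefix=${prefix%.64}           # newer BWA may name files PREFIX.64.bwt
    printf '%s\n' "$prefix"
  done
}
```

Run find_bwa_prefixes in the directory where bwa index was run; the printed prefix (for example ./contigs.fa) is what goes in place of the FASTA path in the bwa aln command.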

All times are GMT -8.