SEQanswers

Old 04-21-2011, 06:19 AM   #1
avinash
Member
 
Location: New York

Join Date: Oct 2009
Posts: 10
Default bwa sampe bam input

I have paired-end data in two separate fastq files. I started by converting and merging the two fastq files into a single unaligned bam with Picard FastqToSam (using the FASTQ and FASTQ2 options). I ran bwa aln on this bam file and successfully generated the sai file.
Now, for the sampe step I am supposed to provide the sai file and the original sequence file. Since bwa aln is able to accept bam files as input there should be a way to provide the same bam file even for the sampe step.
Any ideas?
Old 10-25-2011, 06:23 AM   #2
lletourn
Member
 
Location: Montreal

Join Date: Oct 2009
Posts: 63
Default

according to the bwa manual page:
http://bio-bwa.sourceforge.net/bwa.shtml

-b Specify the input read sequence file is the BAM format. For paired-end data, two ends in a pair must be grouped together and options -1 or -2 are usually applied to specify which end should be mapped. Typical command lines for mapping pair-end data in the BAM format are:

bwa aln ref.fa -b1 reads.bam > 1.sai
bwa aln ref.fa -b2 reads.bam > 2.sai
bwa sampe ref.fa 1.sai 2.sai reads.bam reads.bam > aln.sam
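
The manual's requirement that the two ends of a pair be grouped together can be checked before running bwa aln. Below is a minimal sketch, not part of the manual: the helper name mates_grouped is hypothetical, and it assumes samtools is available to dump the BAM records as text. It reads SAM records on stdin and fails if consecutive records do not share a read name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if SAM records on stdin come in
# consecutive pairs that share the same read name (column 1).
mates_grouped() {
    awk -F '\t' '
        NR % 2 == 1 { name = $1; next }   # remember the name of the first mate
        $1 != name  {                      # second record must carry the same name
            printf "record %d: %s does not follow its mate %s\n", NR, $1, name
            bad = 1
            exit
        }
        END {
            if (NR % 2 == 1) { print "odd number of records"; bad = 1 }
            exit bad + 0
        }
    '
}

# Tiny demo on made-up records (name and FLAG only).
printf 'r1\t77\nr1\t141\nr2\t77\nr2\t141\n' | mates_grouped && echo "mates look grouped"
```

On a real file this would be used as something like: samtools view reads.bam | mates_grouped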
Old 10-25-2011, 10:16 AM   #3
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

Merging the fastqs wasn't right for use with bwa, and converting them to .bams is unnecessary. You need to align each one separately to the reference fasta.

bwa aln ref.fa fastq1.fq > 1.sai
bwa aln ref.fa fastq2.fq > 2.sai
bwa sampe ref.fa 1.sai 2.sai fastq1.fq fastq2.fq > aln.sam

Here's the tricky part. Yes, you do want .bams in the end. But .bam is a very flexible format. A .bam ends up containing all the original information of the fastq, plus the information about where each read aligned. When you converted the fastq to bam, you just made a bam file that contained no information about where the reads mapped, because you never mapped them.

Run the bwa aln program to align the reads to the reference. Those reads can be in bam format or fastq format. But you definitely need read1 and read2 in separate files. Then sampe will take them both, know that they are in pairs, work out the pair insert sizes, and write the alignment output (aln.sam above, which you can then convert to a .bam) with all the alignment information.
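
Since sampe, as far as I know, pairs the two inputs by position, it can be worth confirming that both fastq files list the same reads in the same order before aligning. This is only a sketch: the function names fastq_names and fastq_pairs_match are made up for illustration, and it assumes plain four-line fastq records (no wrapped sequence lines) and read names that differ at most by a trailing /1 or /2.

```shell
#!/bin/sh
# Hypothetical helper: print the read name of every fastq record,
# without the leading "@", any comment after whitespace, or a trailing /1 or /2.
fastq_names() {
    awk 'NR % 4 == 1 { name = substr($1, 2); sub(/\/[12]$/, "", name); print name }' "$1"
}

# Hypothetical helper: succeed only if both files list the same read
# names in the same order.
fastq_pairs_match() {
    a=$(mktemp) && b=$(mktemp) || return 2
    fastq_names "$1" > "$a"
    fastq_names "$2" > "$b"
    cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Usage (file names taken from the commands above):
#   fastq_pairs_match fastq1.fq fastq2.fq && echo "names line up"
```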
Old 10-25-2011, 10:29 AM   #4
avinash
Member
 
Location: New York

Join Date: Oct 2009
Posts: 10
Default

Thank you very much lletourn and swbarnes2 for your answers.
When I asked this question half a year ago, I was trying out all the possibilities with bwa. But I soon realized, as swbarnes2 suggested, that using an unaligned bam provides no real advantage as far as the alignment is concerned, so I dropped the idea.
However, I want to thank you both again for the inputs.
Old 10-25-2011, 11:06 AM   #6
lletourn
Member
 
Location: Montreal

Join Date: Oct 2009
Posts: 63
Default

I read the first post too fast. At our center we use unaligned bams as inputs to bwa using the technique I mentioned above. This works with paired and single datasets.

What I propose is fine for this, but to build paired bams you can just append fastqs as swbarnes2 mentions.

sorry for the confusion.
Old 07-09-2012, 10:32 AM   #7
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

Related to the topic of this thread:

If we are re-aligning paired data from a previously aligned bam file (because I had a bam file returned to me from a vendor rather than fastq files), what is the syntax for the sampe step? Do we just list the BAM file twice? Example:

bwa sampe -r "@RG\tID:bwa\tLB:bwa\tSM:bwa\tPL:ILLUMINA" human_g1k_v37_decoy bwa.1.sai bwa.2.sai in.bam in.bam > out.bwa.sam

I started a run that way just now and it appears to be working but I wanted to make sure this is the way to do it if anyone can confirm. Thanks!
Old 09-18-2012, 12:15 AM   #8
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 69
Default

One question: if I have read pairs, does it matter for sampe if I switch the order of the reads in the input?

like:
bwa sampe ref.fa 2.sai 1.sai read2.fq read1.fq > aln.sam

instead of:
bwa sampe ref.fa 1.sai 2.sai read1.fq read2.fq > aln.sam

Or does it not matter for the sampe step which read was sequenced first? I think the direction of the sequenced read should matter, or not?
seq:___-----------------------------------
r1:____------------>__is___<---------- r2

Maybe I'll just try it and check whether the output changes. ;-)
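
One way to run that comparison is to look at the FLAG column of the two outputs: per the SAM specification, bit 0x40 marks the first read of a pair and bit 0x80 the second. The sketch below uses a made-up helper name, sam_mates, and only labels each alignment line; it does not say what bwa will do.

```shell
#!/bin/sh
# Hypothetical helper: read SAM on stdin and print "name<TAB>read1|read2|other"
# for every alignment line, based on FLAG bits 0x40 (64) and 0x80 (128).
sam_mates() {
    awk -F '\t' '
        /^@/ { next }                                   # skip header lines
        {
            flag = $2 + 0
            if (int(flag / 64) % 2)       { mate = "read1" }   # bit 0x40 set
            else if (int(flag / 128) % 2) { mate = "read2" }   # bit 0x80 set
            else                          { mate = "other" }   # neither bit set
            print $1 "\t" mate
        }
    '
}

# Tiny demo on made-up records (name, FLAG, reference only).
printf 'r1\t99\tchr1\nr1\t147\tchr1\n' | sam_mates
```

Running it on the output of both command orders (for example sam_mates < aln.sam | sort for each run, then diff on the two listings) would show whether the read1/read2 labels swap. My expectation is that they follow the order of the inputs, but that is something to verify rather than a confirmed answer.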

Last edited by Thorondor; 09-18-2012 at 01:20 AM.
Old 09-18-2012, 09:21 AM   #9
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

Quote:
Originally Posted by Michael.James.Clark
Related to the topic of this thread:

If we are re-aligning paired data from a previously aligned bam file (because I had a bam file returned to me from a vendor rather than fastq files), what is the syntax for the sampe step? Do we just list the BAM file twice? Example:

bwa sampe -r "@RG\tID:bwa\tLB:bwa\tSM:bwa\tPL:ILLUMINA" human_g1k_v37_decoy bwa.1.sai bwa.2.sai in.bam in.bam > out.bwa.sam

I started a run that way just now and it appears to be working but I wanted to make sure this is the way to do it if anyone can confirm. Thanks!
If it's really working, then fine, but I would be skeptical that it's really treating that one file listed twice as if it contains both reads of each pair. I'd check on that before concluding that it works. It's probably assuming that the Nth read in the first file is paired with the Nth read in the second file, as is usually the case.

What might work better is to split the .bam into a read 1 .bam and a read 2 .bam, and use those.
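
A rough sketch of such a split, under some loud assumptions: samtools 1.x syntax, a hypothetical helper name (split_bam_plan), and made-up output file names. It only prints the commands so they can be inspected first. The name sort is there so that both output files end up in the same read order, and -F 0x900 drops secondary and supplementary records so each read appears once.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) samtools commands that name-sort a BAM
# and split it into a read 1 BAM and a read 2 BAM.
# Arguments: input BAM, output prefix. Paths containing spaces are not handled.
split_bam_plan() {
    bam=$1
    out=$2
    # Name-sort so that both output files list the reads in the same order.
    printf 'samtools sort -n -o %s.byname.bam %s\n' "$out" "$bam"
    # FLAG 0x40 = first read of the pair, 0x80 = second read of the pair.
    printf 'samtools view -b -f 0x40 -F 0x900 -o %s.read1.bam %s.byname.bam\n' "$out" "$out"
    printf 'samtools view -b -f 0x80 -F 0x900 -o %s.read2.bam %s.byname.bam\n' "$out" "$out"
}

split_bam_plan in.bam split
```

Piping the printed lines to sh would run them. The two resulting files would then stand in for in.bam in the aln and sampe commands; which of the -b options fits that case is best checked against the bwa manual.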
Tags
bam, bwa, fastq, sampe