SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
[Question] TopHat aligns more than 1 times g781 RNA Sequencing 0 02-22-2013 12:53 PM
too small output .sai file created by bwa aln? ..leads to bwa sampe hanging? Stina Bioinformatics 12 12-05-2012 07:32 AM
BWA not creating rBWT file ramiro2k Bioinformatics 2 03-23-2012 02:27 AM
BWA - file formats robekubica Bioinformatics 1 08-27-2011 04:07 PM
question about bwa sam file peachgil Bioinformatics 1 04-06-2011 08:53 AM

Reply
 
Thread Tools
Old 03-06-2013, 10:20 AM   #1
stealthtoad
Junior Member
 
Location: Tucson

Join Date: Mar 2013
Posts: 6
Default bwa aligns one file but not another

I am having a problem in which bwa aligns one .bam, but not another despite the fact that they are simply forward and reverse reads from the same sequencing run. The two .bam files and their reference are attached (files are very small: ~1000 reads per .bam file, reference is ~1600 bases)

Background:
PCR amplicon of the V1 hypervariable region of as single purified bacteria.
Ion Torrent sequencing adaptors ligated, and sequenced on Ion Torrent PGM.
Because of adaptor ligation, reads are in forward and reverse directions.
By using forward and reverse primers as the "barcodes", the reads are seperated into 2 unaligned .bam files, one with forward (F.bam), and one with reverse reads (R.bam).
Reference fasta file for alignment is the 16s sequence of the same bacterial species from NCBI. We will call it 28.fasta.

Goal:
Generate a consensus sequence from each of the .bam files.

Approach:
Align .bam files with bwa, pass to samtools mpileup to make consensus sequence

Steps:
1) index reference:
bwa index -p 28 -a is 28.fasta

2) find SA coodinates of input files:
bwa aln 28 -b R.bam > R.sai
bwa aln 28 -b F.bam > F.sai

3) Generate alignments in SAM format:
bwa samse 28 R.sai R.bam > R.aligned.sam
bwa samse 28 F.sai F.bam > F.aligned.sam

4) Convert .sam to .bam for use in samtools:
samtools view -bS R.aligned.sam > R.aligned.bam
samtools view -bS F.aligned.sam > F.aligned.bam

5) Sort .bam files
samtools sort R.aligned.bam R.aligned.sorted
samtools sort F.aligned.bam F.aligned.sorted

6) get the consensus sequence
samtools mpileup -uf 28.fasta R.aligned.sorted.bam | bcftools view -cg - > R.vcf
samtools mpileup -uf 28.fasta F.aligned.sorted.bam | bcftools view -cg - > F.vcf

7) Convert vcf to fasta:
perl vcfutils.pl vcf2fasta R.vcf > R.fasta
perl vcfutils.pl vcf2fasta F.vcf > F.fasta

Viewing the R.fasta consensus file, I see a consensus read at the start of the reference which is where we expect to see a V1 consensus sequence (I'll leave my questions about the gaps indicated by lowercase for later):

$ more R.fasta
>gi|7407118|gb|AF227164.1|
nttgAACGCTAGCGGgatgctttacacatGCAAGTCGAACGGCAGCACAGAGGAACTTGT
TCCTTGGGTGGCGAGTGGCGCACGGGTnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnaccggtgcnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnccacggt
a

Viewing the F.fasta file, I see FAILURE!
>gi|7407118|gb|AF227164.1|
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnntggatgatg


Working backwards through the steps. If I take the aligned.sorted.bam files and index them and view them with samtools tview, I can see the alignment to the V1 region in R.aligned.sorted.bam, while there is no alignment whatsoever with F.aligned.sorted.bam. This points to the bwa samse step as the problem.

Can anyone answer why one .bam file is aligning fine in bwa while while the other fails?
Attached Files
File Type: tar 28.fasta.tar (10.0 KB, 1 views)
File Type: tar F.bam.tar (420.0 KB, 1 views)
File Type: tar R.bam.tar (300.0 KB, 1 views)
stealthtoad is offline   Reply With Quote
Old 03-06-2013, 01:39 PM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

Did you eyeball the second read? Did you eyeball its sam file?

With a total failure, there is some assumption you are making about your files that is simply wrong, and you have to diagnose that yourself, because you have the files.
swbarnes2 is offline   Reply With Quote
Old 03-06-2013, 02:50 PM   #3
stealthtoad
Junior Member
 
Location: Tucson

Join Date: Mar 2013
Posts: 6
Default

Thanks for the prompt reply swbarnes2.

>Did you eyeball the second read...sam file?

Yes, I have. What I did was actually align the same two .bam files with tmap. I then opened the 4 alignments (R.bam and F.bam each aligned in bwa and tmap) in IGV. What I see is this:

1) R.bam aligned with bwa: coverage in the reverse direction starting at 46 and dropping to 1 - indicating very few of the reads actually mapped. This is shown in the .sam file in the second column - most reads have a flag of 4.

2) F.bam aligned with bwa: no coverage except for a very short read to mapped to the wrong place

3) R.bam aligned with tmap: greater than 700x coverage in the reverse direction across the whole amplicon. This is the expected result.

4) F.bam aligned with tmap: greater than 1000x coverage in the forward direction across the whole amplicon. This is the expected result.

Here's a screen cap of the above IGV with teh two tmap alignments on top:



From these results, it appears that both .bam files are "good" and produce the expected coverage based on the number of reads in the files when used with tmap. So my question remains, what is going so horribly wrong in bwa? Even the .bam that works only maps a tiny fraction of the reads. According to the (rather terse) bwa manual entry for bwa samse:

"Generate alignments in the SAM format given single-end reads. Repetitive hits will be randomly chosen."

Could this last line mean that of all the full length reads, one is randomly chosen, and the rest are ignored? The great majority of the reads in my .bam file are high quality full length reads. This would explain why I see only one full length read in the bwa alignment of R.bam with all the other reads not being full length. It still doesn't explain why I get nothing for F.bam

Last edited by stealthtoad; 03-06-2013 at 02:54 PM. Reason: clarigy last sentence.
stealthtoad is offline   Reply With Quote
Old 03-06-2013, 03:37 PM   #4
stealthtoad
Junior Member
 
Location: Tucson

Join Date: Mar 2013
Posts: 6
Default

As a follow up, it has occurred to me that perhaps bwa needs some fine tuning using the options available. The only option for bwa samse is [-n MaxOcc] which the manual explains:

"Maximum number of alignments to output in the XA tag for reads paired properly. If a read has more than INT hits, the XA tag will not be written."

While this sounds like its only applicable to paired end reads, which mine are not. I tried it anyway with -n set to 500. The results were the same.

I suppose in the end the answer is "align Ion Torrent output with the Ion Torrent aligner (tmap) or don't be surprised when things don't go your way", nonetheless I would still like to understand bwa's behavior.

Last edited by stealthtoad; 03-06-2013 at 03:38 PM. Reason: I type badly.
stealthtoad is offline   Reply With Quote
Old 03-07-2013, 05:03 AM   #5
xied75
Senior Member
 
Location: Oxford

Join Date: Feb 2012
Posts: 129
Default

Quote:
"Generate alignments in the SAM format given single-end reads. Repetitive hits will be randomly chosen."

Could this last line mean that of all the full length reads, one is randomly chosen, and the rest are ignored? The great majority of the reads in my .bam file are high quality full length reads. This would explain why I see only one full length read in the bwa alignment of R.bam with all the other reads not being full length. It still doesn't explain why I get nothing for F.bam
No this means that when a reads can be mapped to many places and the mapping quality are same then one hit will be chosen randomly to output into the sam file.

If you want more hits you should adjust parameters in BWA ALN step, other than mess with BWA SAMSE, SAMSE merely reads from .sai file to make the .sam, not much going on here.

Best,

Dong
xied75 is offline   Reply With Quote
Old 03-07-2013, 07:47 AM   #6
stealthtoad
Junior Member
 
Location: Tucson

Join Date: Mar 2013
Posts: 6
Default

Dong,

Thanks for clarifying the samse manual entry.

Any idea why bwa is showing only a single full length alignment for R.bam and nothing for F.bam?

As I have demonstrated, it is not something wrong with the .bam files as swbarnes2 suggested: tmap does just fine with them.

I have also tested whether the proximity of the read alignment to the start of the reference fasta was an issue - no change in the results.

As you suggested, I have looked at the bwa aln options and cannot see any settings in that would affect the number of reads produced in the alignment.

I am going to try bwa mem....
stealthtoad is offline   Reply With Quote
Old 03-07-2013, 07:52 AM   #7
xied75
Senior Member
 
Location: Oxford

Join Date: Feb 2012
Posts: 129
Default

I tried your data, BWA ALN with -n 10 definitely changed the results. Also your reads seems in different lengths. For ALN there are so many options to play with, e.g. -l for seed length, -k for max diff in seed, etc.
xied75 is offline   Reply With Quote
Old 03-07-2013, 08:33 AM   #8
stealthtoad
Junior Member
 
Location: Tucson

Join Date: Mar 2013
Posts: 6
Default

xied75,

Thanks, I will try bwa aln -n 10 and check out the results.

Update on bwa mem. The version of bwa I had was 0.6.1 which does not have mem. So I downloaded and installed bwa 0.7.0. and tried to align the R.bam and F.bam with bwa mem which didn't work. The manual says bwa mem input is fastq, so I had to download bam2fastq, install that, and then convert the .bam files to fastq. bwa mem then dutifully aligned the fastq files. The results of bwa mem match what I got with tmap - success!

One issue with this approach (.bam > .fastq > bwa mem > aligned.sam) I am concerned that the quality scores may (are?) coded differently in a PGM .bam file than an Illumina file, so if bam2fastq assumes my .bam is Illumina, the quality scores in the resulting fastq will not be valid....
stealthtoad is offline   Reply With Quote
Old 03-07-2013, 11:03 AM   #9
stealthtoad
Junior Member
 
Location: Tucson

Join Date: Mar 2013
Posts: 6
Default

Ok, I've tried bwa aln with -n set to: 10, 50, and 500 in both bwa 0.6.1 and 0.7.0:

R.bam max coverage went from 46 without -n set, to 68 with n = 10, 72 when n = 50, and 73 when n=500, whereas alignment with bwa mem or tmap yielded max coverage of >700. There was no difference in results between bwa 0.6.1 and bwa 0.7.0.

F.bam max coverage went from 0 without -n set, to 0 with n = 10, 0 with n=50, and 0 with n=500, whereas alignment with bwa mem or tmap gave max coverage >1000.

So, I still don't know why bwa aln won't align the forward reads from F.bam, while both bwa mem and tmap will. In addition while R.bam reads align with bwa aln, I can't seem to get an optimized alignment fiddling with the -n option.

Why not just use tmap or bwa mem then? Well bwa mem requires a FASTQ input which leaves me worrying about Ion Torrent quality scores being converted correctly by bam2fastq, and tmap isn't installed on the server I am using, plus I'd like to understand why bwa isn't working on the F.bam reads so that I learn something...

I will try -l seed length and -k max dif in seed...

Last edited by stealthtoad; 03-07-2013 at 11:05 AM.
stealthtoad is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 06:27 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO