I am having a problem in which bwa aligns one .bam, but not another despite the fact that they are simply forward and reverse reads from the same sequencing run. The two .bam files and their reference are attached (files are very small: ~1000 reads per .bam file, reference is ~1600 bases)
Background:
PCR amplicon of the V1 hypervariable region of as single purified bacteria.
Ion Torrent sequencing adaptors ligated, and sequenced on Ion Torrent PGM.
Because of adaptor ligation, reads are in forward and reverse directions.
By using forward and reverse primers as the "barcodes", the reads are seperated into 2 unaligned .bam files, one with forward (F.bam), and one with reverse reads (R.bam).
Reference fasta file for alignment is the 16s sequence of the same bacterial species from NCBI. We will call it 28.fasta.
Goal:
Generate a consensus sequence from each of the .bam files.
Approach:
Align .bam files with bwa, pass to samtools mpileup to make consensus sequence
Steps:
1) index reference:
bwa index -p 28 -a is 28.fasta
2) find SA coodinates of input files:
bwa aln 28 -b R.bam > R.sai
bwa aln 28 -b F.bam > F.sai
3) Generate alignments in SAM format:
bwa samse 28 R.sai R.bam > R.aligned.sam
bwa samse 28 F.sai F.bam > F.aligned.sam
4) Convert .sam to .bam for use in samtools:
samtools view -bS R.aligned.sam > R.aligned.bam
samtools view -bS F.aligned.sam > F.aligned.bam
5) Sort .bam files
samtools sort R.aligned.bam R.aligned.sorted
samtools sort F.aligned.bam F.aligned.sorted
6) get the consensus sequence
samtools mpileup -uf 28.fasta R.aligned.sorted.bam | bcftools view -cg - > R.vcf
samtools mpileup -uf 28.fasta F.aligned.sorted.bam | bcftools view -cg - > F.vcf
7) Convert vcf to fasta:
perl vcfutils.pl vcf2fasta R.vcf > R.fasta
perl vcfutils.pl vcf2fasta F.vcf > F.fasta
Viewing the R.fasta consensus file, I see a consensus read at the start of the reference which is where we expect to see a V1 consensus sequence (I'll leave my questions about the gaps indicated by lowercase for later):
$ more R.fasta
>gi|7407118|gb|AF227164.1|
nttgAACGCTAGCGGgatgctttacacatGCAAGTCGAACGGCAGCACAGAGGAACTTGT
TCCTTGGGTGGCGAGTGGCGCACGGGTnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnaccggtgcnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnccacggt
a
Viewing the F.fasta file, I see FAILURE!
>gi|7407118|gb|AF227164.1|
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnntggatgatg
Working backwards through the steps. If I take the aligned.sorted.bam files and index them and view them with samtools tview, I can see the alignment to the V1 region in R.aligned.sorted.bam, while there is no alignment whatsoever with F.aligned.sorted.bam. This points to the bwa samse step as the problem.
Can anyone answer why one .bam file is aligning fine in bwa while while the other fails?
Background:
PCR amplicon of the V1 hypervariable region of as single purified bacteria.
Ion Torrent sequencing adaptors ligated, and sequenced on Ion Torrent PGM.
Because of adaptor ligation, reads are in forward and reverse directions.
By using forward and reverse primers as the "barcodes", the reads are seperated into 2 unaligned .bam files, one with forward (F.bam), and one with reverse reads (R.bam).
Reference fasta file for alignment is the 16s sequence of the same bacterial species from NCBI. We will call it 28.fasta.
Goal:
Generate a consensus sequence from each of the .bam files.
Approach:
Align .bam files with bwa, pass to samtools mpileup to make consensus sequence
Steps:
1) index reference:
bwa index -p 28 -a is 28.fasta
2) find SA coodinates of input files:
bwa aln 28 -b R.bam > R.sai
bwa aln 28 -b F.bam > F.sai
3) Generate alignments in SAM format:
bwa samse 28 R.sai R.bam > R.aligned.sam
bwa samse 28 F.sai F.bam > F.aligned.sam
4) Convert .sam to .bam for use in samtools:
samtools view -bS R.aligned.sam > R.aligned.bam
samtools view -bS F.aligned.sam > F.aligned.bam
5) Sort .bam files
samtools sort R.aligned.bam R.aligned.sorted
samtools sort F.aligned.bam F.aligned.sorted
6) get the consensus sequence
samtools mpileup -uf 28.fasta R.aligned.sorted.bam | bcftools view -cg - > R.vcf
samtools mpileup -uf 28.fasta F.aligned.sorted.bam | bcftools view -cg - > F.vcf
7) Convert vcf to fasta:
perl vcfutils.pl vcf2fasta R.vcf > R.fasta
perl vcfutils.pl vcf2fasta F.vcf > F.fasta
Viewing the R.fasta consensus file, I see a consensus read at the start of the reference which is where we expect to see a V1 consensus sequence (I'll leave my questions about the gaps indicated by lowercase for later):
$ more R.fasta
>gi|7407118|gb|AF227164.1|
nttgAACGCTAGCGGgatgctttacacatGCAAGTCGAACGGCAGCACAGAGGAACTTGT
TCCTTGGGTGGCGAGTGGCGCACGGGTnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnaccggtgcnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnccacggt
a
Viewing the F.fasta file, I see FAILURE!
>gi|7407118|gb|AF227164.1|
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnntggatgatg
Working backwards through the steps. If I take the aligned.sorted.bam files and index them and view them with samtools tview, I can see the alignment to the V1 region in R.aligned.sorted.bam, while there is no alignment whatsoever with F.aligned.sorted.bam. This points to the bwa samse step as the problem.
Can anyone answer why one .bam file is aligning fine in bwa while while the other fails?
Comment