I just realized that I hadn't been specifying the Illumina quality offset when running bwa (v0.5.9-r16) alignments. BWA's -I command line option tells bwa aln that the input uses the Illumina 1.3+ quality encoding (ASCII offset 64) rather than the default Sanger encoding (ASCII offset 33). I start with fastqs from a HiSeq2k and perform the alignment with:
bwa aln -l 32 -t 4 $GENOME fooR1.fastq > fooR1.sai
bwa aln -l 32 -t 4 $GENOME fooR2.fastq > fooR2.sai
where I think I should have specified
bwa aln -I -l 32 -t 4 $GENOME fooR1.fastq > fooR1.sai
bwa aln -I -l 32 -t 4 $GENOME fooR2.fastq > fooR2.sai
then I do the bwa paired end alignment with bwa sampe and convert the sam to bam with samtools.
I have quite a few of these bam files with the incorrect Phred score offset and I'm not sure of the best way to proceed. A few questions:
What impact would an across-the-board Phred score inflation have on alignment? We're using the alignment primarily as a QC metric.
Would the Phred scores that exceeded the Sanger format range be truncated, and essentially lost, in the sam / bam files because the range is higher than expected?
Provided the bam file can accommodate the wider range of Phred scores and no data was lost, does a tool exist that would let me take an existing bam file and shift the Phred scores down across the board?
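For illustration, the shift I have in mind would look something like the sketch below, applied to a single quality string. The function name `shift_phred64_to_phred33` is hypothetical and this is not an existing tool; applying it to a real bam would additionally need a BAM-aware library to read and rewrite the records, which I have not shown. It assumes the characters really are Phred+64 and refuses anything that would decode to a negative score.

```python
def shift_phred64_to_phred33(qual_string):
    """Re-encode a Phred+64 quality string as Phred+33 (a shift of 31)."""
    shifted = []
    for ch in qual_string:
        score = ord(ch) - 64  # decode assuming the Illumina 1.3+ offset
        if score < 0:
            # A negative score means the input was not Phred+64 after all.
            raise ValueError(f"character {ch!r} is below the Phred+64 range")
        shifted.append(chr(score + 33))  # re-encode with the Sanger offset
    return "".join(shifted)


print(shift_phred64_to_phred33("hhgfB"))  # IIHG#
```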
The one curious thing I've seen is that converting the bam files back to fastq produces the correct Phred scores, where I'd have expected them to be shifted.
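One way I could sanity-check which encoding a fastq (original or bam-derived) appears to use is to look at the range of its quality characters. The sketch below is only a heuristic: the function name `guess_offset` is my own, and the thresholds assume Phred+33 data tops out around Q41 (character 'J') and that Illumina/Solexa offset-64 data never goes below character ';'.

```python
def guess_offset(qual_strings):
    """Guess the ASCII offset (33 or 64) from a non-empty set of quality strings.

    Returns None when the observed character range fits both encodings.
    """
    codes = [ord(ch) for q in qual_strings for ch in q]
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters this low cannot occur in offset-64 data.
        return 33
    if highest > 74:
        # Characters this high would mean Q > 41 under offset 33.
        return 64
    return None


print(guess_offset(["hhgfB"]))  # 64
print(guess_offset(["IIHG#"]))  # 33
```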
It seems like the right solution is ultimately to re-align everything with the proper offset, but given the number of files involved I'm looking for some way to mitigate the impact (or at least understand it better) in the meantime.
Thanks,
Matt