SEQanswers

06-11-2011, 08:08 AM   #1
offsetoff
Junior Member
 
Location: North America

Join Date: Jun 2011
Posts: 1
bwa aln fred score offset

I just realized that I hadn't been specifying the Illumina offset when running bwa (v.0.5.9-r16) alignments. BWA's -I command-line option specifies the Illumina ASCII offset of 64 instead of the default Sanger ASCII offset of 33. I start with FASTQ files from a HiSeq2k and perform the alignment with:

bwa aln -l 32 -t 4 $GENOME fooR1.fastq > fooR1.sai
bwa aln -l 32 -t 4 $GENOME fooR2.fastq > fooR2.sai

where I think I should have specified

bwa aln -I -l 32 -t 4 $GENOME fooR1.fastq > fooR1.sai
bwa aln -I -l 32 -t 4 $GENOME fooR2.fastq > fooR2.sai

Then I do the paired-end alignment with bwa sampe and convert the SAM to BAM with samtools.

I have quite a few of these BAM files with the incorrect fred score offset, and I'm not sure of the best way to proceed. A few questions:

What impact would an across-the-board fred score inflation have on alignment? We're using the alignment primarily as a QC metric.

Would the fred scores that exceeded the Sanger-format range be truncated and essentially lost in the SAM/BAM files because the range is higher than expected?

Provided the BAM file can accommodate the wider range of fred scores and no data was lost, does a tool exist that would let me take an existing BAM file and shift the fred scores down across the board?

The one curious thing I've seen is that converting the bam files back to fastq produces the correct fred scores, where I'd have expected them to be shifted.

It seems like the right solution is ultimately to re-align everything with the proper offset, but given the number of files involved I'm looking for some way to mitigate the impact (or at least understand it better) in the meantime.

Thanks,
Matt
06-14-2011, 02:14 PM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

It's PHRED, not FRED, which might help with Google searching.

And yes, if your FASTQ files are using the Illumina FASTQ encoding with ASCII offset 64, use BWA with the -I switch to indicate this. By default it assumes you have standard Sanger FASTQ files where the offset is 33.
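
To make the offset arithmetic concrete, here is a minimal Python sketch. The function names are made up for illustration, and this is not how BWA itself handles the conversion; it only shows what the two encodings mean for the same quality string. Each character's ASCII code minus the offset gives the PHRED score, so reading offset-64 data as offset-33 inflates every score by 31.

```python
# Illustrative helpers only (hypothetical names, not part of BWA or samtools).

def decode_phred(qual, offset):
    """Turn a FASTQ quality string into a list of PHRED scores."""
    return [ord(c) - offset for c in qual]

def encode_phred(scores, offset):
    """Turn a list of PHRED scores back into a FASTQ quality string."""
    return "".join(chr(q + offset) for q in scores)

def illumina_to_sanger(qual):
    """Re-encode an offset-64 quality string as offset-33."""
    return encode_phred(decode_phred(qual, 64), 33)

illumina_qual = "hhhhgfed"  # made-up example string, offset 64

print(decode_phred(illumina_qual, 64))    # [40, 40, 40, 40, 39, 38, 37, 36]
print(decode_phred(illumina_qual, 33))    # [71, 71, 71, 71, 70, 69, 68, 67]
print(illumina_to_sanger(illumina_qual))  # IIIIHGFE
```

The second line of output is what a tool sees when it assumes the Sanger offset on Illumina-encoded data: the same characters, every score 31 too high.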

See also http://bio-bwa.sourceforge.net/bwa.shtml
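
If you are unsure which encoding a set of FASTQ files uses, a rough check is to look at the lowest quality character present. The sketch below is a heuristic under stated assumptions, not a definitive test: `guess_offset` is a hypothetical helper, and the threshold of 59 assumes the lowest character any offset-64 variant can produce is `;` (the old Solexa scale). A file that happens to contain only high-quality bases cannot be classified reliably this way.

```python
# Heuristic sketch (hypothetical helper, not part of any tool's API).

def guess_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) from FASTQ quality strings.

    Characters below ASCII 59 cannot occur in offset-64 data, so seeing
    one implies offset 33. Otherwise offset 64 is assumed, which is only
    a guess when the sample is small or uniformly high quality.
    Raises ValueError if no quality characters are supplied.
    """
    lowest = min(ord(c) for qual in quality_strings for c in qual)
    return 33 if lowest < 59 else 64

print(guess_offset(["IIIIHGFE", "##!!"]))  # 33
print(guess_offset(["hhhhgfed", "BBBB"]))  # 64
```

In practice you would feed it the fourth line of each FASTQ record for a few thousand reads rather than the two toy strings used here.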
All times are GMT -8.

