bwa aln fred score offset

offsetoff

Junior Member

Join Date: Jun 2011

Posts: 1
#1


06-11-2011, 08:08 AM

I just realized that I hadn't been specifying the Illumina offset when running bwa (v0.5.9-r16) alignments. BWA's -I command-line option specifies the Illumina ASCII offset of 64 rather than the default Sanger ASCII offset of 33. I start with FASTQ files from a HiSeq 2000 and perform the alignment with:

bwa aln -l 32 -t 4 $GENOME fooR1.fastq > fooR1.sai
bwa aln -l 32 -t 4 $GENOME fooR2.fastq > fooR2.sai

where I think I should have specified:

bwa aln -I -l 32 -t 4 $GENOME fooR1.fastq > fooR1.sai
bwa aln -I -l 32 -t 4 $GENOME fooR2.fastq > fooR2.sai

Then I do the paired-end alignment with bwa sampe and convert the SAM to BAM with samtools.

I have quite a few of these BAM files with the incorrect fred score offset and I'm not sure of the best way to proceed. A few questions:

What impact would an across-the-board fred score inflation have on alignment? We're using the alignment primarily as a QC metric.

Would the fred scores that exceeded the Sanger-format fred score range be truncated and essentially lost in the SAM/BAM files because the range is higher than expected?

Provided the BAM file can accommodate the wider range of fred scores and no data was lost, does a tool exist that would let me take an existing BAM file and shift the fred scores down across the board?

The one curious thing I've seen is that converting the BAM files back to FASTQ produces the correct fred scores, where I'd have expected them to be shifted.
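
One plausible explanation for that round trip, assuming the BAM stores each quality as the character code minus 33 and the BAM-to-FASTQ step adds 33 back: the original characters come out unchanged, so they still look correct when read with the Illumina offset of 64, while the scores held inside the BAM are 31 too high. A minimal Python sketch of the arithmetic, using a made-up quality string (the helper names are only for illustration):

```python
SANGER_OFFSET = 33    # default assumed by bwa aln without -I
ILLUMINA_OFFSET = 64  # Illumina 1.3+ encoding, selected with -I


def decode(qual, offset):
    """Turn a FASTQ quality string into a list of integer scores."""
    return [ord(c) - offset for c in qual]


def encode(scores, offset):
    """Turn integer scores back into a FASTQ quality string."""
    return "".join(chr(s + offset) for s in scores)


# Hypothetical quality string as written by an Illumina 1.3+ pipeline.
original = "hhhgfB"

# Scores as they would be stored if the string is read with offset 33.
stored = decode(original, SANGER_OFFSET)

# Scores the sequencer actually meant (offset 64).
intended = decode(original, ILLUMINA_OFFSET)

# Writing the stored scores back out with offset 33 restores the input.
roundtrip = encode(stored, SANGER_OFFSET)

print(stored)    # inflated scores
print(intended)  # true scores
print(roundtrip == original)
```

In this sketch every stored score is exactly 31 above the intended one, and nothing is lost as long as the stored values fit in the file format.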

It seems like ultimately the right solution is to re-align everything with the proper offset, but given the number of files involved I'm looking for some way to mitigate the impact (or at least understand it better) in the meantime.

Thanks,
Matt
maubp

Peter (Biopython etc)

Join Date: Jul 2009

Posts: 1543
#2

06-14-2011, 02:14 PM

It's PHRED, not FRED, which might help with Google searching.

And yes, if your FASTQ files are using the Illumina FASTQ encoding with ASCII offset 64, use BWA with the -I switch to indicate this. By default it assumes you have standard Sanger FASTQ files where the offset is 33.

See also http://bio-bwa.sourceforge.net/bwa.shtml
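
If you are unsure which encoding a FASTQ file uses, a rough check on the quality characters can help. The following is only a sketch in Python: the function names and the thresholds are my own assumptions (a heuristic, not anything BWA does), and the example quality string is made up. Characters below ASCII 59 cannot occur with offset 64, while a file whose characters all sit at 64 or above and reach past 'J' (ASCII 74) is probably offset 64.

```python
def guess_offset(quality_strings):
    """Heuristically guess the ASCII offset (33 or 64) of FASTQ qualities.

    Returns None when the characters seen do not settle the question.
    """
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        return None
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Too low for any offset-64 encoding, so assume Sanger.
        return 33
    if lowest >= 64 and highest > 74:
        # Nothing below '@' and values above 'J': likely Illumina 1.3+.
        return 64
    return None


def illumina_to_sanger(qual):
    """Re-encode one quality string from offset 64 to offset 33."""
    converted = []
    for c in qual:
        score = ord(c) - 64
        if score < 0:
            raise ValueError("character %r is not valid for offset 64" % c)
        converted.append(chr(score + 33))
    return "".join(converted)


example = "hhhgfB"  # hypothetical Illumina 1.3+ quality string
print(guess_offset([example]))
print(illumina_to_sanger(example))
```

Run this over a good sample of reads rather than one or two, since a short stretch of uniformly high-quality Sanger-encoded reads could leave the guess undecided.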