
  • bwa aln fred score offset

    I just realized that I hadn't been specifying the Illumina offset when running bwa (v0.5.9-r16) alignments. BWA's -I command-line option tells it that the quality scores use the Illumina ASCII offset of 64 rather than the default Sanger offset of 33. I start with FASTQ files from a HiSeq 2000 and perform the alignment with:

    bwa aln -l 32 -t 4 $GENOME fooR1.fastq > fooR1.sai
    bwa aln -l 32 -t 4 $GENOME fooR2.fastq > fooR2.sai

    where I think I should have specified

    bwa aln -I -l 32 -t 4 $GENOME fooR1.fastq > fooR1.sai
    bwa aln -I -l 32 -t 4 $GENOME fooR2.fastq > fooR2.sai

    Then I do the paired-end alignment with bwa sampe and convert the SAM to BAM with samtools.

    I have quite a few of these BAM files with the incorrect fred score offset and I'm not sure of the best way to proceed. A few questions:

    What impact would an across-the-board fred score inflation have on alignment? We're using the alignment primarily as a QC metric.

    Would the fred scores that exceed the Sanger format's score range be truncated and essentially lost in the SAM / BAM files because the range is higher than expected?

    Provided the bam file can accommodate the wider range of fred scores and no data was lost, does a tool exist that would let me take an existing bam file and shift the fred scores down across the board?

    The one curious thing I've seen is that converting the bam files back to fastq produces the correct fred scores, where I'd have expected them to be shifted.

    It seems like the right solution is ultimately to re-align everything with the proper offset, but given the number of files involved I'm looking for some way to mitigate the impact (or at least understand it better) in the meantime.

    Thanks,
    Matt

  • #2
    It's PHRED, not FRED, which might help with Google searching.

    And yes, if your FASTQ files are using the Illumina FASTQ encoding with ASCII offset 64, use BWA with the -I switch to indicate this. By default it assumes you have standard Sanger FASTQ files where the offset is 33.
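
    To make the two offsets concrete, here is a minimal Python sketch of how the same quality string decodes under each one, plus a rough heuristic for guessing which offset a file uses. The function names `decode_quals` and `guess_offset` are made up for illustration and are not part of BWA or samtools. The heuristic can misclassify uniformly high-quality Sanger data, so treat it as a sanity check only.

```python
def decode_quals(qual, offset):
    """Convert a FASTQ quality string to integer PHRED scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in qual]


def guess_offset(qual_strings):
    """Heuristic guess at the ASCII offset (an assumption-laden check, not a guarantee).

    Offset-64 encodings never go below ';' (ASCII 59, the old Solexa minimum
    of -5), so any lower character implies offset 33.
    """
    lowest = min(ord(ch) for qual in qual_strings for ch in qual)
    return 33 if lowest < 59 else 64


# The same characters read with the right and the wrong offset:
print(decode_quals("hhB", 64))  # [40, 40, 2]
print(decode_quals("hhB", 33))  # [71, 71, 33]  (inflated by 31)
```

    Reading offset-64 data as offset 33 inflates every score by exactly 31, which is the across-the-board shift described in the question.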

    See also http://bio-bwa.sourceforge.net/bwa.shtml
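
    On shifting scores down across the board: the arithmetic is just subtracting 31 from each quality character. The sketch below does that for FASTQ text only. `illumina_to_sanger` and `convert_fastq_record` are hypothetical helper names, and fixing an existing BAM would need a BAM-aware library, which is not shown here. It also assumes the input really is offset 64 and raises an error rather than silently producing unprintable characters.

```python
SHIFT = 64 - 33  # difference between the Illumina and Sanger ASCII offsets


def illumina_to_sanger(qual):
    """Re-encode an offset-64 quality string as offset 33 (scores are unchanged)."""
    out = []
    for ch in qual:
        code = ord(ch) - SHIFT
        if code < 33:
            # The character was too low to be offset-64 data in the first place.
            raise ValueError("quality character %r is not valid for offset 64" % ch)
        out.append(chr(code))
    return "".join(out)


def convert_fastq_record(lines):
    """Convert one 4-line FASTQ record (header, sequence, '+' line, qualities)."""
    header, seq, plus, qual = lines
    return [header, seq, plus, illumina_to_sanger(qual)]


record = ["@read1", "ACG", "+", "hhB"]
print(convert_fastq_record(record))  # ['@read1', 'ACG', '+', 'II#']
```

    Running this on a file that is already offset 33 would either fail with the error above or corrupt the scores, so it is worth checking the encoding first.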
