illumina quality scores first 2-3 bp reliability?

rufessor

Junior Member

Join Date: Oct 2014

Posts: 5
#1

11-10-2014, 10:33 AM

Hi All-
I apologize if this has been asked; I searched and could not find any answer that addresses this question.

I was speaking to someone who is quite familiar with running HiSeq machines and asked about the origin of the lower quality scores for the first couple of bp of every read (things seem to settle down after 3-4 bp). At least for my data set, FastQC clearly shows relatively lower quality scores for the first few bp: about 34 for bases 1-4 and 38 for the rest of the read.

I got an answer I did not expect, and I cannot find any discussion of it.

Basically (and this is a rough summary, as I did not take notes), I was told that the quality scores are calculated during the run by an algorithm that uses information about the read quality of a few preceding bases. I am unsure what it's looking at, but I think it is not simply the Q score; probably something about the ratio of the signal intensities of the various colors relative to each other. The first few bases of course cannot fulfill this requirement, so they are artificially given lower scores, but they should not be considered to be of lower quality.

This is really not a serious concern, as the FastQC report shows Q scores of 34 for the first 3-4 bp and then jumps to 38 for basically the rest of the read, so I don't think I am going to lose anything by quality trimming, as 34 is quite good. But I was curious whether the general reasoning behind the "low" quality score for the first few bp is correct, and whether I should ever think about this again with respect to quality trimming reads.
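
For a sense of scale, the gap between Q34 and Q38 can be worked out from the standard Phred definition, Q = -10 * log10(p), where p is the estimated probability that the base call is wrong. The Python below is only a rough sketch of that arithmetic; the function names are made up for illustration and are not from FastQC or any Illumina software.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)

for q in (34, 38):
    p = phred_to_error_prob(q)
    # 1 / p is the expected number of bases per miscalled base
    print(f"Q{q}: p = {p:.2e}, about 1 error in {round(1 / p):,} bases")
```

By this definition Q34 corresponds to roughly one expected error in 2,500 bases and Q38 to roughly one in 6,300, so both are well into the "very good" range even if the reported scores are taken at face value.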

Last edited by rufessor; 11-10-2014, 10:41 AM.
Tags: illumina, quality, rnaseq, trim
Brian Bushnell

Super Moderator

Join Date: Jan 2014

Posts: 2709
#2

11-10-2014, 11:52 AM

Quality trimming at such a high threshold is generally not a good idea. For most purposes (mapping, assembly, merging, etc.) you will get better results using a lower threshold, below 20. I tend to use 6-10 in general and 15 at the most.
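
To illustrate why the threshold matters, here is a minimal Python sketch of end trimming on a single read. It is a deliberately simple stand-in and not the algorithm of any particular trimmer; the function name `quality_trim`, the toy read, and the Phred+33 encoding are all assumptions made for the example.

```python
def quality_trim(seq, qual, threshold=10, offset=33):
    """Trim bases below `threshold` from both ends of a read.

    Assumes Phred+33 encoded quality strings. This is a naive
    end-trimming sketch, not any specific tool's method.
    """
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < threshold:
        start += 1
    while end > start and scores[end - 1] < threshold:
        end -= 1
    return seq[start:end], qual[start:end]

# Toy read: first two bases at Q34 ('C'), then Q38 ('G'), then a poor tail
seq = "ACGTACGTAC"
qual = "CCGGGGGG#!"

print(quality_trim(seq, qual, threshold=10))  # keeps the Q34 start, drops the tail
print(quality_trim(seq, qual, threshold=36))  # also drops the Q34 start
```

With a threshold of 10 only the genuinely poor tail is removed, while a threshold of 36 also discards the two leading Q34 bases, which is the kind of unnecessary loss a very high threshold causes.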

The quality scores of the first few bases are not accurate. As you said, I believe they are artificially lowered to compensate for the fact that the base caller has not been trained yet, or the cluster locations have not been nailed down precisely. The true accuracy is substantially higher, in general - though that might not be the case when sequencing low-diversity libraries where you selectively amplified some specific gene sequence.

If you have a reference (or any assembly), you can run BBMap with the flag "mhist=mhist.txt" to produce a histogram of the match/substitution/deletion/insertion rates at every base location in the reads - this is the most accurate way that I know of to determine whether trimming is needed.
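
Once a per-position histogram like that exists, the observed error rate at each position can be converted to an empirical Phred score and compared with the reported one. The Python below is a hedged sketch: it assumes a tab-delimited table with a `#` header line and the columns position, match, substitution, deletion, insertion in that order, which may not match the real file exactly, so check the header of your own output first. The function name and the numbers in `toy` are invented purely for illustration and are not real results.

```python
import math

def empirical_quality(text):
    """Return (position, error_rate, empirical_Q) for each table row.

    Assumed layout (verify against your own file): tab-delimited,
    header lines start with '#', columns are
    position, match, substitution, deletion, insertion.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        pos = int(fields[0])
        match, sub, dele, ins = (float(x) for x in fields[1:5])
        errors = sub + dele + ins
        total = match + errors
        rate = errors / total if total else 0.0
        # A rate of zero means no observed errors, so Q is unbounded
        q = -10 * math.log10(rate) if rate > 0 else float("inf")
        rows.append((pos, rate, q))
    return rows

# Made-up values for illustration only
toy = (
    "#BaseNum\tMatch1\tSub1\tDel1\tIns1\n"
    "1\t0.9990\t0.0010\t0.0\t0.0\n"
    "2\t0.9999\t0.0001\t0.0\t0.0\n"
)

for pos, rate, q in empirical_quality(toy):
    print(f"base {pos}: error rate {rate:.4f}, empirical Q about {q:.1f}")
```

If the empirical Q for the first few positions is as high as for the rest of the read, that supports the idea that the reported scores there are understated and that trimming those bases is unnecessary.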