Hi all,
I apologize if this has been asked before. I searched and could not find any answer that addresses this question.
I was speaking to someone who is quite familiar with running HiSeq machines and asked about the origin of the lower quality scores for the first few bp of every read (things seem to settle down after 3-4 bp). At least for my data set, FastQC clearly shows (relatively) lower quality scores for the first few bp: about 34 for positions 1-4 and 38 for the rest of the read.
I got an answer I did not expect, and I cannot find any discussion of it.
Basically (and this is a rough summary, as I did not take notes), I was told that the quality scores are calculated during the run by an algorithm that uses information about the quality of a few preceding bases. I am unsure exactly what it is looking at, but I think it is not simply the Q score; it is probably something about the relative signal intensities of the different colors. The first few bases of a read do not (of course) have enough preceding bases to fulfill this requirement, so they are artificially assigned lower scores, but they should not be considered to be of low(er) quality.
This is really not a serious concern, since the FastQC report shows Q scores of 34 for the first 3-4 bp and then jumps to 38 for basically the rest of the read, so I don't think I am going to lose anything by quality trimming, as 34 is quite good. But I was curious whether the general reasoning behind the "low" quality score for the first few bp is correct, and whether I should ever think about this again with respect to quality trimming reads.
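To put the 34 vs. 38 difference in perspective, here is a minimal sketch of the standard Phred relationship, Q = -10 * log10(p), where p is the estimated probability that a base call is wrong. The function name is just my own illustration, not part of FastQC or any Illumina software.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q into the estimated
    probability that the base call is wrong: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Compare the two quality levels seen in the FastQC report.
for q in (34, 38):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability ~{p:.1e} (about 1 in {round(1 / p):,})")
```

So Q34 works out to roughly one expected error in 2,500 calls and Q38 to roughly one in 6,300; either way, both are far above the thresholds typically used for quality trimming.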