**fkrueger** · 12-05-2010, 01:22 AM

Hi Oregon,

The 4th line of each FastQ record shows the basecall qualities for the bases in line 2. Paired-end data is normally split into two files, one per read end, with the reads in the same order in both, so the nth record in the first file is the mate of the nth record in the second.
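If you want to confirm that ordering for your own files, something like the following could work. It is only a sketch: it assumes plain four-line records (no wrapped sequences) and read names that end in `/1` and `/2`, and the function names are made up for illustration.

```python
from itertools import zip_longest


def fastq_records(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines.

    Assumes exactly four lines per record; an incomplete trailing
    record is silently dropped.
    """
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\r\n"), seq.rstrip("\r\n"), qual.rstrip("\r\n")


def base_id(header):
    """Read name without the leading '@', any description, or a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def first_mismatch(lines1, lines2):
    """Return the 0-based index of the first record pair whose names differ.

    Returns None if both inputs hold the same reads in the same order.
    A difference in record count is reported as a mismatch too.
    """
    pairs = zip_longest(fastq_records(lines1), fastq_records(lines2))
    for i, (r1, r2) in enumerate(pairs):
        if r1 is None or r2 is None or base_id(r1[0]) != base_id(r2[0]):
            return i
    return None
```

Both arguments can be open file handles, so the files are streamed rather than loaded into memory.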
The quality score 'B' (Phred score 2 in Illumina's Phred+64 encoding) is not a generic low-quality value; it has a special meaning, which was already discussed in a previous thread:

Illumina FASTQ Quality Scores - Missing Value - SEQanswers

http://seqanswers.com/forums/showthread.php?t=4721&highlight=quality+score+illumina+special

Perhaps it is related to this new 'feature' of Pipeline 1.3+? See SLIDE 17 in http://docs.google.com/fileview?id=0...NTUyNDE3&hl=en. Here is the text of the slide:

"The Read Segment Quality Control Indicator: At the ends of some reads, quality scores are unreliable. Illumina has an algorithm for identifying these unreliable runs of quality scores, and we use a special indicator to flag these portions of reads A quality score of 2, encoded as a "B", is used as a special indicator. A quality score of 2 does not imply a specific error rate, but rather implies that the marked region of the read should not be used for downstream analysis. Some reads will end with a run of B (or Q2) basecalls, but there will never be an isolated Q2 basecall."

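To make the slide text concrete, here is a minimal sketch of how the 'B' flag could be handled. It assumes the file uses the Illumina 1.3+/1.5+ Phred+64 encoding; in a Phred+33 (Sanger) file 'B' means Q33, so this must not be applied there. The function names are illustrative only.

```python
def phred64(qual):
    """Decode a Phred+64 quality string into a list of integer scores."""
    return [ord(c) - 64 for c in qual]


def trim_b_tail(seq, qual):
    """Drop the trailing run of 'B' (Q2) calls from a read.

    Returns the trimmed (sequence, quality) pair. A read that is
    flagged over its whole length comes back as two empty strings.
    """
    keep = len(qual.rstrip("B"))
    return seq[:keep], qual[:keep]
```

For example, `trim_b_tail("ACGTACGT", "hhhhgBBB")` keeps the first five bases and their qualities.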
Also, looking only at the first few (hundred) lines of a FastQ file can give you the wrong impression, since a file can contain more than 100 million lines. Running a quality control tool such as FastQC over the whole file should give you a better idea of your sequencing data.
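If you just want a quick number for the whole file rather than a full FastQC report, a rough sketch like this would count how much of the data carries the 'B' flag. It again assumes Phred+64 and four-line records, and the function name is hypothetical.

```python
def b_tail_summary(lines):
    """Stream FASTQ lines and summarise trailing 'B' (Q2) runs.

    Returns the number of reads, the number of reads ending in at
    least one 'B', and the fraction of all bases inside such tails.
    """
    reads = flagged_reads = bases = flagged_bases = 0
    it = iter(lines)
    for _header, _seq, _plus, qual in zip(it, it, it, it):
        qual = qual.rstrip("\r\n")
        tail = len(qual) - len(qual.rstrip("B"))
        reads += 1
        bases += len(qual)
        if tail:
            flagged_reads += 1
            flagged_bases += tail
    return {
        "reads": reads,
        "reads_with_b_tail": flagged_reads,
        "fraction_bases_flagged": flagged_bases / bases if bases else 0.0,
    }
```

Pass it an open file handle so the file is read line by line. It is not a replacement for FastQC, only a way to see whether the bad scores in the first few hundred lines are typical of the whole run.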

Best wishes

**oregon** · 12-05-2010, 09:10 AM

thanks!

Thanks for the reply. I used two FastQ analysis programs and both are showing terrible quality scores. I guess I will get in touch with our sequencing people and see what is going on.

paired end quality scores