

12-04-2010, 02:59 PM   #1
Junior Member
Location: oregon

Join Date: Dec 2010
Posts: 3
paired end quality scores

I'm new to bioinformatics and just got a paired-end (120bp) set of Illumina sequences. The sequence_2 file looks like what I'm used to, but the sequence_1 file looks like this:


I have been looking at RAD sequencing data and interpreting the 2nd line as fastq scores, but if that's true for these, the quality is really poor. Is that true, or do the paired-end reads have a different file format? Thanks for any advice!
oregon
12-05-2010, 12:22 AM   #2
Senior Member
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 624

Hi Oregon,

The 4th line of each FastQ record holds the basecall qualities for the bases in line 2. Paired-end data is normally arranged so that the two files list the reads in the same order: the Nth record in one file is the mate of the Nth record in the other.
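As a rough illustration of that layout, here is a minimal Python sketch that reads 4-line FastQ records and walks two paired files in lockstep. The read names, sequences, and the `/1` and `/2` name suffixes are made-up examples (an assumption about old-style Illumina naming), and `read_fastq` and `read_pairs` are hypothetical helpers, not part of any library.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FastQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # line 3 is the '+' separator, ignored here
        qual = handle.readline().rstrip("\n")  # line 4: one quality char per base
        yield header[1:], seq, qual


def read_pairs(handle_1, handle_2):
    """Yield (record_1, record_2), relying on both files sharing one order."""
    for rec_1, rec_2 in zip_longest(read_fastq(handle_1), read_fastq(handle_2)):
        if rec_1 is None or rec_2 is None:
            raise ValueError("files contain different numbers of reads")
        # Assumed naming: mates share a prefix and end in /1 and /2.
        if rec_1[0].rsplit("/", 1)[0] != rec_2[0].rsplit("/", 1)[0]:
            raise ValueError(f"out of order: {rec_1[0]} vs {rec_2[0]}")
        yield rec_1, rec_2


# Made-up records for illustration only.
file_1 = io.StringIO("@read1/1\nACGT\n+\nffBB\n@read2/1\nTTGA\n+\ngggg\n")
file_2 = io.StringIO("@read1/2\nGGCA\n+\nfffe\n@read2/2\nCATG\n+\nggfB\n")
pairs = list(read_pairs(file_1, file_2))
```

For real files you would pass open file handles instead of the `io.StringIO` objects.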
The quality score 'B' (Phred score 2) is not a generic low-quality value; it has a special meaning, which was already discussed in a previous thread:

Perhaps it is related to this new 'feature' of Pipeline 1.3+? See SLIDE 17 in Here is the text of the slide:

"The Read Segment Quality Control Indicator: At the ends of some reads, quality scores are unreliable. Illumina has an algorithm for identifying these unreliable runs of quality scores, and we use a special indicator to flag these portions of reads. A quality score of 2, encoded as a "B", is used as a special indicator. A quality score of 2 does not imply a specific error rate, but rather implies that the marked region of the read should not be used for downstream analysis. Some reads will end with a run of B (or Q2) basecalls, but there will never be an isolated Q2 basecall."
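To make the 'B' indicator concrete, here is a small Python sketch that decodes quality characters and removes a trailing run of 'B'. It assumes the Phred+64 offset used by Illumina pipelines of that era (so 'B', ASCII 66, decodes to 2); `decode` and `trim_b_tail` are hypothetical helper names and the read is made up.

```python
OFFSET = 64  # assumption: Illumina Phred+64 encoding, so 'B' (ASCII 66) is Q2


def decode(qual):
    """Convert a quality string into a list of integer Phred scores."""
    return [ord(ch) - OFFSET for ch in qual]


def trim_b_tail(seq, qual):
    """Drop the flagged region: the run of 'B' at the end of the read."""
    keep = len(qual.rstrip("B"))  # rstrip only removes the trailing run
    return seq[:keep], qual[:keep]


print(decode("hfB"))                         # [40, 38, 2]
print(trim_b_tail("ACGTACGT", "ffffeBBB"))   # ('ACGTA', 'ffffe')
```

Whether to trim or discard such reads depends on the downstream analysis; this only shows how the flagged tail can be recognised.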
Also, looking only at the first few (hundred) lines of a FastQ file can give you the wrong impression, as such files can contain more than 100 million lines. A quality control tool such as FastQC might help you get a better idea of your sequencing data.
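If you want a quick whole-file view without a dedicated tool, a sketch along these lines streams every record and averages the quality at each read position. It is not how FastQC works internally; it assumes the Phred+64 offset and strictly 4-line records, and `mean_quality_by_position` is a hypothetical helper run here on made-up data.

```python
import io
from collections import defaultdict

OFFSET = 64  # assumption: Illumina Phred+64 encoding


def mean_quality_by_position(handle):
    """Return the mean Phred score at each read position over the whole file."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line_no, line in enumerate(handle):
        if line_no % 4 != 3:  # only the 4th line of each record holds qualities
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            totals[pos] += ord(ch) - OFFSET
            counts[pos] += 1
    return [totals[pos] / counts[pos] for pos in sorted(counts)]


# Two made-up reads; the first ends in a 'B'-flagged base.
sample = io.StringIO("@r1\nACGT\n+\nhhhB\n@r2\nACGT\n+\nhhhh\n")
means = mean_quality_by_position(sample)
print(means)  # [40.0, 40.0, 40.0, 21.0]
```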

Best wishes
fkrueger
12-05-2010, 08:10 AM   #3
Junior Member
Location: oregon

Join Date: Dec 2010
Posts: 3
thanks!

Thanks for the reply. I used two fastq analysis programs and both are showing terrible quality scores. I guess I will get in touch with our sequencing people and see what is going on.
oregon

