SEQanswers

10-07-2013, 11:26 AM   #1
mothurwestcott
Junior Member

Location: USA

Join Date: Oct 2013
Posts: 3
Calculating consensus quality scores

Hi All,
I am new to this forum and looking for advice on the proper way to calculate consensus quality scores for paired-end reads. Here's a concrete example of a portion of two aligned reads and their scores:

fragment1 - GGAGGATGCGAGCGTTATCCGG-ATTTATTGGGTTTAAA
fragment2 - CGAGGGTGCAGGGGTTAACCGGAATTTA-TGGGTGTGAA
contig - GGAGGGTGCAAGCGTTATCCGGATTTATTGGGTTTAAA

base1 base2 score1 score2
G C 33 12
G G 32 26
A A 32 12
G G 31 12
G G 33 14
A G 17 24
T T 34 12
G G 37 12
C C 37 12
G A 17 26
A G 36 24
G G 37 12
C G 38 14
G G 38 14
T T 38 24
T T 38 26
A A 38 12
T A 38 12
C C 38 12
C C 39 14
G G 38 14
G G 38 26
- A 33 14
A A 38 14
T T 38 24
T T 38 14
T T 38 14
A A 39 14
T - 39 12
T T 38 26
G G 39 12
G G 37 26
G G 39 26
T T 36 14
T G 36 26
T T 36 26
A G 37 12
A A 39 37
A A 39 31

How would you calculate the contig's quality scores? Would you suggest different methods for bases that match, bases that don't, and gap-to-base situations? Thanks in advance for your help!

Kindly,
Sarah

10-07-2013, 01:25 PM   #2
dpryan
Devon Ryan

Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Are "fragment 1" and "fragment 2" paired-end reads and "contig" an example alignment of them to the reference? From your phrasing, it's difficult to tell if you want a mapping score or a consensus Phred score for the base calls.

10-07-2013, 02:18 PM   #3
mothurwestcott
Junior Member

Location: USA

Join Date: Oct 2013
Posts: 3

Thanks for your response and question. Let me try to clarify a bit. Fragment 1 is a portion of the forward read and Fragment 2 a portion of the reverse read. They are aligned to each other, and the posted section is part of where they overlap. The contig is an assembly of the 2 fragments. In this simple example, where the bases in the fragments are mismatched, the base with the better quality score was selected to be part of the contig. For the line "G C 33 12", 33 is the quality score for the base G, taken directly from the fastq file, and 12 is the quality score for the base C. G is selected as the base in the contig, but how would you suggest calculating the quality score for G in the contig?
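
The selection rule described in this post can be sketched in Python as follows. This is only an illustration, not mothur's actual make.contigs code: the function name `pick_base` and the tie-breaking choice (favor the forward read when the scores are equal) are assumptions, and the open question of what quality score to give the chosen base is left untouched.

```python
def pick_base(base1, base2, score1, score2):
    """Return the base call with the higher Phred score.

    Hypothetical helper: ties go to the forward read (base1), which is
    an assumption and not something stated in the thread.
    """
    return base1 if score1 >= score2 else base2


# Mismatched rows taken from the table in post #1:
# (base1, base2, score1, score2)
rows = [("G", "C", 33, 12), ("A", "G", 17, 24), ("G", "A", 17, 26)]
chosen = [pick_base(*row) for row in rows]
print(chosen)
```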

10-07-2013, 02:28 PM   #4
GenoMax
Senior Member

Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

The quality scores you are looking at are for the individual bases and express reliability of the base call at that position (http://en.wikipedia.org/wiki/FASTQ_format#Quality). It is probably not appropriate to simply add/average them.

If these reads are overlapping then you may want to use a program to collapse them into a single representation. http://thegenomefactory.blogspot.com...aired-end.html

Your downstream application may also determine how you want to handle them.
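
To see why simple averaging is questionable, it helps to convert between Phred scores and error probabilities using the standard definition Q = -10 * log10(p). A minimal Python sketch; the function names are illustrative and not taken from any tool mentioned in this thread:

```python
import math


def phred_to_prob(q):
    """Error probability implied by a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Phred score implied by an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


# Scores from the first row of the table in post #1.
q1, q2 = 33, 12
mean_of_scores = (q1 + q2) / 2
# Averaging the error probabilities first gives a different answer,
# because the Phred scale is logarithmic.
score_of_mean_prob = prob_to_phred((phred_to_prob(q1) + phred_to_prob(q2)) / 2)
print(mean_of_scores, round(score_of_mean_prob, 2))
```

The two printed values differ, and the probability-based one is dominated by the less reliable call.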

10-07-2013, 02:38 PM   #5
mothurwestcott
Junior Member

Location: USA

Join Date: Oct 2013
Posts: 3

Thanks for the links. I work for the mothur project. We have a command, make.contigs (http://www.mothur.org/wiki/Make.contigs), that assembles overlapping paired-end reads. The tool currently assembles the contigs taking into account inserts, mismatches and the difference in the quality scores. We have had some requests for assembled quality data and are interested in the community's thoughts on the best way to do this. Your thoughts?

10-07-2013, 04:13 PM   #6
GenoMax
Senior Member

Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

Quote:
Originally Posted by mothurwestcott
Thanks for the links. I work for the mothur project. We have a command, make.contigs (http://www.mothur.org/wiki/Make.contigs), that assembles overlapping paired-end reads. The tool currently assembles the contigs taking into account inserts, mismatches and the difference in the quality scores. We have had some requests for assembled quality data and are interested in the community's thoughts on the best way to do this. Your thoughts?
If the bases match, you could potentially keep the higher of the two quality values, taking into account the positional context of the base in the read.

09-23-2014, 05:40 AM   #7
Jegar
Junior Member

Location: Cambridge

Join Date: Aug 2014
Posts: 6

How you combine these scores depends on the platform you are using, as the Phred scores are calculated differently.

If they are Illumina scores, I believe it is appropriate to add the scores together: they are log-transformed scores reflecting the likelihood of the base call being in error, so adding them is equivalent to multiplying the error probabilities of the two calls (i.e. the probability of base 1 AND base 2 both being in error). This produces very high Phred-like scores in some instances but, from what I have read, this reflects the inaccuracy of Illumina's Phred scores rather than the method used to combine them.

I am very happy to be corrected on this!
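
The additive rule suggested here can be checked numerically. The Python sketch below only illustrates that suggestion for a position where both reads agree. It assumes the two base calls err independently, and the function name is hypothetical.

```python
import math


def add_matching_scores(q1, q2):
    """Combine two Phred scores for agreeing base calls by addition.

    Assumes the two calls are independent, so the chance that both are
    wrong is the product of their error probabilities.
    """
    return q1 + q2


# Second row of the table in post #1: both reads call G, scores 32 and 26.
q1, q2 = 32, 26
p_both_wrong = 10 ** (-q1 / 10) * 10 ** (-q2 / 10)
# Converting the product back to the Phred scale recovers the sum.
print(add_matching_scores(q1, q2), round(-10 * math.log10(p_both_wrong), 6))
```

This says nothing about mismatched positions or gaps, which the thread leaves open.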

Tags
contigs, paired end reads, quality scores
