
**dpryan** · 10-07-2013, 01:25 PM

Are "fragment 1" and "fragment 2" paired-end reads and "contig" an example alignment of them to the reference? From your phrasing, it's difficult to tell if you want a mapping score or a consensus Phred score for the base calls.

**mothurwestcott** · 10-07-2013, 02:18 PM

Thanks for your response and question. Let me try to clarify a bit. Fragment 1 is a portion of the forward read and Fragment 2 is a portion of the reverse read. They are aligned to each other, and the posted section is part of the region where they overlap. The contig is an assembly of the two fragments. In this simple example, where the bases in the fragments are mismatched, the base with the better quality score was selected to be part of the contig. For the line "G C 33 12", 33 is the quality score for the base G, taken directly from the fastq file, and 12 is the quality score for the base C. G is selected as the base in the contig, but how would you suggest calculating the quality score for G in the contig?
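The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the post, not mothur's actual implementation; the function name `pick_contig_base` and the tie-breaking choice (ties go to the forward read) are assumptions made for the example.

```python
def pick_contig_base(base_fwd, qual_fwd, base_rev, qual_rev):
    """Pick the contig base at one overlap position.

    Hypothetical helper: on a mismatch, keep the base with the higher
    Phred score. Ties go to the forward read (an assumption, the post
    does not say how ties are handled).
    """
    if base_fwd == base_rev:
        # Both reads agree, so there is nothing to choose between.
        return base_fwd
    return base_fwd if qual_fwd >= qual_rev else base_rev


# The "G C 33 12" line from the example: G (Q33) beats C (Q12).
print(pick_contig_base("G", 33, "C", 12))
```

The sketch only chooses the base. What quality score to attach to it is the open question of this thread.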

**GenoMax** · 10-07-2013, 02:28 PM

The quality scores you are looking at are for the individual bases and express reliability of the base call at that position (http://en.wikipedia.org/wiki/FASTQ_format#Quality). It is probably not appropriate to simply add/average them.

If these reads are overlapping then you may want to use a program to collapse them into a single representation. http://thegenomefactory.blogspot.com...aired-end.html

Your downstream application may also determine how you want to handle them.
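For reference, a Phred score Q corresponds to an error probability p through Q = -10 * log10(p). The short sketch below (function names are made up for the example) converts in both directions and shows why averaging the scores is not the same as averaging the underlying error probabilities, using the Q33 and Q12 values from the example line.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability that the call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)


q_fwd, q_rev = 33, 12

# Averaging on the (logarithmic) Phred scale.
mean_of_scores = (q_fwd + q_rev) / 2

# Averaging the error probabilities, then converting back to Phred.
mean_of_probs = (phred_to_error_prob(q_fwd) + phred_to_error_prob(q_rev)) / 2

print(mean_of_scores)
print(error_prob_to_phred(mean_of_probs))
```

Because the scale is logarithmic, the second value is dominated by the low-quality base and comes out well below the plain average of the two scores.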

**mothurwestcott** · 10-07-2013, 02:38 PM

Thanks for the links. I work for the mothur project. We have a command, make.contigs (http://www.mothur.org/wiki/Make.contigs), that assembles overlapping paired-end reads. The tool currently assembles the contigs taking into account inserts, mismatches and the difference in the quality scores. We have had some requests for assembled quality data and are interested in the community's thoughts on the best way to do this. Your thoughts?

**GenoMax** · 10-07-2013, 04:13 PM

If the bases match, then you could potentially keep the higher of the two quality values, taking into account the positional context of the base in the read.
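A minimal sketch of this suggestion, assuming the two reads have already been aligned so that position i in one corresponds to position i in the other. The function name is hypothetical, the positional-context part of the suggestion is not modelled, and mismatched positions are deliberately left as `None` because the suggestion only covers matching bases.

```python
def max_quality_for_matches(bases_fwd, quals_fwd, bases_rev, quals_rev):
    """For each aligned overlap position, keep the higher Phred score
    when the two bases agree; return None where they disagree."""
    consensus_quals = []
    for b_fwd, q_fwd, b_rev, q_rev in zip(bases_fwd, quals_fwd, bases_rev, quals_rev):
        if b_fwd == b_rev:
            consensus_quals.append(max(q_fwd, q_rev))
        else:
            # Mismatch: not covered by this rule, so leave it unresolved.
            consensus_quals.append(None)
    return consensus_quals


print(max_quality_for_matches("ACG", [30, 20, 33], "ACC", [25, 35, 12]))
```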

**Jegar** · 09-23-2014, 05:40 AM

How you combine these scores depends on the platform you are using, as the Phred scores are calculated differently.

If they are Illumina scores, I believe it is appropriate to add the scores together. They are log-transformed scores reflecting the likelihood that the base call is in error, so adding them is equivalent to multiplying the error likelihoods of the two calls (i.e. the probability of base 1 AND base 2 both being in error). This produces very high Phred-like scores in some instances, but from what I have read, this reflects the inaccuracy of Illumina's Phred scores rather than the method used to combine them.

I am very happy to be corrected on this!
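The equivalence described above (adding Phred scores is the same as multiplying error probabilities) can be checked numerically. The sketch below uses hypothetical function names and assumes the errors in the two reads are independent, which is what multiplying the two probabilities implies.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def combined_quality_for_agreeing_calls(q1, q2):
    """Phred score of the probability that both calls are wrong,
    assuming the two errors are independent."""
    p_both_wrong = phred_to_error_prob(q1) * phred_to_error_prob(q2)
    return error_prob_to_phred(p_both_wrong)


# Up to floating-point rounding, this matches q1 + q2.
print(combined_quality_for_agreeing_calls(30, 30))
print(30 + 30)
```

This also illustrates the point about very high scores: two agreeing Q30 calls combine to about Q60, higher than most single base calls are ever assigned.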


Calculating consensus quality scores
