SEQanswers

luxmare 06-08-2010 06:50 PM

deletion '*' has quality score in pileup?
 
I know that asterisks '*' in the read bases column are placeholders for deletions in pileup format. But in my data, a deletion '*' seems to have a base quality score. Why do deleted bases have base qualities?

In this example, there are 6 bases and 1 deletion at site 25, and the number of base qualities is 7. Can I simply ignore the base quality 'a' corresponding to the deleted base?

chr1 24 g 7 ,..,-1t.,, ``a]bb\
chr1 25 t 7 ,..*.,, b[baaa`
chr1 26 t 7 ,..,.,, a_baa]_

Thanks in advance.

FYI:

I constructed the pileup file using the following command:

$ ./samtools-0.1.7a/samtools pileup -f reference.fa <in.bam> > <out.pileup>

epigen 06-09-2010 09:19 AM

The base quality is also a placeholder, just like the * in the read bases column.
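The placeholder pairing can be checked with a short script. This is a minimal sketch, not samtools code: the function names `split_read_bases` and `pair_with_qualities` are made up for illustration, and it assumes the usual pileup conventions (`^` plus one mapping-quality character marks a read start, `$` marks a read end, and `+N`/`-N` is followed by N indel bases).

```python
import re

def split_read_bases(bases):
    """Return one symbol per read from a pileup read-bases string.

    Hypothetical helper: skips read-start markers (^ plus one character),
    read-end markers ($) and indel annotations (+<len><seq>, -<len><seq>).
    """
    symbols = []
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == '^':                      # read start, next char is mapping quality
            i += 2
        elif c == '$':                    # read end
            i += 1
        elif c in '+-':                   # indel annotation, carries no quality
            m = re.match(r'\d+', bases[i + 1:])
            if m is None:
                raise ValueError("indel marker without a length at offset %d" % i)
            i += 1 + len(m.group()) + int(m.group())
        else:                             # a base, a match (. or ,) or a '*' placeholder
            symbols.append(c)
            i += 1
    return symbols

def pair_with_qualities(bases, quals):
    """Pair each per-read symbol with its quality character."""
    symbols = split_read_bases(bases)
    if len(symbols) != len(quals):
        raise ValueError("%d symbols but %d qualities" % (len(symbols), len(quals)))
    return list(zip(symbols, quals))

# Site 25 from the first post: the '*' placeholder is the 4th symbol.
pairs = pair_with_qualities(",..*.,,", "b[baaa`")
print(pairs[3])   # ('*', 'a')
```

To ignore the placeholder qualities, one option is to drop every pair whose symbol is '*' before doing anything with the quality characters.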

I found similar occurrences in my data (created with samtools pileup -vcf), e.g.

chr14 65392038 T A 0 0 0 1 * W

Note that the weird consensus A must result from a floating-point underflow in the MAQ SNP calling model (http://sourceforge.net/apps/mediawik...?title=SAM_FAQ). Although the FAQ answer says that this only happens in repetitive regions, the reference T is upper case, and I just checked again in the UCSC Genome Browser for hg19 that there is no repeat at this position. So I guess the reason is that there is only one read.

Torst 06-11-2010 06:34 PM

Quote:

Originally Posted by luxmare (Post 19893)
I know that asterisks '*' in the read bases column are placeholders for deletions in pileup format. But in my data, a deletion '*' seems to have a base quality score. Why do deleted bases have base qualities?

I don't know how samtools handles this in particular, but the IDEA of giving a deletion a "quality score" is good. A quality score is essentially just a scaled negative logarithm of a "probability of error", i.e. the "probability that this is wrong". So you can interpret it as a statistical confidence that the deletion is real.
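For reference, the usual Phred convention is Q = -10 * log10(p), where p is the error probability. A small sketch of the conversions follows; the function names are illustrative, and the default ASCII offset of 33 is an assumption that may not match how the qualities in this thread were encoded.

```python
import math

def phred_to_error_prob(q):
    """Error probability for a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)

def qual_char_to_phred(ch, offset=33):
    """Decode one quality character; offset=33 is an assumed encoding."""
    return ord(ch) - offset

# A Phred score of 20 corresponds to a 1-in-100 chance of error.
print(phred_to_error_prob(20))
```

Whether the character attached to a '*' carries any such meaning depends on how samtools fills it in, which this sketch does not address.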

luxmare 06-13-2010 04:20 PM

Hi Torst,
Thank you for your reply. I agree with you. Quality scores for deletions are informative. But if the pileup format (made by BWA->SAMtools) were designed around the IDEA you described, inserted bases should also have base quality scores. In practice, however, inserted bases don't have qualities, as shown below. I'm confused...


chr3 7759 C 4 ..,, SO]^
chr3 7760 C 4 .+1G.,, \\\[ <-- 4 ref-type bases and 1 insertion, but only 4 qualities
chr3 7761 G 4 ..,, GT[a
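The difference between the two cases can be made explicit by pulling the indel annotations out of the read bases column. This is a rough sketch under the same assumed pileup conventions (`^` plus one character, `$`, `+N`/`-N` followed by N bases); `find_indels` is a hypothetical name, not part of samtools.

```python
import re

INDEL = re.compile(r'([+-])(\d+)')

def find_indels(bases):
    """List (read_index, sign, sequence) for each indel annotation.

    read_index is the 0-based position of the read symbol that the
    annotation follows; the annotation itself consumes no quality character.
    """
    events = []
    read_index = -1
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == '^':                      # read start plus mapping quality char
            i += 2
        elif c == '$':                    # read end
            i += 1
        elif c in '+-':
            m = INDEL.match(bases, i)
            if m is None:
                raise ValueError("indel marker without a length at offset %d" % i)
            length = int(m.group(2))
            seq = bases[m.end():m.end() + length]
            events.append((read_index, m.group(1), seq))
            i = m.end() + length
        else:                             # one symbol per read, including '*'
            read_index += 1
            i += 1
    return events

print(find_indels(".+1G.,,"))      # insertion attached to the first read
print(find_indels(",..,-1t.,,"))   # deletion announced on the fourth read
```

Read this way, an insertion such as +1G is an annotation attached to the preceding read symbol, so the line still has 4 read symbols and 4 qualities, whereas a '*' on the following line occupies a read's slot and therefore needs a quality character to keep the two columns the same length.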


All times are GMT -8.