SEQanswers

06-08-2010, 06:50 PM   #1
luxmare
Member
 
Location: Japan

Join Date: Feb 2009
Posts: 10
deletion '*' has quality score in pileup?

I know that asterisks '*' in the read bases column are placeholders for deletions in pileup format. But in my data, a deletion '*' seems to have a base quality score. Why do deleted bases have base qualities?

In this example, there are 6 bases and 1 deletion at site 25, and the number of base qualities is 7. Can I simply ignore the base quality 'a' corresponding to the deleted base?

chr1 24 g 7 ,..,-1t.,, ``a]bb\
chr1 25 t 7 ,..*.,, b[baaa`
chr1 26 t 7 ,..,.,, a_baa]_

Thanks in advance.

FYI:

I constructed the pileup file using the following command:

$ ./samtools-0.1.7a/samtools pileup -f reference.fa <in.bam> > <out.pileup>
06-09-2010, 09:19 AM   #2
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101

The base quality is also a placeholder, just like the * in the reads column.

I found similar occurrences in my data (created with samtools pileup -vcf), e.g.

chr14 65392038 T A 0 0 0 1 * W

Note that the weird consensus A must result from a floating-point underflow in the MAQ SNP calling model (http://sourceforge.net/apps/mediawik...?title=SAM_FAQ). Although the FAQ answer says that this only happens in repetitive regions, the reference T is upper case, and I just checked again in the UCSC Genome Browser for hg19 that there is no repeat there. So I guess the reason is that there is only one read.
06-11-2010, 06:34 PM   #3
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Originally Posted by luxmare
I know that asterisks '*' in the read bases column are placeholders for deletions in pileup format. But in my data, a deletion '*' seems to have a base quality score. Why do deleted bases have base qualities?
I don't know how samtools handles this in particular, but the IDEA of giving a deletion a "quality score" is good. A quality score is essentially just a logarithm of a "probability of error" or "probability that this is wrong". So you can interpret it as a statistical confidence that the deletion is real.
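To make the "logarithm of a probability of error" point concrete, here is a minimal Python sketch of the usual Phred convention, Q = -10 * log10(p). The function names are made up for illustration, and the ASCII offset of 33 is an assumption about how the quality column is encoded; some older Illumina pipelines used 64, so check your own data. Note also that this thread does not establish that the value printed for a '*' carries this meaning; per the earlier reply it may only be a placeholder.

```python
def phred_from_char(qual_char, offset=33):
    """Decode one quality character to a Phred score (offset is an assumption)."""
    return ord(qual_char) - offset


def error_probability(q):
    """Invert Q = -10 * log10(p) to recover the error probability p."""
    return 10 ** (-q / 10)


# Decode the quality column from the site-25 line in the first post.
for ch in "b[baaa`":
    q = phred_from_char(ch)
    print(ch, q, error_probability(q))
```

If the decoded scores look implausibly high for your platform, that is usually a hint that the offset assumption is wrong for your input.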
06-13-2010, 04:20 PM   #4
luxmare
Member
 
Location: Japan

Join Date: Feb 2009
Posts: 10

Hi Torst,
Thank you for your reply. I agree with you: quality scores for deletions are informative. But if the pileup format (made by BWA->SAMtools) were designed around the IDEA you describe, inserted bases should have base quality scores too. In fact, however, inserted bases don't have qualities, as shown below. I'm confused...


chr3 7759 C 4 ..,, SO]^
chr3 7760 C 4 .+1G.,, \\\[ <-- 4 ref-type bases and 1 insertion, but only 4 qualities
chr3 7761 G 4 ..,, GT[a
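The pattern in the examples in this thread (one quality per base call or '*', none for the `+1G` / `-1t` annotations) can be checked with a small parser. This is only a sketch in Python, not samtools' own code: `split_pileup_bases` and `pair_with_qualities` are hypothetical helper names, and the handling of `^` (read start, followed by one mapping-quality character) and `$` (read end) is my reading of the pileup format rather than something shown in the posts above.

```python
import re


def split_pileup_bases(bases):
    """Split the read-bases column into per-read calls and indel annotations.

    Returns (calls, indels). Each entry of `indels` is (call_index, text),
    where call_index is the read call the annotation is attached to.
    """
    calls, indels = [], []
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # skip '^' and the mapping-quality character after it
        elif c == "$":
            i += 1  # end-of-read marker, not a base call
        elif c in "+-":
            length = re.match(r"\d+", bases[i + 1:]).group()
            start = i + 1 + len(length)
            end = start + int(length)
            indels.append((len(calls) - 1, c + bases[start:end]))
            i = end  # the inserted/deleted sequence has no quality character
        else:
            calls.append(c)  # '.', ',', a mismatch base, or the '*' placeholder
            i += 1
    return calls, indels


def pair_with_qualities(bases, quals):
    """Pair each call with its quality character, checking the counts agree."""
    calls, indels = split_pileup_bases(bases)
    if len(calls) != len(quals):
        raise ValueError("number of calls and qualities differ")
    return list(zip(calls, quals)), indels


pairs, indels = pair_with_qualities(",..*.,,", "b[baaa`")
print(pairs[3])  # ('*', 'a')
```

With this reading, the '*' at site 25 is simply the fourth call and takes the fourth quality character, while an insertion such as `+1G` is recorded separately and consumes no quality, which matches the counts in both examples.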

Tags
deletion, pileup, quality, samtools

All times are GMT -8.