SEQanswers

12-29-2011, 10:35 AM   #1
david.tamborero
mpileup base-quality filter does not seem to work

Hello,

I am trying to detect somatic mutations in tumor-normal sample pairs (Illumina paired-end reads). As a first approach, I am doing the following:

Code:
- bfast alignment (match + localalign + postprocess) for each sample
- picard remove_duplicates for each sample
- samtools mpileup for each sample
- varscan for each tumor-normal samples pair
I want to exclude low-quality reads and bases at the mpileup step, so I use the -q/-Q arguments. However, they do not seem to work, and after digging through the data I am now totally confused.

I inspect the BAM file with IGV, which shows the base and read qualities at each position. The pileup file is generated by the following mpileup command:

Code:
/samtools-0.1.18/samtools mpileup -f ref.fa -B -q 1 -Q 30 -SD sample.bam > sample.pileup
What I observe is that the pileup summary includes fewer reads than are available in the BAM file. However, the excluded reads do not seem to follow the -q 1 -Q 30 criteria (for instance, the pileup includes read bases whose quality is much lower than 30, according to the BAM file). Note that I disabled the BAQ calculation (-B) to keep things clearer. Moreover, the base qualities reported in most of the pileup entries are systematically lower than 30, e.g.:

Code:
chr1	115323009	T	18	,$.,,,,,,,,,,c,,,,,	[email protected];>?=77##;#>>;
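To make the "lower than 30" observation concrete, the last 13 characters of the quality column above can be decoded by hand. This is only a minimal sketch, assuming the pileup qualities are Phred+33 (Sanger) encoded; `decode_phred33` and `below_threshold` are illustrative helper names, not part of samtools.

```python
def decode_phred33(quals: str) -> list[int]:
    """Convert a Phred+33 quality string into integer Phred scores."""
    return [ord(ch) - 33 for ch in quals]


def below_threshold(quals: str, min_q: int) -> list[tuple[int, int]]:
    """Return (index, score) for every base whose score is below min_q."""
    return [(i, q) for i, q in enumerate(decode_phred33(quals)) if q < min_q]


# Last 13 characters of the quality column in the pileup line above.
tail = ";>?=77##;#>>;"

print(decode_phred33(tail))
# [26, 29, 30, 28, 22, 22, 2, 2, 26, 2, 29, 29, 26]

low = below_threshold(tail, 30)
print(f"{len(low)} of {len(tail)} bases are below Q30")
# 12 of 13 bases are below Q30
```

Under that encoding assumption, only the `?` character (Q30) would pass a threshold of 30, and the `#` characters decode to Q2.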
Even more confusing, when I run VarScan, which is supposed to just summarize the pileup data, the reported number of reads supporting each allele does not match the corresponding pileup entry. For instance, for the position in the previous example, VarScan says that only eight reads support the 'T' allele.
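One way to investigate such a mismatch is to recount allele support directly from a pileup line at several base-quality cutoffs and see whether any cutoff reproduces the caller's number. The sketch below assumes the six-column pileup layout (chromosome, position, reference base, depth, read bases, base qualities) with Phred+33 qualities. The input line is hypothetical, and `read_bases` and `allele_support` are illustrative names rather than functions from samtools or VarScan. Whether the caller applies its own minimum base quality is an assumption to check against its documentation.

```python
import re
from collections import Counter


def read_bases(bases: str) -> list[str]:
    """Return one symbol per read from the pileup read-bases column."""
    symbols = []
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":        # read start marker; next char is mapping quality
            i += 2
        elif ch == "$":      # read end marker
            i += 1
        elif ch in "+-":     # indel: sign, length, then that many bases
            length = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
        else:                # '.', ',', a mismatch base, or '*'
            symbols.append(ch)
            i += 1
    return symbols


def allele_support(line: str, min_q: int) -> Counter:
    """Count reads per allele, skipping bases with Phred score < min_q."""
    _chrom, _pos, ref, _depth, bases, quals = line.rstrip("\n").split("\t")[:6]
    symbols = read_bases(bases)
    if len(symbols) != len(quals):
        raise ValueError("base and quality columns have different lengths")
    counts = Counter()
    for sym, qual in zip(symbols, quals):
        if ord(qual) - 33 < min_q:
            continue
        # '.' and ',' mean "matches the reference" on the +/- strand.
        counts[ref.upper() if sym in ".," else sym.upper()] += 1
    return counts


# Hypothetical pileup line, not taken from the data in this post.
line = "chr1\t100\tT\t6\t,$.,c^F,.\tI5I#?A"
for cutoff in (0, 15, 30):
    print(cutoff, dict(allele_support(line, cutoff)))
```

If one of the cutoffs yields the same count as the caller reports, that would suggest the difference comes from an additional quality filter rather than from the pileup itself.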

I've found many posts about whether or not to use the BAQ calculation, but nothing about problems with the -q/-Q criteria or with the VarScan statistics. It should be trivial, so I guess I am missing something silly, but any help would be much appreciated.

thanks a lot!
david
Tags
mpileup, quality scores, varscan