#1
Member
Location: United States, TN
Join Date: Jul 2010
Posts: 15
Does anyone have suggestions for filtering mpileup results? We obtained SNPs/indels using the command lines on the mpileup website:

samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf   (1)
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf   (2)

For pileup, the suggested final filter was

awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' sample1.flt.txt > sample1.final.txt

i.e. indel lines (reference column "*") need a score of at least 50 in column 6, and SNP lines at least 20. Are similar rules necessary to filter mpileup results? For example, is the 6th column (QUAL) of the VCF file from (2) the same as the 6th column of the pileup file, and is it appropriate to apply "$6>=20" to the VCF file too? Any suggestion is appreciated.
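A minimal sketch of how the pileup rule could be carried over to the VCF from (2), written as a hypothetical shell function `filter_vcf_qual`. It assumes that indel records are flagged with `INDEL` in the INFO column (8), and it reuses the pileup cut-offs (50 for indels, 20 for SNPs) on QUAL (column 6). Both columns are Phred-scaled variant qualities, but they come from different calling models, so the thresholds are a starting point, not a validated choice.

```shell
# Hypothetical helper: port of ($3=="*"&&$6>=50)||($3!="*"&&$6>=20) to a VCF.
# Assumptions (not from the thread): indels carry an INDEL flag in INFO ($8),
# and the pileup cut-offs are reused unchanged for QUAL ($6).
filter_vcf_qual() {
  # $1: input VCF
  awk -F'\t' '
    /^#/ { print; next }                      # pass header lines through unchanged
    $6 == "." { next }                        # drop records without a QUAL value
    {
      is_indel = ((";" $8 ";") ~ /;INDEL;/)   # INDEL flag anywhere in INFO
      min_qual = is_indel ? 50 : 20           # 50 for indels, 20 for SNPs
      if ($6 + 0 >= min_qual) print
    }
  ' "$1"
}

# Usage: filter_vcf_qual var.flt.vcf > var.final.vcf
```

Header lines are passed through, so the output is still a VCF; records whose QUAL is `.` are dropped rather than guessed at.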
#2
Senior Member
Location: San Diego
Join Date: May 2008
Posts: 912
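If you only have one sample in your VCF, filtering on depth might be more useful than QUAL. You can use the DP value, but DP4 counts only high-quality bases (ref-forward, ref-reverse, alt-forward, alt-reverse), so poor-quality data are left out, which probably makes it more useful. It's a trickier awk statement, though, because the four counts have to be pulled out of the INFO column and summed.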
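The "trickier awk statement" for DP4 could look roughly like the hypothetical `filter_vcf_dp4` function below. It sums the four DP4 counts to get the high-quality depth at each site and keeps records at or above a minimum you choose; the value 10 in the usage line is only an example, and records without a DP4 entry are dropped.

```shell
# Hypothetical helper: keep records whose summed DP4 (high-quality
# ref-fwd + ref-rev + alt-fwd + alt-rev bases) reaches a chosen minimum.
# The threshold is an assumption; records lacking DP4 are dropped.
filter_vcf_dp4() {
  # $1: input VCF, $2: minimum summed DP4 depth
  awk -F'\t' -v min_depth="$2" '
    /^#/ { print; next }                          # pass header lines through
    {
      hq_depth = -1                               # -1 marks "no DP4 found"
      n = split($8, info, ";")                    # INFO entries, e.g. DP=20;DP4=3,2,1,1
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^DP4=/) {
          split(substr(info[i], 5), counts, ",")  # strip "DP4=", split the 4 counts
          hq_depth = counts[1] + counts[2] + counts[3] + counts[4]
        }
      }
      if (hq_depth >= min_depth + 0) print
    }
  ' "$1"
}

# Usage (10 is only an example threshold): filter_vcf_dp4 var.flt.vcf 10 > var.dp4.vcf
```

A site can have a high DP but a low DP4 sum when many of its bases are poor quality; this filter drops such sites where a DP-based filter would keep them.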
If you have multiple samples, QUAL won't help you figure out which samples really have the SNP and which don't, because it is a single score for the whole site. The best you can do is to use the per-sample GQ (genotype quality) value, and unfortunately, you won't have depths of coverage for individual samples.