Hello,
I am trying to detect somatic mutations in tumor-normal sample pairs (Illumina paired-end reads). As a first approach, I am using the pipeline listed below.
I want to exclude reads and bases with 'low' quality from the mpileup step, so I use the -q/-Q arguments (minimum mapping quality and minimum base quality, respectively). However, this does not seem to work, and after digging through the data I am now totally confused.
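For reference, a minimal Python sketch of what a base-quality cutoff such as -Q 30 means numerically, assuming the usual Phred+33 (Sanger/SAM) encoding of quality characters. The function names are illustrative only. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q30 is roughly a 1-in-1000 chance that the base call is wrong, and the lowest quality character that meets a cutoff of 30 is '?'.

```python
# Sketch: interpreting a minimum base-quality cutoff such as `-Q 30`.
# Assumes Phred+33 encoding, as used in SAM/BAM and in samtools pileup text.

PHRED_OFFSET = 33


def phred_from_char(c: str) -> int:
    """Decode one Phred+33 quality character into a Phred score."""
    return ord(c) - PHRED_OFFSET


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_err), hence P_err = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def lowest_passing_char(min_q: int) -> str:
    """Lowest quality character whose decoded score is >= min_q."""
    return chr(min_q + PHRED_OFFSET)


if __name__ == "__main__":
    cutoff = 30  # the value given to -Q
    print(lowest_passing_char(cutoff))  # ?
    print(error_probability(cutoff))
    # A few characters that appear in typical pileup quality strings:
    print(phred_from_char("#"), phred_from_char("7"), phred_from_char("="))  # 2 22 28
```

Any quality character that sorts below '?' in ASCII therefore decodes to a score under 30.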
I inspect the BAM file in IGV, which displays the base and read qualities at each position. The pileup file is generated with the mpileup command shown below.
What I observe is that the pileup includes fewer reads than are available in the BAM file. The problem is that the reads it does include do not seem to respect the -q 1 -Q 30 criteria: for instance, it includes bases whose quality, according to the BAM file, is much lower than 30. Note that I disabled the BAQ calculation (-B) to make things clearer. Moreover, the base qualities reported in most of the pileup entries are systematically lower than 30; see for example the entry for chr1:115323009 shown below.
Even more confusing for me: when I run VarScan, which is supposed to just summarize the pileup data, the number of reads it reports for each allele does not match the corresponding pileup entry. For instance, at that same position (chr1:115323009), VarScan says that only eight reads support the 'T' allele.
I've found many threads about whether or not to use the BAQ calculation, but nothing about problems with the -q/-Q criteria or with the VarScan statistics. It should be trivial, so I guess I am missing something silly, but any help would be really appreciated.
thanks a lot!
david
Pipeline:
- bfast alignment (match + localalign + postprocess) for each sample
- picard remove_duplicates for each sample
- samtools mpileup for each sample
- varscan for each tumor-normal sample pair
mpileup command:
/samtools-0.1.18/samtools mpileup -f ref.fa -B -q 1 -Q 30 -SD sample.bam > sample.pileup
Pileup entry for the example position:
chr1 115323009 T 18 ,$.,,,,,,,,,,c,,,,, ==?=@;>?=77##;#>>;
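To double-check an entry like this one by hand, here is a minimal Python sketch that decodes the read-base and quality columns of a single text-pileup line and counts alleles with and without a base-quality cutoff. The helper names are hypothetical (not part of samtools or VarScan), and qualities are assumed to be Phred+33. It only redoes the arithmetic on the printed line; it does not try to reproduce whatever filtering samtools or VarScan apply internally.

```python
# Sketch: re-counting alleles in one samtools text-pileup line, with an
# optional base-quality cutoff. Helper names are hypothetical; qualities
# are assumed to be Phred+33.
import re
from collections import Counter


def parse_read_bases(bases: str, ref: str) -> list[str]:
    """Return one upper-case base per read from the read-base column.

    '.'/',' = reference match (forward/reverse strand); '^' plus one char marks
    a read start (that char is the read's mapping quality); '$' marks a read
    end; '+N<seq>'/'-N<seq>' describe an indel and carry no quality value.
    """
    calls = []
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":        # read start: skip '^' and the mapping-quality char
            i += 2
        elif c == "$":      # read end: not a base
            i += 1
        elif c in "+-":     # indel: skip sign, length digits and the sequence
            m = re.match(r"[+-](\d+)", bases[i:])
            i += len(m.group(0)) + int(m.group(1))
        else:               # a base call (or '*' for a deletion placeholder)
            calls.append(ref.upper() if c in ".," else c.upper())
            i += 1
    return calls


def allele_counts(line: str, min_base_qual: int = 0) -> Counter:
    """Count alleles at one pileup position, keeping bases with Q >= min_base_qual."""
    _chrom, _pos, ref, depth, bases, quals = line.split()[:6]
    if int(depth) == 0:
        return Counter()
    calls = parse_read_bases(bases, ref)
    if not len(calls) == len(quals) == int(depth):
        raise ValueError("read-base and quality columns do not match the depth")
    return Counter(b for b, q in zip(calls, quals) if ord(q) - 33 >= min_base_qual)


if __name__ == "__main__":
    entry = "chr1 115323009 T 18 ,$.,,,,,,,,,,c,,,,, ==?=@;>?=77##;#>>;"
    print(allele_counts(entry))      # Counter({'T': 17, 'C': 1})
    print(allele_counts(entry, 30))  # Counter({'T': 3})
```

On the entry above this gives 17 T and 1 C with no cutoff, but only three T bases (and no C, whose quality character is '#', i.e. Q2) at Q >= 30, consistent with the observation that most of the printed qualities are below 30.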