  • mpileup base-quality filter does not seem to work

    Hello,

    I am trying to detect somatic mutations in tumor-normal samples (Illumina paired-end reads). As a first approach, I am doing the following:

    Code:
    - bfast alignment (match + localalign + postprocess) for each sample
    - picard remove_duplicates for each sample
    - samtools mpileup for each sample
    - varscan for each tumor-normal samples pair
    I do not want low-quality reads/bases to be taken into account in the mpileup step, so I use the -q/-Q arguments. However, this does not seem to work, and after digging through the data I am now totally confused.

    I inspect the BAM file with IGV, which shows the base and read qualities at each position. The pileup file is generated by the following mpileup command:

    Code:
    /samtools-0.1.18/samtools mpileup -f ref.fa -B -q 1 -Q 30 -SD sample.bam > sample.pileup
    What I observe is that the pileup summary includes fewer reads than are available in the BAM file. But the reads that remain do not seem to follow the -q 1 -Q 30 criteria (for instance, the pileup includes read bases whose quality is much lower than 30, according to the BAM file). Note that I disabled the BAQ calculation (-B) to make everything clearer. Moreover, the base qualities reported in most of the pileup entries are systematically lower than 30, e.g.:

    Code:
    chr1	115323009	T	18	,$.,,,,,,,,,,c,,,,,	==?=@;>?=77##;#>>;
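To make the quality column of the entry above easier to read, here is a minimal Python sketch that decodes it into numeric scores. It assumes the qualities are Phred+33 encoded (the usual convention for samtools pileup output, but not confirmed in this post); the helper names `decode_qualities` and `split_by_threshold` are hypothetical and only for illustration.

```python
# Assumption: base qualities are ASCII characters with a Phred+33 offset.
PHRED_OFFSET = 33


def decode_qualities(qual_string, offset=PHRED_OFFSET):
    """Convert a pileup quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in qual_string]


def split_by_threshold(qual_string, min_q, offset=PHRED_OFFSET):
    """Return (passing, failing) score lists relative to a minimum quality."""
    scores = decode_qualities(qual_string, offset)
    passing = [q for q in scores if q >= min_q]
    failing = [q for q in scores if q < min_q]
    return passing, failing


# Quality column copied from the pileup entry quoted above.
quals = "==?=@;>?=77##;#>>;"
passing, failing = split_by_threshold(quals, 30)

print(decode_qualities(quals))
print(len(passing), "bases at Q>=30,", len(failing), "bases below Q30")
```

Under that encoding assumption, only 3 of the 18 bases in this entry reach Q30, which matches the observation that the -Q 30 threshold does not appear to have been applied to this line.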
    Even more confusing to me, when I run VarScan, which is supposed to just summarize the pileup data, the reported number of reads supporting each allele does not match the corresponding pileup entry. For instance, for the position in the previous example, VarScan says that only eight reads support the 'T' allele.
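One way to check per-allele support independently of VarScan is to tally the read-bases column of the pileup line directly. The sketch below is a simplified parser, not VarScan's actual counting logic: it assumes the standard pileup symbols (`.` and `,` for reference matches, `^` followed by a mapping-quality character, `$` for read ends, `+N`/`-N` followed by N indel bases), and the function name `count_pileup_bases` is hypothetical.

```python
import re


def count_pileup_bases(bases):
    """Tally reference matches and non-reference calls in a pileup base string."""
    counts = {"ref": 0}
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Start-of-read marker; the next character is a mapping quality.
            i += 2
            continue
        if ch == "$":
            # End-of-read marker carries no base call.
            i += 1
            continue
        if ch in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
            continue
        if ch in ".,":
            counts["ref"] += 1
        else:
            # Mismatches are merged across strands (e.g. 'c' and 'C').
            key = ch.upper()
            counts[key] = counts.get(key, 0) + 1
        i += 1
    return counts


# Read-bases column copied from the pileup entry quoted earlier in the post.
example = ",$.,,,,,,,,,,c,,,,,"
tally = count_pileup_bases(example)
print(tally, "total calls:", sum(tally.values()))
```

The total can be compared with the depth column of the same line, and the `ref` tally with the number of reference-supporting reads VarScan reports. Any gap between the two would then point to additional filtering on VarScan's side rather than to the pileup itself.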

    I've found many posts about whether or not to use the BAQ calculation, but I have no clue about problems with the -q/-Q criteria, or even with the VarScan statistics. It should be trivial, so I guess I am missing something silly, but any help would be really appreciated.

    thanks a lot!
    david
