  • Samtools pipeline produces many indels

    Hi All,
    I'm new to the community and just beginning to understand many of the programs used for SNP selection. In my current exome seq project, I am looking for homozygous SNPs in a consanguineous pedigree with multiple affected siblings. My pipeline is as follows:

    Previously aligned input.bam files were obtained from the sequencing facility.

    $ samtools sort input.bam input.sorted

    $ samtools rmdup -s input.sorted.bam input.sorted.rmdup.bam

    $ samtools index input.sorted.rmdup.bam

    $ samtools faidx HG19.fa

    $ samtools mpileup -uf HG19.fa input.sorted.rmdup.bam > variants.raw

    $ bcftools view -bvcg variants.raw > variants.raw.bcf

    $ bcftools view variants.raw.bcf | vcfutils.pl varFilter -d 3 -D 1000 -G 20 > variants.flt.vcf

    After this, I discard common variants using the 1000 Genomes data and/or dbSNP.
    I then grep for homozygous variants that are shared among the affected individuals and found in the heterozygous state in the unaffected parents.
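
The genotype selection described above could be sketched with awk instead of plain grep. This is only an illustration under assumptions that are not in the post: a single multi-sample VCF, a hypothetical column layout (columns 10 and 11 are affected siblings, 12 and 13 are the parents), GT as the first FORMAT subfield, and made-up toy records.

```shell
# Toy multi-sample VCF (made-up records); spaces are converted to tabs.
# Hypothetical layout: columns 10-11 = affected siblings, 12-13 = parents.
vcf=$(mktemp)
tr ' ' '\t' > "$vcf" <<'EOF'
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SIB1 SIB2 FATHER MOTHER
chr1 100 . A G 60 . DP=40 GT:GQ 1/1:99 1/1:99 0/1:99 0/1:99
chr1 200 . C T 55 . DP=38 GT:GQ 1/1:99 0/1:99 0/1:99 0/1:99
chr2 300 . G A 70 . DP=45 GT:GQ 1/1:99 1/1:99 0/1:99 1/1:99
EOF

# Keep sites that are homozygous-alt in both siblings and het in both parents.
filter_recessive() {
  awk -F'\t' '
    # Return the GT subfield of a sample column, with phased "|" turned into "/".
    function gt(s,   a, g) { split(s, a, ":"); g = a[1]; gsub(/\|/, "/", g); return g }
    function het(g) { return g == "0/1" || g == "1/0" }
    /^#/ { next }
    gt($10) == "1/1" && gt($11) == "1/1" && het(gt($12)) && het(gt($13)) {
      print $1 ":" $2
    }
  ' "$1"
}

candidates=$(filter_recessive "$vcf")
echo "$candidates"
rm -f "$vcf"
```

Matching on the parsed GT field rather than grepping for the string 1/1 avoids hits where the pattern appears in another sample's column.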

    This protocol seems to be generating a large number of INDELs (>50%) compared to SNPs. Is this unusual? Should I have more stringent filters in place?
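
To put a number on the indel fraction, one option is to count records by type. The sketch below relies on the INDEL flag that samtools mpileup/bcftools writes into the INFO column for indel records; the toy records and the helper name count_variants are made up for illustration, and in practice the function would be pointed at variants.flt.vcf.

```shell
# Toy VCF (made-up records) standing in for variants.flt.vcf.
# Spaces are converted to tabs so the columns are real VCF fields.
vcf=$(mktemp)
tr ' ' '\t' > "$vcf" <<'EOF'
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 100 . A G 50 . DP=20
chr1 200 . CT C 30 . INDEL;DP=15
chr1 300 . G GA 25 . INDEL;DP=12
EOF

# Count records with and without the INDEL flag in INFO (column 8).
count_variants() {
  awk -F'\t' '
    /^#/ { next }                              # skip header lines
    $8 ~ /(^|;)INDEL(;|$)/ { indel++; next }   # indel record
    { snp++ }                                  # everything else counted as SNP
    END { printf "snps=%d indels=%d\n", snp, indel }
  ' "$1"
}

summary=$(count_variants "$vcf")
echo "$summary"
rm -f "$vcf"
```

Running the same count before and after each filtering step shows where the SNP/indel balance shifts.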

    Also, once I have obtained a final list of variants, what programs are recommended currently for functional analysis?

  • #2
    Try

    samtools mpileup -Buf HG19.fa input.sorted.rmdup.bam > variants.raw

    In my experience, the BAQ calculations eat SNPs. -B will turn those calculations off.

    If you compare the two pileups with and without the BAQ, what you may see is that the BAQ calculations are drastically dropping the quality scores of real SNPs, causing the SNP caller to ignore them, due to low quality. I believe this is an attempt to reduce false positives due to indels.
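
One way to make that comparison concrete is to call variants twice, once with BAQ and once with -B, and list the SNP sites that only show up in the -B run. The sketch below assumes two resulting VCFs (the names baq.vcf and nobaq.vcf are hypothetical) and uses made-up toy records in their place.

```shell
# Toy call sets (made-up records): one made with BAQ, one made with -B.
baq=$(mktemp); nobaq=$(mktemp)
tr ' ' '\t' > "$baq" <<'EOF'
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 100 . A G 50 . DP=20
chr1 200 . CT C 30 . INDEL;DP=15
EOF
tr ' ' '\t' > "$nobaq" <<'EOF'
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 100 . A G 50 . DP=20
chr1 150 . T C 40 . DP=18
chr1 200 . CT C 30 . INDEL;DP=15
chr1 400 . G A 35 . DP=22
EOF

# Print CHROM:POS for every non-indel record, sorted so comm can use it.
snp_sites() {
  awk -F'\t' '!/^#/ && $8 !~ /(^|;)INDEL(;|$)/ { print $1 ":" $2 }' "$1" | sort
}

baq_sites=$(mktemp); nobaq_sites=$(mktemp)
snp_sites "$baq" > "$baq_sites"
snp_sites "$nobaq" > "$nobaq_sites"

# comm -13 keeps only lines unique to the second file: SNPs seen only without BAQ.
only_without_baq=$(comm -13 "$baq_sites" "$nobaq_sites")
echo "$only_without_baq"
rm -f "$baq" "$nobaq" "$baq_sites" "$nobaq_sites"
```

The sites this prints are the ones worth inspecting in the two pileups to see whether BAQ lowered the base qualities there.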

    • #3
      Finding indels is much harder than finding SNPs, and so far no one really understands the problem. Samtools calls many false indels, and the existing databases are missing many true indels.

      As to BAQ, if you really care about sensitivity, disable it. For general purposes, use it. BAQ trades a couple percent sensitivity for hugely improved specificity.
