  • Samtools SNP/Indel calling discards many reads

    Hello,

    I am trying to find indels using Bowtie2 and samtools in Illumina runs of a few bacterial genomes.

    The sequenced samples are from a directed evolution study, so I would expect the SNPs and indels from the starting sample to be propagated through.
    In about 4 of the 16 sequenced samples (including the starting sample), an indel is called at one location, but the other samples call a SNP there. If I look at the VCF file, I see that only a small fraction of the reads is used to call the SNP (i.e., DP=200-400, while the high-quality read counts in DP4 sum to ~20-40). Also, in Tablet it looks like there should be an indel there.
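To put a number on how many reads are actually used, the DP and DP4 values can be compared directly. The following is a minimal Python sketch, assuming the INFO column has the usual `KEY=value;KEY=value` layout; the function name `dp4_fraction` and the example INFO string are made up for illustration and are not taken from the real data.

```python
def dp4_fraction(info: str) -> float:
    """Return sum(DP4) / DP for a VCF INFO string.

    DP is the raw read depth; DP4 holds the high-quality counts for
    ref-forward, ref-reverse, alt-forward and alt-reverse bases.
    """
    # Flag-only entries such as "INDEL" have no "=" and are skipped.
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    depth = int(fields["DP"])
    dp4 = [int(x) for x in fields["DP4"].split(",")]
    return sum(dp4) / depth if depth else 0.0


# Hypothetical INFO value, for illustration only.
example_info = "DP=300;DP4=10,12,5,3;MQ=20"
print(f"{dp4_fraction(example_info):.2f}")
```

A ratio near 0.1, as in the numbers described above, means roughly nine out of ten reads at the site were filtered before the call was made.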

    Has anyone run into the problem where samtools discards many of the reads due to "low quality"?


    Side notes: I've actually recently started to use the IndelRealigner from GATK and can get these locations to be called as indels, but still many of the reads are not being used.
    I should also mention that half of the samples (those that had some problems) use Phred+64 quality encoding rather than Phred+33.

    Thanks for your help!

  • #2
    Variant calling will generally use two quality scores: the per-base quality scores and a mapping quality score that indicates how certain the aligner was that the read belongs at that position in the genome.

    Is this a repetitive locus or is the sequence similar to other genomic locations? If so, you may have reads being mapped there that actually belong elsewhere. Or perhaps the aligner and variant caller are being too conservative. I've seen all kinds of hard-to-interpret variant calling output when you're actually analyzing reads from multiple sites.
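Both of the quality scores mentioned above are conventionally Phred-scaled, so a score Q corresponds to an error probability of 10^(-Q/10). A small Python sketch of that conversion (the function name is only illustrative) can help when judging how strict a given quality cutoff really is:

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality (base quality or MAPQ) to an error probability."""
    return 10 ** (-q / 10)


# Print the error probability for a few common quality values.
for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))
```

For a mapping quality, the "error" is the read being placed at the wrong position; for a base quality, it is the base call being wrong.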

    • #3
      How big is the indel? With single-end data, indels larger than 2 or 3 bases often won't be called as such. At best, the software aligns some reads with the indel right at the edge of the read, does not recognize it as an indel, and calls it a SNP instead.

      • #4
        Mixing PHRED+64 and PHRED+33 data won't help matters. Converting between the two is easy; the forum describes a variety of scripts and software for this.
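As a rough illustration of what such a conversion does, here is a minimal Python sketch that re-encodes a single FASTQ quality string. It assumes plain Phred+64 input (ASCII 64 and above) and deliberately rejects anything lower, such as older Solexa-scaled scores; the function name is hypothetical and this is not any particular tool from the forum.

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # decode with the +64 offset
        if score < 0:
            # Characters below '@' cannot be valid Phred+64 scores.
            raise ValueError(f"{ch!r} is not a valid Phred+64 character")
        converted.append(chr(score + 33))  # re-encode with the +33 offset
    return "".join(converted)


# 'h' is Q40 in Phred+64; the same score is 'I' in Phred+33.
print(phred64_to_phred33("hhhh"))  # prints IIII
```

Alternatively, Bowtie2 accepts a `--phred64` option, so the reads can be aligned with the correct encoding declared instead of converting the files first.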
