  • samtools/mpileup heterozygous SNPs calling

    I'm trying to call heterozygous SNPs from Illumina DNA sequencing data.
    I've used the samtools/mpileup pipeline and have some questions about the options.

    There are many posts about the BAQ calculation. As I understand it, BAQ is computed by default, and with the -B option (which disables it) more SNPs can be detected, at the cost of specificity. I've tested this with my samples, and -Buf gave roughly 2x to 3x as many SNPs as -uf (-Euf didn't make a big difference compared with -uf).
    Does anyone have experience with the -B, -E and -I options?
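
For anyone who wants to repeat the -uf / -Buf / -Euf comparison, here is a minimal sketch that only builds the three command lines; it does not run samtools. It assumes the older samtools mpileup interface discussed in this thread (-u, -f, -B, -E, -I); newer releases may have changed or dropped some of these flags, so check the help output of your own version. The function name `mpileup_args` and the file names `ref.fa` and `sample.bam` are made up for illustration.

```python
from typing import List

# Flag sets for the three BAQ settings compared in the thread.
# "default" leaves BAQ on, "off" adds -B, "extended" adds -E.
BAQ_FLAGS = {"default": [], "off": ["-B"], "extended": ["-E"]}


def mpileup_args(ref: str, bam: str, baq: str = "default",
                 skip_indels: bool = False) -> List[str]:
    """Build an argument list for a legacy-style `samtools mpileup` call.

    Nothing is executed; the list can be inspected or passed to
    subprocess.run by the caller.
    """
    if baq not in BAQ_FLAGS:
        raise ValueError(f"unknown BAQ mode: {baq!r}")
    args = ["samtools", "mpileup", "-u"]
    args += BAQ_FLAGS[baq]
    if skip_indels:
        args.append("-I")  # skip indel calling, as asked about in the post
    args += ["-f", ref, bam]
    return args


if __name__ == "__main__":
    # Hypothetical input files, for illustration only.
    for mode in ("default", "off", "extended"):
        print(mode, " ".join(mpileup_args("ref.fa", "sample.bam", baq=mode)))
```

Keeping the three variants in one place makes it harder to change some other option by accident between runs, so any difference in SNP counts can be attributed to the BAQ setting.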

    We are not actually interested in INDELs, so I thought the -I option could be used to skip INDEL calling.
    But when I compared the heterozygous SNP results with and without -I, many of the heterozygous SNPs detected with -I turned out to be INDELs in the results without -I. So is it better to ignore those SNPs? In other words, should I "not" use the -I option?
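
The with/without -I comparison described above can be sketched roughly as follows: take the SNP positions from the -I run and the indel positions from the run without -I, and flag SNPs that sit at or near an indel. This is only an illustration. The function name `snps_near_indels`, the `window` parameter and its default of 5 bp are assumptions, not values from the thread, and the positions are assumed to be already extracted as (chromosome, position) pairs.

```python
from bisect import bisect_left
from collections import defaultdict


def snps_near_indels(snps, indels, window=5):
    """Return the SNPs lying within `window` bp of any indel position.

    snps, indels: iterables of (chrom, pos) tuples with integer positions.
    window: hypothetical distance cutoff in bp; tune it for your data.
    """
    # Group indel positions by chromosome and sort them for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in indels:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    flagged = []
    for chrom, pos in snps:
        positions = by_chrom.get(chrom, [])
        # First indel at or after (pos - window); it is a hit if it is
        # also no further right than (pos + window).
        i = bisect_left(positions, pos - window)
        if i < len(positions) and positions[i] <= pos + window:
            flagged.append((chrom, pos))
    return flagged


if __name__ == "__main__":
    # Made-up positions, for illustration only.
    snps_with_I = [("chr1", 100), ("chr1", 500), ("chr2", 100)]
    indels_without_I = [("chr1", 103), ("chr2", 300)]
    print(snps_near_indels(snps_with_I, indels_without_I))
```

SNPs flagged this way are the ones worth a second look before deciding whether to keep -I.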

    I'd really like to know which options are appropriate for calling heterozygous SNPs.
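
Whatever options are chosen, the heterozygous SNPs still have to be pulled out of the resulting calls. A minimal sketch of that step, assuming the calls end up as single-sample VCF text with a GT field and that indel records carry an INDEL flag in the INFO column (both are assumptions about the output; the thread does not show any). The function name `het_snps` and the example records are invented for illustration.

```python
def het_snps(vcf_lines):
    """Collect heterozygous single-base SNPs from single-sample VCF lines.

    Returns a list of (chrom, pos, ref, alt) tuples.
    """
    hits = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, so no genotype to read
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        info, fmt, sample = fields[7], fields[8], fields[9]

        # Drop indels: either flagged in INFO or with multi-base alleles.
        if "INDEL" in info.split(";"):
            continue
        if len(ref) != 1 or any(len(a) != 1 for a in alt.split(",")):
            continue

        keys = fmt.split(":")
        if "GT" not in keys:
            continue
        genotype = sample.split(":")[keys.index("GT")].replace("|", "/")
        alleles = genotype.split("/")
        # Heterozygous: two called alleles that differ from each other.
        if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
            hits.append((chrom, pos, ref, alt))
    return hits


if __name__ == "__main__":
    # Invented records, for illustration only.
    example = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
        "chr1\t100\t.\tA\tG\t50\t.\tDP=20\tGT:PL\t0/1:50,0,60",
        "chr1\t200\t.\tC\tT\t60\t.\tDP=18\tGT:PL\t1/1:80,20,0",
        "chr1\t300\t.\tGA\tG\t40\t.\tINDEL;DP=15\tGT:PL\t0/1:40,0,50",
    ]
    print(het_snps(example))
```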

  • #2
    My tiny bit of experience:

    A Sanger-verified heterozygous SNP disappeared when I omitted -B and reappeared when I put it back. I've had other projects where I didn't Sanger-verify, but where multiple related projects all had SNPs in the same gene, a highly likely candidate, and those SNPs were virtually uncallable without the -B option.

    I wouldn't filter out indels, unless you were using an aligner that you know can't handle them.

    • #3
      Originally posted by swbarnes2
      My tiny bit of experience:

      A Sanger-verified heterozygous SNP disappeared when I omitted -B and reappeared when I put it back. I've had other projects where I didn't Sanger-verify, but where multiple related projects all had SNPs in the same gene, a highly likely candidate, and those SNPs were virtually uncallable without the -B option.

      I wouldn't filter out indels, unless you were using an aligner that you know can't handle them.
      Thanks for sharing your tips. The reads were aligned with BWA, so it's better to consider INDELs. Have you also used the '-E' option (extended BAQ computation)?

      • #4
        Originally posted by combiochem
        Thanks for sharing your tips. The reads were aligned with BWA, so it's better to consider INDELs. Have you also used the '-E' option (extended BAQ computation)?
        The author of samtools suggested it in a thread where I mentioned my problem with not using -B, but I haven't tried applying it to those samples yet.

        For my projects, false positives are not that big a problem. Most of the time I'm looking for candidate phenotype-causing SNPs, so Sanger-checking a modest number of false positives is not a big deal. But I don't want to miss a real SNP because the software was overzealous in trying to help me, so it's safer for me to turn BAQ off entirely.

        But it's good to know that in your tests, -E worked about as well as -B.

        • #5
          In my case, using '-B' increased the number of SNP calls from Illumina sequencing data by 2 to 5 times.
          Last edited by sijungyun; 08-03-2011, 03:47 PM.
