  • more samtools SNP calling questions

    I have used the samtools/bcftools/vcfutils pipeline to do variant calling.
    step1:
    samtools mpileup -uf reference.fa align-file1.bam align-file2.bam | bcftools view -bvcg >unfiltered-variants.bcf
    step2:
    bcftools view unfiltered-variants.bcf | vcfutils.pl varFilter -D100 >filtered-variants.vcf

    This gives me about 30 variants in a format I can open in a spreadsheet.

    However, I would also like to see the unfiltered variant list, so it needs to be in VCF format. I should be able to do this by changing the bcftools arguments in the first step to -vcg and the output file name to *.vcf. This works up to a point: I get partial output that matches the filtered variants, but it then crashes at a certain line with a sync error saying the number of fields does not match.
    Is there another way to just convert the BCF file into VCF to see all variants?

    My other concern is that just about ALL variants found are indels. I have used IGV to look at the aligned contigs and I clearly see a number of SNPs.
    I am trying to decipher the arguments, and it seems to me that the -c switch in bcftools should list SNPs?
    I have seen a number of pages and read other posts, some of which are similar but just not quite the answer I need.
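
For readers who want to check the SNP/indel balance described in this post, here is a minimal sketch that tallies both record types in VCF text. It assumes indel records carry an `INDEL` flag in the INFO column, as samtools mpileup/bcftools output does. The function name `count_variant_types` is hypothetical, and the three records in the demo are made up purely to show the input shape.

```shell
# Tally SNPs vs. indels in VCF text read from stdin.
# Assumption: indel records are flagged with INDEL in the INFO column (column 8).
count_variant_types() {
    awk -F'\t' '
        /^#/ { next }                              # skip header lines
        $8 ~ /(^|;)INDEL(;|$)/ { indels++; next }  # INFO flag marks an indel
        { snps++ }                                 # any other record is counted as a SNP
        END { printf "SNPs=%d indels=%d\n", snps, indels }
    '
}

# Made-up records, for illustration only (not real calls).
{
    printf '##fileformat=VCFv4.1\n'
    printf 'chr1\t100\t.\tA\tG\t50\t.\tDP=20\n'
    printf 'chr1\t200\t.\tAT\tA\t40\t.\tINDEL;DP=15\n'
    printf 'chr1\t300\t.\tC\tT\t60\t.\tDP=30\n'
} | count_variant_types
# prints: SNPs=2 indels=1
```

With the pre-1.0 bcftools used in this thread, `bcftools view` on a BCF file writes VCF text to standard output (step 2 above relies on this), so something like `bcftools view unfiltered-variants.bcf | count_variant_types` should work on the unfiltered calls.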

  • #2
    1) It looks like you are using a pre-1.0 version of samtools/bcftools. I don't blame you, since samtools v1 changed a lot and I am still getting used to it. That said, if you want to see more SNPs using the old method, you could use the 'flat' prior via the switch '-P flat'.

    2) GATK gives you better SNP calls.

    • #3
      Thanks Rick. I have no training and I do not use any of these assorted tools often, so it is always new for me. I also dread the installation process of a program/tool more than anything! I guess the new version came out between the last time I was fiddling with this and now. I will see if I can find/work with GATK.
      Not sure what you mean by 'flat'?

      • #4
        Since I am not a statistician, I would stumble trying to explain a 'prior', so I refer you to Wikipedia:



        Also quoting from Heng Li on the Samtools list from 2010:

        With any Bayesian SNP callers, you need a prior. Full is the standard Wright-Fisher infinite site prior. Flat is a flat prior and cond2 is the prior distribution conditional on hets discovered in two chromosomes, assuming Wright-Fisher.

        As mentioned above, the old bcftools had three fixed priors. The new bcftools, via the 'call' command, takes a numeric prior instead:

        -P, --prior <float> mutation rate (use bigger for greater sensitivity)
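
A sketch of how the first step from the original post might look with bcftools 1.x, assuming a release recent enough to ship `bcftools mpileup` (older 1.x releases used `samtools mpileup -u` for that stage). The helper `print_call_pipeline` is hypothetical and only prints the command line so it can be inspected before running. The file names are the ones from the first post, and the prior value is a placeholder, not a recommendation.

```shell
# Hypothetical helper: print (not run) a bcftools 1.x calling pipeline.
# Usage: print_call_pipeline <reference.fa> <prior> <bam> [<bam> ...]
print_call_pipeline() {
    ref=$1      # reference FASTA
    prior=$2    # value passed to `bcftools call -P`; larger means more sensitive
    shift 2     # remaining arguments are the BAM files
    # -Ou: uncompressed BCF between stages; -m: multiallelic caller;
    # -v: variant sites only; -Ov: write plain-text VCF
    printf 'bcftools mpileup -Ou -f %s %s | bcftools call -mv -P %s -Ov -o unfiltered-variants.vcf\n' \
        "$ref" "$*" "$prior"
}

print_call_pipeline reference.fa 1.1e-3 align-file1.bam align-file2.bam
```

Because `-Ov` writes VCF directly, the unfiltered list can be opened without a separate BCF-to-VCF conversion. Check `bcftools call` help for the default prior in your installed version before changing it.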
