  • cigsit
    Junior Member
    • Nov 2011
    • 1

    VCF output from FreeBayes

    Hi,

    I have just started using FreeBayes as a comparison to samtools, as they both take BAMs and produce VCF output, which is ideal for scripting. FreeBayes actually seems like the ideal free-standing, simple and fast SNP caller I have been longing for. However, the values in the VCF INFO column do not seem to match what is in the alignment.
    In particular, the DP value almost never seems to match what I can see in alignment viewers. It is always less than the number of reads I see stacked at the SNP position, while VCFv4.1 states that DP should represent the _total_ read depth.
    Additionally, the AC (total number of alternate alleles in called genotypes) and AN (total number of alleles in called genotypes) values almost never vary, no matter what the read depth and number of alt alleles at a SNP position. To be more precise, AC is always either 1 or 2, and AN seems fixed at 2 no matter what.

    Is this a known quirk of FreeBayes? These are very important values that I need to parse from the VCF output but they just make no sense after having used samtools for a few years.

    Thanks.
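
    For anyone scripting against these fields, here is a minimal sketch of pulling DP, AC and AN out of the INFO column. The function name and the example record are made up for illustration; the record is not real FreeBayes output.

```python
def parse_info(info):
    """Split a VCF INFO string such as 'AC=1;AN=2;DP=14' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # Flag-type keys carry no '=value' part; store True for them.
        fields[key] = value if sep else True
    return fields


# Hypothetical tab-separated VCF record; INFO is the 8th column (index 7).
line = "chr1\t1000\t.\tA\tG\t50\t.\tAC=1;AN=2;DP=14\tGT\t0/1"
info = parse_info(line.split("\t")[7])
depth = int(info["DP"])
```

    Values come back as strings, so numeric fields such as DP need an explicit conversion before comparing them with depths seen in a viewer.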
  • garwuf
    Junior Member
    • Mar 2009
    • 7

    #2
    I have run FreeBayes on some of my old BAM alignments, which were already evaluated with samtools, GigaBayes and GATK. None of the polymorphisms discovered by FreeBayes matched the real ones. In fact, it missed very well-defined SNPs and instead pointed at some positions that do not look like anything special. DP doesn't correspond to the real coverage either (whether or not quality is taken into account). Maybe it is just a problem with old BAM formats?
    Last edited by garwuf; 11-21-2011, 03:37 PM.


    • VeBeKay
      Junior Member
      • Jul 2010
      • 5

      #3
      I have similar issues with the freebayes output.
      I guess that "AC" & "AN" refer to the number of *alleles* rather than sequencing depth (which is why they are always 1 or 2). (??)
      However, I cannot find any documentation of what these or the other fields in the vcf output mean; not all of them are specified here:
      1000genomes.org


      If there's any way to get an output from freebayes that looks more like the vcf format of samtools mpileup, I'd be interested to know.
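
      For what it's worth, the VCF definitions of AC and AN count alleles in the called genotypes, not reads, so a single diploid sample would give AN=2 and AC of 1 or 2. A minimal sketch of that counting, assuming a biallelic site (the function name is made up for illustration):

```python
def allele_counts(genotypes):
    """Return (AC, AN) for a biallelic site from GT strings like '0/1'."""
    ac = an = 0
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') genotypes the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":  # missing allele: not a called allele
                continue
            an += 1
            if allele != "0":  # non-reference allele
                ac += 1
    return ac, an


print(allele_counts(["0/1"]))                # one heterozygous sample: (1, 2)
print(allele_counts(["1/1"]))                # one homozygous-alt sample: (2, 2)
print(allele_counts(["0/1", "1/1", "0/0"]))  # three samples: (3, 6)
```

      With more samples in the same VCF, AN grows by 2 per called diploid genotype, regardless of read depth.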


      • ekg
        Member
        • Apr 2010
        • 36

        #4
        Originally posted by garwuf
        I have run FreeBayes on some of my old BAM alignments, which were already evaluated with samtools, GigaBayes and GATK. None of the polymorphisms discovered by FreeBayes matched the real ones. In fact, it missed very well-defined SNPs and instead pointed at some positions that do not look like anything special. DP doesn't correspond to the real coverage either (whether or not quality is taken into account). Maybe it is just a problem with old BAM formats?
        I'd guess this is a problem with BAM format changes or with the specific (older) version you were using, but I'm not sure. This sounds pretty strange. We don't see this at all in testing, and if we did, we certainly couldn't use freebayes as a detector.

        What happens with this data when you use the most recent version?


        • ekg
          Member
          • Apr 2010
          • 36

          #5
          Originally posted by cigsit
          Hi,

          I have just started using FreeBayes as a comparison to samtools, as they both take BAMs and produce VCF output, which is ideal for scripting. FreeBayes actually seems like the ideal free-standing, simple and fast SNP caller I have been longing for. However, the values in the VCF INFO column do not seem to match what is in the alignment.
          In particular, the DP value almost never seems to match what I can see in alignment viewers. It is always less than the number of reads I see stacked at the SNP position, while VCFv4.1 states that DP should represent the _total_ read depth.
          Additionally, the AC (total number of alternate alleles in called genotypes) and AN (total number of alleles in called genotypes) values almost never vary, no matter what the read depth and number of alt alleles at a SNP position. To be more precise, AC is always either 1 or 2, and AN seems fixed at 2 no matter what.

          Is this a known quirk of FreeBayes? These are very important values that I need to parse from the VCF output but they just make no sense after having used samtools for a few years.

          Thanks.
          I believe I know what's going on here.

          In the old versions, freebayes was designed to behave like gigabayes/bambayes, which used BQ and MQ filters to remove a lot of low-quality reads.

          However, this was ultimately shown to be a bad idea, and so the default filters were removed. I think a lot of users got very poor results when their MQs weren't well-calibrated. (I eventually settled on minimum input filters requiring at least 2 observations comprising 20% of reads in a single individual, which I've just made default in the recent freebayes revisions.)
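
          To illustrate the kind of input filter described above, here is a rough sketch of one reading of it. This is not the freebayes source; the function and parameter names are made up, and it assumes the "observations" are reads supporting the alternate allele within one sample.

```python
def passes_input_filter(alt_obs, total_obs, min_count=2, min_fraction=0.2):
    """True if one sample has enough alternate-allele support.

    Assumed reading of the rule: at least `min_count` supporting reads
    that also make up at least `min_fraction` of that sample's reads.
    """
    if total_obs == 0:
        return False
    return alt_obs >= min_count and alt_obs / total_obs >= min_fraction


def site_is_candidate(per_sample):
    """per_sample: list of (alt_obs, total_obs), one pair per individual."""
    return any(passes_input_filter(alt, total) for alt, total in per_sample)


print(site_is_candidate([(1, 30), (3, 12)]))  # second sample: 3 reads, 25%
print(site_is_candidate([(1, 4), (2, 40)]))   # too few reads, then only 5%
```

          The rule only needs to hold in a single individual, which is why the sketch uses `any` over the samples.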

          So, the point is that the reported DP is the post-filter depth. And, if you were running with a minimum mapping quality filter of 30 and a minimum base quality filter of 20, the DP might get quite low.

          If you run with the most recent version, you should get results that are more sensible.
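
          To make the difference concrete, here is a small sketch comparing the raw depth an alignment viewer would show with a post-filter depth. The thresholds are the MQ 30 and BQ 20 values mentioned above; the read list and function name are made up, and this is not how freebayes itself is implemented.

```python
def depths(reads, min_mapq=30, min_baseq=20):
    """reads: list of (mapping_quality, base_quality) at one position.

    Returns (raw_depth, post_filter_depth).
    """
    raw = len(reads)
    kept = sum(1 for mq, bq in reads if mq >= min_mapq and bq >= min_baseq)
    return raw, kept


# Hypothetical pileup of five reads covering a SNP position.
reads = [(60, 35), (60, 12), (25, 38), (40, 30), (10, 8)]
raw, filtered = depths(reads)
print(raw, filtered)  # 5 2
```

          Here a viewer would show five reads stacked at the position, while a post-filter DP would be 2.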
