  • VCF output from FreeBayes

    Hi,

    I have just started using FreeBayes as a comparison to samtools, since both take BAMs and produce VCF output, which is ideal for scripting. FreeBayes seems like the free-standing, simple and fast SNP caller I have been longing for. However, the values in the VCF INFO column do not seem to match what is in the alignment.
    In particular, the DP value almost never matches what I can see in alignment viewers. It is always less than the number of reads I see stacked at the SNP position, whereas VCFv4.1 states that DP should represent the _total_ read depth.
    Additionally, the AC (Total number of alternate alleles in called genotypes) and AN (Total number of alleles in called genotypes) values almost never vary, whatever the read depth and number of alt alleles at a SNP position. To be more precise, AC is always either 1 or 2 and AN seems fixed at 2.

    Is this a known quirk of FreeBayes? These are very important values that I need to parse from the VCF output but they just make no sense after having used samtools for a few years.

    Thanks.
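
For readers who need to pull DP, AC and AN out of the INFO column in a script, here is a minimal sketch. It assumes a standard tab-separated VCF data line with INFO in the eighth column; the function names and the sample record are made up for illustration and do not come from FreeBayes output.

```python
def parse_info(info_field):
    """Split a VCF INFO string such as 'DP=14;AC=1;AN=2' into a dict.

    Values are kept as strings; flag entries without '=' map to True.
    """
    info = {}
    if info_field == ".":
        return info  # '.' means the INFO column is empty
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def info_from_record(line):
    """Return the parsed INFO dict for one tab-separated VCF data line."""
    columns = line.rstrip("\n").split("\t")
    return parse_info(columns[7])  # INFO is the 8th column


# Hypothetical record, for illustration only.
record = "chr1\t1000\t.\tA\tG\t50\t.\tDP=14;AC=1;AN=2\tGT\t0/1"
fields = info_from_record(record)
print(fields["DP"], fields["AC"], fields["AN"])
```

Values come back as strings, so convert them with `int()` before comparing them against depths taken from an alignment viewer.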

  • #2
    I have run FreeBayes on some of my old BAM alignments, which had already been evaluated with samtools, gigabayes and GATK. None of the polymorphisms discovered by FreeBayes matched the real ones. It missed very well-defined SNPs and instead pointed at positions that do not look special in any way. DP doesn't correspond to the real coverage either (whether or not quality is taken into account). Maybe it is just a problem with old BAM formats?
    Last edited by garwuf; 11-21-2011, 03:37 PM.

    • #3
      I have similar issues with the freebayes output.
      My guess is that "AC" and "AN" count *alleles* in the called genotypes rather than sequencing depth, which would explain why they are always 1 or 2.
      However, I cannot find any documentation of what these or the other fields in the VCF output mean; not all of them are specified here:
      1000genomes.org


      If there's any way to get an output from freebayes that looks more like the vcf format of samtools mpileup, I'd be interested to know.
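
The guess above about AC and AN can be illustrated with a small sketch. It follows the VCF header descriptions quoted in the first post (alleles counted in called genotypes) and, as a simplification, treats every non-reference allele as "alternate", so it only matches AC exactly at sites with a single ALT allele. The function name is hypothetical.

```python
import re


def allele_counts(genotypes):
    """Return (AC, AN) from GT strings like '0/1' or '1|1'.

    AN counts every called allele; AC counts the non-reference ones.
    Missing calls ('.') are skipped.
    """
    ac = 0
    an = 0
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue
            an += 1
            if allele != "0":
                ac += 1
    return ac, an


# One diploid sample: AN is always 2, AC is 1 (het) or 2 (hom alt),
# regardless of how many reads cover the site.
print(allele_counts(["0/1"]))
print(allele_counts(["1/1"]))
# Three diploid samples give AN = 6.
print(allele_counts(["0/1", "1/1", "0/0"]))
```

With a single diploid sample this yields exactly the pattern described in the first post: AN fixed at 2 and AC equal to 1 or 2.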

      • #4
        Originally posted by garwuf
        I have run FreeBayes on some of my old BAM alignments, which had already been evaluated with samtools, gigabayes and GATK. None of the polymorphisms discovered by FreeBayes matched the real ones. It missed very well-defined SNPs and instead pointed at positions that do not look special in any way. DP doesn't correspond to the real coverage either (whether or not quality is taken into account). Maybe it is just a problem with old BAM formats?
        I'd guess this is a problem with BAM format changes or with the specific (older) version you were using, but I'm not sure. This sounds pretty strange. We don't see this at all in testing, and if freebayes behaved like this we certainly couldn't use it as a detector.

        What happens with this data when you use the most recent version?

        • #5
          Originally posted by cigsit
          Hi,

          I have just started using FreeBayes as a comparison to samtools, since both take BAMs and produce VCF output, which is ideal for scripting. FreeBayes seems like the free-standing, simple and fast SNP caller I have been longing for. However, the values in the VCF INFO column do not seem to match what is in the alignment.
          In particular, the DP value almost never matches what I can see in alignment viewers. It is always less than the number of reads I see stacked at the SNP position, whereas VCFv4.1 states that DP should represent the _total_ read depth.
          Additionally, the AC (Total number of alternate alleles in called genotypes) and AN (Total number of alleles in called genotypes) values almost never vary, whatever the read depth and number of alt alleles at a SNP position. To be more precise, AC is always either 1 or 2 and AN seems fixed at 2.

          Is this a known quirk of FreeBayes? These are very important values that I need to parse from the VCF output but they just make no sense after having used samtools for a few years.

          Thanks.
          I believe I know what's going on here.

          In the old versions, freebayes was designed to behave like gigabayes/bambayes, which used BQ and MQ filters to remove a lot of low-quality reads.

          However, this was ultimately shown to be a bad idea, so the default filters were removed. I think a lot of users got very poor results when their MQs weren't well calibrated. (I eventually settled on minimum input filters requiring at least 2 observations comprising at least 20% of reads in a single individual, which I've just made the default in recent freebayes revisions.)
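
The input filter described in the parenthesis above can be sketched roughly as follows. This is one reading of the sentence, not freebayes code: a site is kept if at least one individual has 2 or more alternate observations that also make up 20% or more of that individual's reads. The function and parameter names are hypothetical.

```python
def passes_input_filter(alt_obs, depth, min_count=2, min_fraction=0.2):
    """Check whether any single sample supports the alternate allele.

    alt_obs: dict of sample name -> number of alternate observations
    depth:   dict of sample name -> total reads at the site
    """
    for sample, alt in alt_obs.items():
        total = depth.get(sample, 0)
        if total == 0:
            continue  # no reads, nothing to evaluate
        if alt >= min_count and alt / total >= min_fraction:
            return True
    return False


# 2 of 10 reads support the alternate: passes both thresholds.
print(passes_input_filter({"s1": 2}, {"s1": 10}))
# 2 of 20 reads: enough observations, but only 10% of reads.
print(passes_input_filter({"s1": 2}, {"s1": 20}))
```

Both conditions must hold in the same individual; pooling observations across samples would be a different filter.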

          So the reported DP is the post-filter depth: only reads that pass the input filters are counted. If you were running with a minimum mapping quality filter of 30 and a minimum base quality filter of 20, DP could end up well below the number of reads you see stacked in a viewer.

          If you run with the most recent version, you should get results that are more sensible.
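
To make the difference between raw and post-filter depth concrete, here is a toy sketch. It assumes each read covering the site is reduced to a (mapping quality, base quality) pair; the read values are invented, and the thresholds of 30 and 20 are the ones mentioned above. It does not reproduce how freebayes computes DP internally.

```python
def depth_at_site(reads, min_mapq=0, min_baseq=0):
    """Count reads whose mapping and base quality meet the thresholds.

    reads: list of (mapping_quality, base_quality) tuples for one position.
    """
    return sum(
        1
        for mapq, baseq in reads
        if mapq >= min_mapq and baseq >= min_baseq
    )


# Invented reads covering one position: (MQ, BQ at that base).
reads = [(60, 35), (60, 12), (25, 38), (40, 30), (10, 8), (37, 20)]

raw_depth = depth_at_site(reads)  # what an alignment viewer shows
filtered_depth = depth_at_site(reads, min_mapq=30, min_baseq=20)
print(raw_depth, filtered_depth)  # 6 3
```

Here half of the stacked reads are dropped by the filters, which is the kind of gap between the viewer and the DP value described in the first post.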
