  • Little concordance between GATK and Samtools

    Hello,

    I have read that GATK and Samtools should give about 90% concordance (the HugeSeq comparison). I am not able to reproduce that, and was wondering whether anyone has done this and can tell me which parameters they used.

    I am doing:

    java -Xmx4g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R ucsc.hg19.fasta -I aln.bam --min_base_quality_score 30 -o gatk.vcf

    samtools mpileup -uf ucsc.hg19.fasta -Q 30 -D -g -I -S aln.bam | bcftools view -bvcg - > samtools.bcf
    bcftools view samtools.bcf > samtools.vcf

    Any insight?

    Thanks in advance,
    Ramiro
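To see what a site-level comparison of the two outputs actually counts, it can be done by hand. The following is a minimal bash sketch, assuming both call sets are uncompressed VCFs (gatk.vcf and samtools.vcf from the commands above). `site_keys` and `compare_sites` are hypothetical helper names, and sites are keyed on CHROM and POS only, so REF/ALT disagreements and differently represented variants are not checked.

```shell
#!/usr/bin/env bash
# Sketch: count variant sites shared by two uncompressed VCFs, keyed on
# CHROM and POS only (REF/ALT agreement is not checked here).

# Print one "CHROM<TAB>POS" key per data line, skipping header lines.
site_keys() {
  awk -F'\t' '!/^#/ { print $1 "\t" $2 }' "$1" | LC_ALL=C sort -u
}

# Print how many sites are shared, and how many are unique to each file.
compare_sites() {
  local tmp
  tmp=$(mktemp -d)
  site_keys "$1" > "$tmp/first"
  site_keys "$2" > "$tmp/second"
  # comm needs both inputs sorted with the same collation (LC_ALL=C here).
  echo "shared: $(LC_ALL=C comm -12 "$tmp/first" "$tmp/second" | awk 'END { print NR }')"
  echo "only in first: $(LC_ALL=C comm -23 "$tmp/first" "$tmp/second" | awk 'END { print NR }')"
  echo "only in second: $(LC_ALL=C comm -13 "$tmp/first" "$tmp/second" | awk 'END { print NR }')"
  rm -r "$tmp"
}

# Usage (file names from the commands above):
#   compare_sites gatk.vcf samtools.vcf
```

Keying on position alone roughly mirrors the site counts that vcf-compare reports; adding REF and ALT to the key would make the comparison stricter.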

  • #2
    So what concordance *are* you getting?



    • #3
      Well, with vcf-compare I am getting about 30%!

      # The command line was: vcf-compare(r731) chr10gatk.raw.vcf.gz chr10.samtools.raw.vcf.gz
      #
      #VN 'Venn-Diagram Numbers'. Use `grep ^VN | cut -f 2-` to extract this part.
      #VN The columns are:
      #VN 1 .. number of sites unique to this particular combination of files
      #VN 2- .. combination of files and space-separated number, a fraction of sites in the file
      VN 271 chr10.samtools.raw.vcf.gz (35.4%) chr10gatk.raw.vcf.gz (38.2%)
      VN 438 chr10gatk.raw.vcf.gz (61.8%)
      VN 494 chr10.samtools.raw.vcf.gz (64.6%)
      #SN Summary Numbers. Use `grep ^SN | cut -f 2-` to extract this part.
      SN Number of REF matches: 271
      SN Number of ALT matches: 270
      SN Number of REF mismatches: 0
      SN Number of ALT mismatches: 1
      SN Number of samples in GT comparison: 0
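The percentages on the first VN line are the shared sites as a fraction of each file's total, so "concordance" depends on which denominator is used. A minimal bash sketch that recomputes them from the three VN counts, plus the Jaccard index (shared over the union) as one other common definition; the function name `concordance` is hypothetical, and which definition the HugeSeq comparison used is not stated in this thread.

```shell
#!/usr/bin/env bash
# Sketch: recompute site-level concordance from the three VN counts that
# vcf-compare prints for a two-file comparison. The counts are passed by
# hand here (271 shared, 438 GATK-only, 494 samtools-only in this thread).
concordance() {
  local shared=$1 gatk_only=$2 samtools_only=$3
  LC_ALL=C awk -v s="$shared" -v g="$gatk_only" -v m="$samtools_only" 'BEGIN {
    # Share of each call set that the other caller also reports
    printf "GATK sites shared: %.1f%%\n", 100 * s / (s + g)
    printf "samtools sites shared: %.1f%%\n", 100 * s / (s + m)
    # Jaccard index: shared sites over the union of both call sets
    printf "Jaccard: %.1f%%\n", 100 * s / (s + g + m)
  }'
}

concordance 271 438 494
# GATK sites shared: 38.2%
# samtools sites shared: 35.4%
# Jaccard: 22.5%
```

The first two lines reproduce the 38.2% and 35.4% that vcf-compare reports; by the stricter union-based measure the overlap is lower still.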



      • #4
        OK, I admit that seems quite low, but do you really want to compare the raw, unfiltered output?



        • #5
          Not really, but if I filter the GATK output I would just get fewer variants, wouldn't I? My main question is whether the parameters I am using are OK. Maybe I can check other alignments. I have been trying to find out which parameters people used to get the reported 90% concordance; do you know?

          Thank you very much for your attention and help!
          Ramiro



          • #6
            Yes, you would get fewer variants, but they would also be higher-quality ones.
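As a rough illustration of filtering both call sets before comparing them, here is a minimal bash sketch that keeps only records whose QUAL (column 6) meets a threshold. The helper name `filter_by_qual` and the threshold of 50 are arbitrary assumptions; this is a crude stand-in for proper filtering such as GATK's recommended hard filters or VQSR, not a replacement for them.

```shell
#!/usr/bin/env bash
# Sketch: keep VCF records whose QUAL (column 6) is at least a threshold,
# passing header lines through unchanged. Records with QUAL "." are dropped.
# Hypothetical helper; the threshold is chosen by the caller.
filter_by_qual() {
  local min_qual=$1 vcf=$2
  LC_ALL=C awk -F'\t' -v min="$min_qual" \
    '/^#/ { print; next } $6 != "." && $6 + 0 >= min + 0 { print }' "$vcf"
}

# Usage (hypothetical output names):
#   filter_by_qual 50 gatk.vcf > gatk.q50.vcf
#   filter_by_qual 50 samtools.vcf > samtools.q50.vcf
```

Because the two callers compute QUAL differently, the same numeric cut-off does not mean the same stringency for both, so it is worth checking how many calls each file retains.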

            I don't know much about samtools mpileup parameters, but assuming you're working on human data, the GATK Best Practices documents would be a good place to start for refining the GATK parameters.



            There are plenty of pipelines on GitHub and similar sites with other people's parameters exposed. I wasn't able to find the parameters in the HugeSeq paper, but note that it used a very old version of GATK.
