  • Little concordance between GATK and Samtools

    Hello,

    I have read that GATK and Samtools should give about 90% concordance (the HugeSeq test). I have not been able to reproduce that and was wondering whether anyone has done it and can tell me what parameters they used.

    I am doing:

    java -Xmx4g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R ucsc.hg19.fasta -I aln.bam --min_base_quality_score 30 -o gatk.vcf

    samtools mpileup -uf ucsc.hg19.fasta -Q 30 -D -g -I -S aln.bam | bcftools view -bvcg - > samtools.bcf
    bcftools view samtools.bcf > samtools.vcf

    Any insight?

    Thanks in advance,
    Ramiro

  • #2
    So what concordance *are* you getting?



    • #3
      Well, with vcf-compare I am getting about 30%!

      # The command line was: vcf-compare(r731) chr10gatk.raw.vcf.gz chr10.samtools.raw.vcf.gz
      #
      #VN 'Venn-Diagram Numbers'. Use `grep ^VN | cut -f 2-` to extract this part.
      #VN The columns are:
      #VN 1 .. number of sites unique to this particular combination of files
      #VN 2- .. combination of files and space-separated number, a fraction of sites in the file
      VN 271 chr10.samtools.raw.vcf.gz (35.4%) chr10gatk.raw.vcf.gz (38.2%)
      VN 438 chr10gatk.raw.vcf.gz (61.8%)
      VN 494 chr10.samtools.raw.vcf.gz (64.6%)
      #SN Summary Numbers. Use `grep ^SN | cut -f 2-` to extract this part.
      SN Number of REF matches: 271
      SN Number of ALT matches: 270
      SN Number of REF mismatches: 0
      SN Number of ALT mismatches: 1
      SN Number of samples in GT comparison: 0
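The percentages on the VN lines above can be re-derived from the three site counts. Here is a minimal sketch in Python; the function name `overlap_summary` is my own and not part of vcf-compare. It also computes the shared fraction of the union of all sites, which is not shown in the output above.

```python
# Counts copied from the VN lines of the vcf-compare output above.
shared = 271          # sites present in both call sets
gatk_only = 438       # sites only in chr10gatk.raw.vcf.gz
samtools_only = 494   # sites only in chr10.samtools.raw.vcf.gz


def overlap_summary(shared, only_a, only_b):
    """Return (shared fraction of A, shared fraction of B, shared fraction of the union)."""
    total_a = shared + only_a
    total_b = shared + only_b
    union = shared + only_a + only_b
    return shared / total_a, shared / total_b, shared / union


frac_gatk, frac_samtools, frac_union = overlap_summary(shared, gatk_only, samtools_only)
print(f"shared fraction of GATK calls:     {frac_gatk:.1%}")
print(f"shared fraction of samtools calls: {frac_samtools:.1%}")
print(f"shared fraction of all sites:      {frac_union:.1%}")
```

With these counts it prints 38.2%, 35.4% and 22.5%, so the first two figures match the VN lines, and measured against the union of both call sets the overlap is closer to 22%.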



      • #4
        OK, I admit that seems quite low, but do you really want to use the raw, unfiltered output for your comparisons?



        • #5
          Not really, but if I filter the GATK output I would just get fewer variants, wouldn't I? The main question I have is: are the parameters I am using OK? Maybe I can check other alignments. I have been trying to find out what parameters people use when they report 90% concordance. Do you know?

          Thank you very much for your attention and help!
          Ramiro



          • #6
            Yes, you would get fewer variants, but they would also be high-quality ones.
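To illustrate the idea, here is a rough sketch of filtering both call sets on the VCF QUAL column before comparing sites. Everything specific in it is an assumption: the helper name `passing_sites` is hypothetical, the cutoff of 30 is arbitrary, and the records are made-up toy data rather than anything from this thread. The two callers' QUAL values are also not necessarily on comparable scales, so applying one cutoff to both is a simplification.

```python
def passing_sites(vcf_lines, min_qual):
    """Collect (CHROM, POS, REF, ALT) for records whose QUAL is at least min_qual."""
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        if qual != "." and float(qual) >= min_qual:
            sites.add((chrom, int(pos), ref, alt))
    return sites


# Made-up toy records for illustration only, not taken from the thread.
gatk_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr10\t100\t.\tA\tG\t250.0\t.\t.",
    "chr10\t200\t.\tC\tT\t12.5\t.\t.",
    "chr10\t300\t.\tG\tA\t90.0\t.\t.",
]
samtools_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr10\t100\t.\tA\tG\t180.0\t.\t.",
    "chr10\t300\t.\tG\tA\t8.0\t.\t.",
    "chr10\t400\t.\tT\tC\t75.0\t.\t.",
]

MIN_QUAL = 30  # hypothetical cutoff, not a recommended value
gatk = passing_sites(gatk_lines, MIN_QUAL)
samtools = passing_sites(samtools_lines, MIN_QUAL)
shared = gatk & samtools
print("shared:", len(shared))
print("GATK only:", len(gatk - samtools))
print("samtools only:", len(samtools - gatk))
```

For real files you would pass the lines of each uncompressed VCF instead of the toy lists. The point is only that the comparison is made after low-quality calls are dropped from both sides.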

            I don't know much about samtools mpileup parameters, but assuming you're working on human data, the GATK Best Practices documents would be a good place to start for refining the GATK parameters.



            There are plenty of pipelines on GitHub and the like with other people's parameters exposed. I wasn't able to find the parameters in the HugeSeq paper, but I note that it used a very old version of GATK.
