  • SNV calling using GATK with data from multiple lanes

    Hi,

    I am using exome sequencing data to call SNVs with GATK's UnifiedGenotyper. I have two lanes for each sample, so I merged the two BAM files into one file with two read groups. But in the VCF file I got two sample columns, like GT:AD:DP:GQ:PL 0/1:20,3:23:56:56,0,576 0/1:23,9:32:99:153,0,676.
    My questions are:
    (1) Did GATK treat these as two samples because there are two read groups?
    (2) Did GATK call SNVs in the two lanes separately, or did it merge their reads?
    (3) When I calculate the minor allele frequency, should I use both GT:AD:DP:GQ:PL columns?
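
For reference, the two sample columns quoted in this post can be unpacked with a few lines of Python. This is only a sketch: it assumes the FORMAT string is GT:AD:DP:GQ:PL (each column carries five colon-separated values, which suggests this), that the site is biallelic, and that every sample column carries all FORMAT fields. The function names are hypothetical, and the value computed is the read-level alternate-allele fraction from AD, which may or may not be what is meant by "minor allele frequency" here.

```python
def parse_sample(format_keys: str, sample_value: str) -> dict:
    """Pair the FORMAT keys with one sample column's colon-separated values."""
    keys = format_keys.split(":")
    values = sample_value.split(":")
    if len(keys) != len(values):
        # Assumption of this sketch: no trailing fields were dropped.
        raise ValueError(f"{len(keys)} keys but {len(values)} values")
    return dict(zip(keys, values))


def alt_fraction(fields: dict) -> float:
    """Alternate-allele read fraction from AD (ref,alt), biallelic sites only."""
    ref, alt = (int(x) for x in fields["AD"].split(","))
    total = ref + alt
    if total == 0:
        raise ValueError("no reads counted in AD")
    return alt / total


FORMAT = "GT:AD:DP:GQ:PL"  # assumed reading of the FORMAT string in the post
columns = ["0/1:20,3:23:56:56,0,576", "0/1:23,9:32:99:153,0,676"]

parsed = [parse_sample(FORMAT, column) for column in columns]
fractions = [alt_fraction(fields) for fields in parsed]

for fields, fraction in zip(parsed, fractions):
    print(fields["GT"], fields["AD"], f"{fraction:.3f}")
```

Under this reading, DP in each column equals the sum of the two AD counts (20+3=23 and 23+9=32), so each column describes the reads of one read group rather than the merged total.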

    Eager to know the answer.
    Thank you in advance.

  • #2
    In regards to your first question, I do think GATK UnifiedGenotyper would have treated each different read group as a different sample (http://gatkforums.broadinstitute.org...bout-bam-files).

    Is there a particular reason you're using the UnifiedGenotyper? HaplotypeCaller is its successor (http://www.broadinstitute.org/gatk/g...-discovery-ovw).


    • #3
      Hi N311V,

      Thank you very much. If GATK treats different read groups as different samples, then the read groups of each lane are supposed to be the same, right? But this is not mentioned anywhere on the GATK website.

      I only called SNPs, not indels, so UnifiedGenotyper seemed to be the faster choice. Did HaplotypeCaller perform better than UnifiedGenotyper in your project?


      • #4
        From the GATK web page:

        The HaplotypeCaller is a more recent and sophisticated tool than the UnifiedGenotyper. Its ability to call SNPs is equivalent to that of the UnifiedGenotyper, and its ability to call indels is far superior. We recommend using HaplotypeCaller in all cases, with only a few exceptions:

        If you want to analyze more than 100 samples at a time (for performance reasons)
        If you are working with non-diploid organisms (UG can handle different levels of ploidy while HC cannot)
        If you are working with pooled samples (also due to the HC’s limitation regarding ploidy)
        In those cases, we recommend using UnifiedGenotyper instead of HaplotypeCaller.

        Personally, I am not sure which is better. Getting different results from two bioinformatics tools does not prove that either one is correct.


        • #5
          Originally posted by N311V
          In regards to your first question, I do think GATK UnifiedGenotyper would have treated each different read group as a different sample (http://gatkforums.broadinstitute.org...bout-bam-files).
          If you look at the description of the SM tag on that page, it seems GATK treats all read groups with the same SM as coming from the same sample:

          GATK tools treat all read groups with the same SM value as containing sequencing data for the same sample. Therefore it's critical that the SM field be correctly specified, especially when using multi-sample tools like the Unified Genotyper.
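
The rule quoted above can be checked against a BAM header before calling variants. The sketch below groups the @RG lines of a SAM-style header by their SM tag; the header lines, read group IDs, and sample name are made-up examples, and the function name is hypothetical. It assumes tab-separated @RG lines that each carry both ID and SM.

```python
from collections import defaultdict


def read_groups_by_sample(header_lines):
    """Map each SM value to the list of read group IDs that declare it."""
    samples = defaultdict(list)
    for line in header_lines:
        if not line.startswith("@RG"):
            continue  # skip @HD, @SQ, @PG and other header records
        # Each field after "@RG" looks like TAG:value; split on the first colon only.
        tags = dict(field.split(":", 1) for field in line.rstrip("\n").split("\t")[1:])
        # Assumption of this sketch: every @RG line has both ID and SM.
        samples[tags["SM"]].append(tags["ID"])
    return dict(samples)


# Hypothetical header: two lanes of one sample, each with its own read group ID.
header = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:lane1\tSM:sampleA\tLB:lib1\tPL:illumina",
    "@RG\tID:lane2\tSM:sampleA\tLB:lib1\tPL:illumina",
]

grouped = read_groups_by_sample(header)
print(grouped)
```

If the two lanes show up under two different SM keys instead of one, that would be consistent with getting two sample columns in the VCF.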


          • #6
            Originally posted by Jolin
            If GATK treats different read groups as different samples, then the read groups of each lane are supposed to be the same, right? But this is not mentioned anywhere on the GATK website.
            I did read somewhere on the GATK website that each sample needs a unique read group; sorry, I don't have a link right now. To keep track of the lane, perhaps you could use Picard's AddOrReplaceReadGroups.jar and specify the library name as the lane.
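
As a rough illustration of the AddOrReplaceReadGroups suggestion, the sketch below only builds the command line (it does not run it), using the legacy Picard key=value argument style. The file names, jar path, sample name, and library name are placeholders, and the function name is hypothetical. Note that this sketch records the lane in RGID and RGPU and keeps one shared RGSM per sample; that is a choice made here for illustration, not something stated in the thread.

```python
def add_read_group_cmd(jar, in_bam, out_bam, rgid, rgsm, rglb, rgpu, rgpl="illumina"):
    """Build a legacy-style Picard AddOrReplaceReadGroups command as an argument list."""
    return [
        "java", "-jar", jar,
        f"I={in_bam}",
        f"O={out_bam}",
        f"RGID={rgid}",  # read group ID, unique per lane in this sketch
        f"RGSM={rgsm}",  # sample name, shared by all lanes of one sample
        f"RGLB={rglb}",  # library name
        f"RGPU={rgpu}",  # platform unit, used here to record the lane
        f"RGPL={rgpl}",  # sequencing platform
    ]


# Placeholder inputs: two lanes of the same sample.
commands = [
    add_read_group_cmd(
        "AddOrReplaceReadGroups.jar",
        f"sampleA_{lane}.bam",
        f"sampleA_{lane}.rg.bam",
        rgid=f"sampleA.{lane}",
        rgsm="sampleA",
        rglb="libA",
        rgpu=lane,
    )
    for lane in ("lane1", "lane2")
]

for cmd in commands:
    print(" ".join(cmd))
```

The printed commands can be inspected and then run by hand or passed to `subprocess.run(cmd, check=True)`.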

            Originally posted by Jolin
            I only called SNPs, not indels, so UnifiedGenotyper seemed to be the faster choice. Did HaplotypeCaller perform better than UnifiedGenotyper in your project?
            I was interested in both SNPs and indels, which made HaplotypeCaller a great all-in-one solution. Also, I was only interested in a couple of genes, so speed was not a concern. I haven't compared the SNP results from HaplotypeCaller with those from UnifiedGenotyper, so I can't say whether they're the same. I assume so, but it would be better to check.


            • #7
              Hi Westerman, thank you. Our lab has actually used UnifiedGenotyper all along and has done some PCR validation of the predicted SNVs. It seems that UG works well for SNV detection.


              • #8
                Thanks a lot, N311V
