  • Question for GATK experts.....

    I called SNPs using GATK and have a question.

    When I call SNPs from a single BAM file, I get 850 SNPs.

    When I call SNPs from multiple BAM alignments (not merged BAM files, but listed one after another to produce a single VCF file), I get more SNPs for the same sample (2598 SNPs). How is this possible? What am I doing wrong? I am using the same filter conditions etc.

  • #2
    What is probably happening is that the roughly 1,750 extra SNPs (2598 minus 850) that are called for your sample (call it sample #1) only in the multi-sample run are sites that didn't have enough evidence to be called as SNPs in sample #1 alone, but that did show evidence of being SNPs in other samples. Seeing the site as a SNP in other samples affects the probability that it is called as a SNP in sample #1.

    Intuitively, the situation is as follows: if we run SNP calling on sample #1 alone and see a site that has a modest amount of evidence that it is a SNP, it will probably not pass the filtering thresholds. If we run SNP calling on a bunch of samples and see that the same site has strong evidence of being a SNP in a different sample, we can be more confident that the site is truly a SNP in sample #1. This is, I believe, one of the main advantages of multi-sample calling.
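
    To make that intuition concrete, here is a toy Python sketch. It is not GATK's actual model: it assumes the samples are independent, that each sample contributes a probability of being homozygous-reference at the site, and that the site-level score is the Phred-scaled probability that no sample carries a variant. The probabilities, the function name `site_qual`, and the threshold of 30 are all made up for illustration.

```python
import math

# Illustrative Phred-scaled calling threshold (an assumption for this sketch).
CALL_THRESHOLD = 30.0


def site_qual(p_hom_ref_per_sample):
    """Phred-scaled confidence that at least one sample carries a variant.

    Toy assumption: samples are independent, so the probability that the
    site is non-variant is the product of each sample's P(hom-ref | reads).
    """
    p_no_variant = math.prod(p_hom_ref_per_sample)
    return -10.0 * math.log10(p_no_variant)


# Hypothetical numbers: sample #1 has modest evidence for the SNP,
# a second sample has strong evidence at the same site.
sample1_only = site_qual([0.01])
joint = site_qual([0.01, 1e-5])

print(f"sample #1 alone: QUAL={sample1_only:.1f}, called={sample1_only >= CALL_THRESHOLD}")
print(f"multi-sample:    QUAL={joint:.1f}, called={joint >= CALL_THRESHOLD}")
```

    In this toy setup the site falls below the threshold when sample #1 is scored alone and clears it once the second sample's evidence is included, even though sample #1's own reads have not changed.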



    • #3
      Thanks d17. As I understand it, the depth of coverage at a location increases when you use multi-sample calling, which increases the number of SNPs in the resulting VCF file. Now the question is which one is correct, single-sample or multi-sample? (I know both are correct, but which would one use for ASE?)
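
      Before choosing, it may help to see exactly which sites differ between the two runs. Below is a minimal Python sketch, assuming both VCFs are plain tab-separated text with GT as the first FORMAT field. The helper names `variant_sites` and `compare_call_sets` and the tiny inline records are hypothetical. Passing a sample column (0-based; the first sample is column 9) keeps only sites where that sample has a non-reference genotype, which matters because a multi-sample VCF lists a site even when a given sample is 0/0 there.

```python
import re


def variant_sites(vcf_lines, sample_column=None):
    """Return the set of (chrom, pos) pairs found in VCF data lines.

    If sample_column is given, keep a site only when that sample's GT
    contains at least one allele other than "0" (reference) or "." (missing).
    """
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if sample_column is not None:
            genotype = fields[sample_column].split(":")[0]
            alleles = re.split(r"[/|]", genotype)
            if all(allele in ("0", ".") for allele in alleles):
                continue  # this sample is hom-ref or uncalled at the site
        sites.add((fields[0], int(fields[1])))
    return sites


def compare_call_sets(single, multi):
    """Split two site sets into shared, single-only, and multi-only."""
    return {
        "shared": single & multi,
        "single_only": single - multi,
        "multi_only": multi - single,
    }


# Made-up records purely to show the mechanics.
single_vcf = [
    "##fileformat=VCFv4.0",
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT\t0/1",
]
multi_vcf = [
    "##fileformat=VCFv4.0",
    "chr1\t100\t.\tA\tG\t80\t.\t.\tGT\t0/1\t0/1",
    "chr1\t200\t.\tC\tT\t60\t.\t.\tGT\t0/0\t1/1",
    "chr1\t300\t.\tG\tA\t45\t.\t.\tGT\t0/1\t0/0",
]

all_multi_sites = variant_sites(multi_vcf)
sample1_multi_sites = variant_sites(multi_vcf, sample_column=9)
result = compare_call_sets(variant_sites(single_vcf), sample1_multi_sites)
print(len(all_multi_sites), sorted(result["shared"]), sorted(result["multi_only"]))
```

      With real data you would read the lines from the two VCF files instead of the inline lists, for example with `open(path)`.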



      • #4
        Hi there,
        I am using GATK to call SNPs from my SAM files (from 454 data).
        I am using the following pipeline:
        SAM to BAM
        samtools import BRCA1_coding.fasta out1FR_bwasw.sam new_out1FR_bwasw.bam

        Sort BAM
        samtools sort new_out1FR_bwasw.bam new_out1FR_bwasw.sorted

        Index BAM
        samtools index new_out1FR_bwasw.sorted.bam new_out1FR_bwasw.sorted.bam.bai

        Identify target regions for realignment
        java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T RealignerTargetCreator -R BRCA1_coding.fasta -I new_out1FR_bwasw.sorted.bam -o new_out1FR_bwasw.intervals
        And I get an interval file that has 4 locations, which are a subset of the regions I identified using the Tablet viewer.
        The following note appears on the command line:
        14:53:16,100 TraversalEngine - 0 reads were filtered out during traversal out of 565 total (0.00%)
        Then,
        Realign BAM to get better Indel calling
        java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T IndelRealigner -R BRCA1_coding.fasta -I new_out1FR_bwasw.sorted.bam -targetIntervals new_out1FR_bwasw.intervals -o new_out1FR_bwasw.sorted.realigned.bam
        Add or Replace read group
        java -jar ~/bin/picard-tools-1.45/AddOrReplaceReadGroups.jar I=new_out1FR_bwasw.sorted.realigned.bam O=new_out1FR_bwasw_new.sorted.realigned.bam SORT_ORDER=coordinate RGID=foo RGLB=bar RGPL=illumina RGSM=DePristo RGPU=GGDP4G001BFFBZ CREATE_INDEX=True
        Reorder and reindex the realigned BAM
        java -jar ~/bin/picard-tools-1.45/ReorderSam.jar I=new_out1FR_bwasw_new.sorted.realigned.bam O=new_out1FR_bwasw.resorted.realigned.bam REFERENCE=BRCA1_coding.fasta
        samtools index new_out1FR_bwasw.resorted.realigned.bam new_out1FR_bwasw.resorted.realigned.bam.bai
        Call SNPs
        java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T UnifiedGenotyper -R BRCA1_coding.fasta -I new_out1FR_bwasw.resorted.realigned.bam -o new_out1FR_bwasw.vcf.calls -stand_call_conf 30.0 -stand_emit_conf 10.0
        And I am getting only 1 SNP called, of very low quality and in a region with read depth 1.
        This region doesn't coincide with the intervals identified before.
        Also, when I compare the SNP called with my results from VarScan, there is no overlap.

        Can anyone please suggest how to improve SNP calling?
        Or is GATK not suitable for SNP calling in long read data from 454?
