Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Question for GATK experts.....

    I called SNPs using GATK. I have a question.

    If i call SNPs using a single bam file and i get a set of SNPs (850 SNPs).

    Now i called SNPs from multiple bam alignments (not merged bam files, but listing them in consecutive order to get a single VCF file), i get more SNPs for the same sample (2598 SNPs). How is this possible? What am i going wrong? I am using the same filter conditions etc.

  • #2
    What is probably happening is that the 2600 minus 850 SNPs in your sample (call it sample #1) that are only called in the multi-sample SNP calling run are SNPs that didn't have enough evidence to be called as SNPs in sample #1 alone, but that did show evidence of being SNPs in other samples. Seeing the site as a SNP in other samples affects the probability that it is called as a SNP in sample #1.

    Intuitively, the situation is as follows: If we run SNP calling on sample #1 alone and see a site that has a modest amount of evidence that it is a SNP, it will probably not pass the filtering thresholds. If we run SNP calling on a bunch of samples and see that the same site has strong evidence of being a SNP in a different sample we can be more confident that the site is truly a SNP in sample #1. This is I believe one of the main advantages of multi-sample calling.

    Comment


    • #3
      Thanks d17. I understand as you say the depth of coverage increases when you use multi-sample for a location, thus increasing the number of SNPS in the resulting VCF file. Now the question is which one is correct (single sample or multi sample (I know both are correct, but what would one use for ASE)?

      Comment


      • #4
        Hi there,
        I am using GATK to call SNPs from my sam files (from 454 data).
        I am using he following pipeline::
        SAM to BAM
        samtools import BRCA1_coding.fasta out1FR_bwasw.sam new_out1FR_bwasw.bam

        Sort BAM
        samtools sort new_out1FR_bwasw.bam new_out1FR_bwasw.sorted

        Index BAM
        samtools index new_out1FR_bwasw.sorted.bam new_out1FR_bwasw.sorted.bam.bai

        Identify target regions for realignment
        java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T RealignerTargetCreator -R BRCA1_coding.fasta -I new_out1FR_bwasw.sorted.bam -o new_out1FR_bwasw.intervals
        And I get a an interval file that has 4 locations which is a subset of the regions i identified using tablet viewer.
        Following note from command line.
        14:53:16,100 TraversalEngine - 0 reads were filtered out during traversal out of 565 total (0.00%)
        Then,
        Realign BAM to get better Indel calling
        java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T IndelRealigner -R BRCA1_coding.fasta -I new_out1FR_bwasw.sorted.bam -targetIntervals new_out1FR_bwasw.intervals -o new_out1FR_bwasw.sorted.realigned.bam
        Add or Replace read group
        java -jar ~/bin/picard-tools-1.45/AddOrReplaceReadGroups.jar I= new_out1FR_bwasw.sorted.realigned.bam O= new_out1FR_bwasw_new.sorted.realigned.bam SORT_ORDER=coordinate RGID=foo RGLB=bar RGPL=illumina RGSM=DePristo RGPU= GGDP4G001BFFBZ CREATE_INDEX=True
        Reindex the realigned BAM
        java -jar ~/bin/picard-tools-1.45/ReorderSam.jar I=new_out1FR_bwasw_new.sorted.realigned.bam O= new_out1FR_bwasw.resorted.realigned.bam REFERENCE= BRCA1_coding.fasta
        samtools index new_out1FR_bwasw.resorted.realigned.bam new_out1FR_bwasw.resorted.realigned.bam.bai
        Call SNPs
        java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T UnifiedGenotyper -R BRCA1_coding.fasta -I new_out1FR_bwasw.resorted.realigned.bam -o new_out1FR_bwasw.vcf.calls -stand_call_conf 30.0 -stand_emit_conf 10.0
        And I am getting only 1 SNP called of a very low quality and in the region of read depth 1.
        This region doesn't coincide with the intervals identified before.
        Also, when I compare the SNP called with my results from VarScan, there is no similarity.

        Can anyone please suggest how to improve SNP calling?
        Or is GATK not suitable for SNP calling in long read data from 454?

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Advancing Precision Medicine for Rare Diseases in Children
          by seqadmin




          Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
          12-16-2024, 07:57 AM
        • seqadmin
          Recent Advances in Sequencing Technologies
          by seqadmin



          Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

          Long-Read Sequencing
          Long-read sequencing has seen remarkable advancements,...
          12-02-2024, 01:49 PM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 12-17-2024, 10:28 AM
        0 responses
        26 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 12-13-2024, 08:24 AM
        0 responses
        42 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 12-12-2024, 07:41 AM
        0 responses
        28 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 12-11-2024, 07:45 AM
        0 responses
        42 views
        0 likes
        Last Post seqadmin  
        Working...
        X