
  • exome/vcf merge question

    I research rare Mendelian diseases and generally look for variants shared between my related affected samples. Because exomes vary in quality and coverage, I want to be able to look at a merged variant file for all of my affected cases.

    This is relatively straightforward with vcftools. However, if, for example, I am looking at the shared variants of two individuals, a variant may appear unshared for three reasons: 1) one individual has the variant allele and the other is wild-type, i.e. truly not shared; 2) the variant is not covered in the second exome; 3) the variant is covered in the second exome but at an allele frequency below the cutoff for a variant call.

    Can anyone help with a method to annotate the merged variant list with the depth and allele call for each of the samples, including calling the wild-type allele where true?

    P.S. I had thought of using bedtools to annotate the read depth at each start position, but this wouldn't help me annotate with wild-type calls.

    Thanks

    Josh

  • #2
    I have a similar problem. I get my sequence data in batches (not all samples at once) and would like to have a running list of variants called on samples thus far.

    As you said, the biggest issue with straightforward merging of VCFs is that we need to differentiate between
    • evidence of absence ("there is sufficient depth at this locus and this sample is reference homozygous") and
    • absence of evidence ("this sample does not have enough coverage to infer whether there is a variant at this locus").
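
To make the distinction above concrete, here is a minimal Python sketch that labels one sample's call at one site from its genotype string and read depth. The function name `classify_call`, the labels, and the `min_depth=10` threshold are assumptions for illustration, not something from this thread or from any variant caller; pick a depth cutoff that suits your own data.

```python
def classify_call(genotype, depth, min_depth=10):
    """Label one sample's call at one site.

    genotype: a VCF GT string such as "0/0", "0/1" or "./."
    depth:    read depth (DP) at the site, or None if not reported
    min_depth is a hypothetical cutoff below which a reference call
    is not trusted as evidence of absence.
    """
    alleles = genotype.replace("|", "/").split("/")
    # No genotype at all: absence of evidence.
    if "." in alleles:
        return "no_evidence"
    # Any non-reference allele: the sample carries a variant call.
    if any(a != "0" for a in alleles):
        return "variant"
    # Homozygous reference, but only trustworthy with enough reads.
    if depth is None or depth < min_depth:
        return "no_evidence"
    return "hom_ref"


for gt, dp in [("0/1", 22), ("0/0", 30), ("0/0", 3), ("./.", None)]:
    print(gt, dp, classify_call(gt, dp))
```

Only sites labelled `hom_ref` in the unaffected comparison would count as "evidence of absence"; everything labelled `no_evidence` stays undecided.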

    I am still searching for solutions and will post if I find one.
    Kamalakar Gulukota,
    Director,
    Center for Bioinformatics and Computational Biology
    NorthShore University Health System



    • #3
      Re: Create a VCF with your first bam file, say 1.vcf

      OK. There is a 3-step procedure that can accomplish what you want (I think).

      Step 1. Create VCFs from your first and second bam files separately, say old.vcf and new.vcf.

      Step 2. Next, create a combined VCF from the two. I used the CombineVariants walker in GATK like so:
      Code:
      java -jar GenomeAnalysisTK.jar -T CombineVariants -R GRCh37.fa --variant old.vcf --variant new.vcf -o joined.vcf -genotypeMergeOptions UNIQUIFY
      Presumably you can do something similar with bedtools.

      Step 3. Finally, run the GATK UnifiedGenotyper on all the bam files, using the joined VCF as the target file (i.e. with the -L option), like so:

      Code:
      java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R GRCh37.fa -L joined.vcf -I old.bam -I new.bam -o final.vcf
      I have combined 30 old bams with 50 new bams using this method and it seems to work well.

      However, allow me to hasten to add that the best practice would be to run variant calling on all samples together. The above procedure might be quick and dirty. I think it will be mostly accurate but there will be differences between this procedure and redoing the whole shebang.
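
As a rough illustration of reading the result of Step 3, the Python sketch below pulls the genotype (GT) and depth (DP) for every sample from one record of a multi-sample VCF. The function name `per_sample_calls`, the sample names `old` and `new`, and the example record are made up for illustration; a real final.vcf will carry whatever sample names are in your bam read groups, and a dedicated VCF parser may be a better choice for production use.

```python
def per_sample_calls(header_line, data_line):
    """Return {sample_name: (GT, DP)} for one VCF record.

    header_line: the "#CHROM ..." line, which names the samples
    data_line:   one tab-separated variant record
    DP is returned as an int, or None when missing or ".".
    """
    samples = header_line.rstrip("\n").split("\t")[9:]
    fields = data_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column, e.g. GT:AD:DP
    calls = {}
    for name, column in zip(samples, fields[9:]):
        # Trailing fields may be dropped in a sample column (e.g. "./."),
        # so look values up by key rather than by position.
        values = dict(zip(keys, column.split(":")))
        gt = values.get("GT", "./.")
        dp = values.get("DP", ".")
        calls[name] = (gt, int(dp) if dp.isdigit() else None)
    return calls


# Hypothetical two-sample record, for illustration only.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\told\tnew"
record = "1\t12345\t.\tA\tG\t50\t.\t.\tGT:AD:DP\t0/1:10,12:22\t0/0:30,0:30"
for sample, (gt, dp) in per_sample_calls(header, record).items():
    print(sample, gt, dp)
```

Looping this over every record gives the per-sample depth and allele call table asked for in the original question.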
      Kamalakar Gulukota,
      Director,
      Center for Bioinformatics and Computational Biology
      NorthShore University Health System
