  • intersect VCF files

    Hi,

    I like bedtools, but now that GATK pipelines are producing VCF files I want to intersect three of them.

    For instance, I have three VCF files: F1, F2 and F3.

    First I want to intersect two of them:
    newfile = intersect(F1, F2)

    Then I want to intersect the result with the third:

    newsecondfile = intersect(newfile, F3)

    How can I do this with VCF files? I tried vcftools, but it is not handy with all the gz files and Perl.

    I would like something BEDtools-like.

    Any suggestions?

    Thanks,
    Adrian
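
For readers who want to see the logic spelled out, the chained intersection from the question can be sketched in plain Python. This is only a minimal illustration, not how bedtools or vcftools work internally. It assumes two records are "the same" when CHROM, POS, REF and ALT match exactly (real tools may match on position only or normalize alleles first), and the function names `open_vcf`, `record_key`, `variant_keys` and `intersect_vcfs` are made up for this example.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def record_key(line):
    """Build the comparison key (CHROM, POS, REF, ALT) from one data line."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return (chrom, int(pos), ref, alt)


def variant_keys(path):
    """Collect the keys of every data line in one VCF file."""
    keys = set()
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            keys.add(record_key(line))
    return keys


def intersect_vcfs(first, *others):
    """Return the data lines of `first` whose key occurs in every other file."""
    other_keys = [variant_keys(path) for path in others]
    kept = []
    with open_vcf(first) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            if all(record_key(line) in keys for keys in other_keys):
                kept.append(line.rstrip("\n"))
    return kept
```

With these assumptions, `intersect_vcfs("F1.vcf", "F2.vcf", "F3.vcf")` gives the same records as intersecting F1 with F2 and then intersecting that result with F3, because a record survives only if its key is present in all of the other files. The file names are placeholders.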

  • #2
    intersectBed claims to accept .vcf files. I'm pretty sure I've used it myself to do just that.

    What I've also done is use mpileup in samtools to take in multiple .bam files together. The downside is that it doesn't keep all of the per-sample information, but it at least gives you GT, PL and GQ values for each sample. You can filter by GT or PL to find SNPs that are or aren't in whatever combination of samples you want.
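
To make the GT-based filtering concrete, here is a rough Python sketch that works on one data line of a multi-sample VCF. It assumes the standard column layout (FORMAT in column 9, samples from column 10 onward) and treats a sample as carrying the variant when its GT contains any allele other than `0` or `.`. The helper names `carriers` and `in_exactly` are hypothetical, and the sample names are passed in by the caller rather than read from the header.

```python
def carriers(record_line, sample_names):
    """Return the samples whose GT contains at least one non-reference allele."""
    fields = record_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # position of GT within FORMAT
    found = set()
    for name, sample_field in zip(sample_names, fields[9:]):
        genotype = sample_field.split(":")[gt_index]
        # Treat phased (|) and unphased (/) genotypes the same way.
        alleles = genotype.replace("|", "/").split("/")
        if any(allele not in ("0", ".") for allele in alleles):
            found.add(name)
    return found


def in_exactly(record_line, sample_names, wanted):
    """True when the variant is called in the wanted samples and in no others."""
    return carriers(record_line, sample_names) == set(wanted)
```

Calling `in_exactly(line, names, ["s1", "s3"])` keeps a site only when those two samples, and no others, have a non-reference genotype. A PL-based filter would follow the same pattern, with a threshold on the parsed likelihoods instead of the allele check.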


    • #3
      There is also vcf-isec, one of the VCFtools Perl utilities.


      • #4
        Has anyone got vcf-isec to work?

        I bgzipped my VCFs and indexed them with tabix.

        Here I try it on the same VCF twice:

        vcf-isec -c 26530.snv.vcf.gz 26530.snv.vcf.gz

        but I get:
        Can't use string ("silent") as a HASH ref while "strict refs" in use at /net/home/leparc/bin/VCFtools/perl/Vcf.pm line 542.

        Also, why all the trouble with bgzipping and tabix indexing... it's a lot of hassle just to do something so simple.


        • #5
          I've successfully run vcf-isec to compare two related individuals:

          vcf-isec -n +2 -f file1.vcf.gz file2.vcf.gz | bgzip -c > file3.vcf.gz


          • #6
            Have you tried vcftools?


            • #7
              There is also vcfintersect: https://github.com/ekg/vcflib#vcfintersect

              It works with both BED files and VCF files, and can generate inverse intersections (allowing you to find things that are not in one file).
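
As an illustration of what intersecting a VCF with a BED file, and inverting that intersection, means, here is a small Python sketch. It is not vcfintersect's implementation. It assumes BED intervals are 0-based and end-exclusive while VCF POS is 1-based, checks only the POS coordinate (a deletion spanning into an interval is ignored), and uses a simple linear scan that would be slow on large interval sets. `load_bed` and `filter_by_bed` are made-up names.

```python
from collections import defaultdict


def load_bed(lines):
    """Group BED intervals (0-based start, end-exclusive) by chromosome."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    return by_chrom


def filter_by_bed(vcf_lines, bed, invert=False):
    """Keep VCF data lines whose POS is inside (or, if invert, outside) the BED."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        # 1-based POS p lies in a 0-based half-open [start, end) when start < p <= end.
        inside = any(start < int(pos) <= end for start, end in bed.get(chrom, ()))
        if inside != invert:
            kept.append(line)
    return kept
```

Passing `invert=True` returns the variants that fall outside every interval, which is the "things that are not in one file" case.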


              • #8
                Hello,

                Would anyone happen to know how to merge a set of VCF files into one new file, keeping only the candidates that are reported in at least 20% and at most 90% of the files?

                Thanks,
                Nino
                Last edited by Nino; 02-20-2014, 11:24 AM. Reason: forgot a word
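
One way to read this question is as a counting problem: for each candidate, count how many files report it, then keep those whose fraction falls within the bounds. The Python sketch below shows that idea only. It assumes each file has already been reduced to a set of variant keys (for example `(CHROM, POS, REF, ALT)` tuples), that both bounds are inclusive, and that the function name `sites_by_file_fraction` is just a placeholder.

```python
from collections import Counter


def sites_by_file_fraction(key_sets, min_fraction=0.2, max_fraction=0.9):
    """Return keys reported in a fraction of files within the inclusive bounds."""
    if not key_sets:
        return set()
    counts = Counter()
    for keys in key_sets:
        counts.update(set(keys))  # count each key at most once per file
    total = len(key_sets)
    return {
        key
        for key, seen in counts.items()
        if min_fraction <= seen / total <= max_fraction
    }
```

Writing the merged file would then be a second pass over the inputs that emits one record per retained key. How to combine INFO and sample columns from different files is left open here.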


                • #9
                  I can also suggest R: convert your VCFs to tab-delimited files, then intersect the positions where variants are called.
