  • variant filtering

    Hi All,

    I have cases and controls that were exome sequenced, and I used GATK to call variants in all cases and controls combined. This generated a single VCF file with all the variants. After removing the low-quality variants (scores < 30), I would like to keep the variants that are present in cases but not in my controls. What is the best way to handle this? The goal is to remove the variants found in the controls and see what is left.

    thanks
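To make the request concrete, here is a minimal Python sketch of the filtering logic described above. It is an illustration under stated assumptions, not a replacement for a dedicated tool: the sample names (`case1`, `case2`, `ctrl1`) and the function names are hypothetical, "score" is taken to mean the QUAL column, GT is assumed to be the first FORMAT key, and a missing genotype (`./.`) is treated as "not a variant", so a control with no call does not exclude a site.

```python
import io

def is_variant(sample_field):
    """True if the GT sub-field carries at least one non-reference allele."""
    gt = sample_field.split(":")[0]          # assumes GT is the first FORMAT key
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)

def case_only_variants(lines, cases, controls, min_qual=30.0):
    """Yield records with QUAL >= min_qual that are variant in at least
    one case and in none of the controls."""
    case_idx, control_idx = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue                          # skip blank and meta lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]              # sample columns start at column 10
            case_idx = [9 + samples.index(s) for s in cases]
            control_idx = [9 + samples.index(s) for s in controls]
            continue
        qual = fields[5]
        if qual == "." or float(qual) < min_qual:
            continue                          # drop missing or low QUAL
        in_case = any(is_variant(fields[i]) for i in case_idx)
        in_control = any(is_variant(fields[i]) for i in control_idx)
        if in_case and not in_control:
            yield line

# Tiny made-up VCF, for illustration only.
demo = "\n".join([
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tcase1\tcase2\tctrl1",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\t0/0",   # case only
    "1\t200\t.\tC\tT\t60\tPASS\t.\tGT\t0/1\t1/1\t0/1",   # also in the control
    "1\t300\t.\tG\tA\t10\tPASS\t.\tGT\t1/1\t0/0\t0/0",   # QUAL below 30
])
kept = list(case_only_variants(io.StringIO(demo), ["case1", "case2"], ["ctrl1"]))
kept_positions = [record.split("\t")[1] for record in kept]
print(kept_positions)
```

On the made-up records only the site at position 100 survives. Whether a site should need to be variant in any case or in all cases is a design choice; the sketch uses "any".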

  • #2
    Hi,

    VCF-tools has some useful tools for this sort of thing, see http://vcftools.sourceforge.net/perl...l#vcf-contrast. Looks like exactly what you're looking for.

    Hope that helps.


    • #3
      I have tried the VCFtools script vcf-contrast from the following link: http://vcftools.sourceforge.net/perl...l#vcf-contrast
      but I am getting the following warning:

      Argument "." isn't numeric in numeric gt (>) at vcftools_0.1.10/perl/vcf-contrast line 144, <STDIN> line 120414.


      Any idea?

      thanks


      • #4
        Originally posted by kjaja
        I have tried the VCFtools script vcf-contrast from the following link: http://vcftools.sourceforge.net/perl...l#vcf-contrast
        but I am getting the following warning:

        Argument "." isn't numeric in numeric gt (>) at vcftools_0.1.10/perl/vcf-contrast line 144, <STDIN> line 120414.


        Any idea?

        thanks

        When I get that sort of message, it is generally caused by a trailing newline in the input file. You should still get an output file, though? Do a line count on your VCF file; I suspect it is 120414 lines long. If the trailing line does prevent you from getting an output file, remove it in vim.
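The check suggested here can be sketched in a few lines of Python: count the lines, look at the line the warning points to, and see whether the file ends in an empty line. The function name `inspect_vcf_line` and the demo file are hypothetical. Note that the warning counts lines on `<STDIN>`, so the number only maps onto the file if the file was piped in unchanged. If the reported line turns out to be an ordinary record, keep in mind that "." is also the VCF missing-value marker, so a missing value in that record is another possible cause.

```python
import os
import tempfile

def inspect_vcf_line(path, line_number):
    """Return (total_lines, text_of_line_or_None, ends_with_blank_line)."""
    total = 0
    wanted = None
    last = ""
    with open(path) as handle:
        for total, line in enumerate(handle, start=1):
            if total == line_number:
                wanted = line.rstrip("\n")
            last = line
    # A final line that holds only whitespace counts as a trailing blank line.
    return total, wanted, total > 0 and last.strip() == ""

# Small made-up file that ends in an empty line, for illustration only.
with tempfile.NamedTemporaryFile("w", suffix=".vcf", delete=False) as tmp:
    tmp.write("##fileformat=VCFv4.1\n#CHROM\tPOS\n1\t100\n\n")
    demo_path = tmp.name

total, text, trailing_blank = inspect_vcf_line(demo_path, 3)
os.remove(demo_path)
print(total, repr(text), trailing_blank)
```

For the case in this thread the call would be something like `inspect_vcf_line("yourfile.vcf", 120414)`, with the file name swapped for the real one.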


        • #5
          Was wondering if this can be done using GATK?


          • #6
            try kggseq

            java -Xms256m -Xmx1300m -jar kggseq.jar --no-resource-check --buildver hg19 --vcf-file yourfile.vcf --ped-file pedigree.ped --o-vcf --genotype-filter 2,4

            In the file pedigree.ped, set the status of each sample to case (2) or control (1).
            For the genotype filter, use option 2 (homozygous variants present in both cases and controls) and option 4 (heterozygous variants present in both cases and controls).
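As a rough illustration of the pedigree file, the Python sketch below writes one row per sample with the status codes mentioned above (2 for case, 1 for control). It assumes the common six-column linkage/PLINK-style layout (family ID, individual ID, father, mother, sex, phenotype) and that the individual IDs must match the sample names in the VCF header; both points should be checked against the kggseq documentation. The function name `ped_lines` and the sample names are hypothetical.

```python
def ped_lines(cases, controls):
    """Build six-column pedigree rows:
    family ID, individual ID, father, mother, sex, phenotype."""
    rows = []
    for status, names in ((2, cases), (1, controls)):
        for name in names:
            # Each sample is treated as its own family; parents and sex
            # are written as 0 (unknown).
            rows.append(f"{name}\t{name}\t0\t0\t0\t{status}")
    return rows

rows = ped_lines(["sample1", "sample2"], ["sample3"])
print("\n".join(rows))
# To write the file: open("pedigree.ped", "w").write("\n".join(rows) + "\n")
```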


            • #7
              I have tried using vcf-contrast in vcf tools using the following command

              vcftools_0.1.10/perl/vcf-contrast +sample1,sample2 -sample3 -n allAllsamples.vcf > insample1or2NOTsample3.vcf

              where I am looking for variants that could be in sample 1 OR sample 2 but not in sample 3. But I found that some of the variants in sample 1 are in sample 3. And the same issue with sample 2.
              Any suggestions will be greatly appreciated
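One way to pin down the problem is to list exactly which records in the output still carry a non-reference allele in the excluded sample. The Python sketch below does that for a single sample. The function name `sites_carried_by` and the demo records are made up, GT is assumed to be the first FORMAT key, and missing calls (`./.`) are not counted as carrying the variant. Looking at the other FORMAT values of the offending records (depth or genotype quality, if present) may then hint at why they were kept.

```python
def sites_carried_by(lines, sample):
    """Return (CHROM, POS, genotype) for every record in which `sample`
    has at least one non-reference allele."""
    hits = []
    col = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue                          # skip blank and meta lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            col = fields.index(sample)        # raises ValueError if absent
            continue
        if col is None:
            raise ValueError("no #CHROM header line seen before data")
        gt = fields[col].split(":")[0]        # assumes GT is the first FORMAT key
        alleles = gt.replace("|", "/").split("/")
        if any(a not in ("0", ".") for a in alleles):
            hits.append((fields[0], int(fields[1]), gt))
    return hits

# Made-up records standing in for insample1or2NOTsample3.vcf.
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\tsample2\tsample3",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:20\t0/0:18\t0/0:25",
    "1\t200\t.\tC\tT\t60\tPASS\t.\tGT:DP\t0/1:22\t0/0:19\t0/1:3",
    "1\t300\t.\tG\tA\t70\tPASS\t.\tGT:DP\t0/0:30\t1/1:21\t./.:0",
]
leaks = sites_carried_by(demo, "sample3")
print(leaks)
```

With a real file, pass an open file handle instead of the list, for example `sites_carried_by(open("insample1or2NOTsample3.vcf"), "sample3")`.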


              • #8
                I also noticed that vcf-contrast does not return the expected results. One still gets variants that are present in samples given with the minus flag. Does somebody know another program which has the same functionality? I think that GATK SelectVariants cannot be used for this purpose.
                Thank you.


                • #9
                  You could also try the following:
                  Upload your multi-sample VCF file to GeneTalk (www.gene-talk.de). Use the collection tool to assign all the cases the status "affected" and all the controls the status "unaffected". Then proceed with the inheritance filtering option "dominant". This will yield variants that are unique to the cases.


                  • #10
                    Hi Krawitz,
                    thanks for your reply. I had already considered this; I was just hoping for a more direct solution.


                    • #11
                      Since no one has mentioned SnpSift, I should say that it works very well for this.


                      • #12
                        Could you print like 5-10 lines from your "combined" vcf file here? Try to include an example of what output you would like to see.
