  • Rhewter
    Member
    • Sep 2014
    • 10

    SNPs that are at least "N" bp away from each other

    Hello,

    I have a set of SNPs in a VCF file and would like to discard SNPs that lie too close to one another. The idea is to keep only SNPs that are at least 200 bp away from each other. How can I do this?

    Thanks in advance.
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    You could do this with awk, assuming the VCF file is sorted by chromosome and position. The general idea is to keep track of two variables: the chromosome of the last entry and the position of the last entry. If the chromosome of an entry differs from that of the previous entry, or the distance between them is above 200 bp, then print $0 (the variables should be updated in any case). Have a go at writing such a script in awk (or python, or perl, or whatever else you know) and post a question when you run into problems.
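A minimal Python sketch of the approach described above, assuming the input is sorted by chromosome and position. The function name `thin_vcf_lines` and the `min_dist` parameter are hypothetical, chosen only for illustration. The comparison is strict ("above 200 bp", as in the post); use `>=` instead if "at least 200 bp" is wanted.

```python
def thin_vcf_lines(lines, min_dist=200):
    """Yield header lines, plus records more than min_dist bp from the previous record.

    Assumes records are sorted by chromosome, then position.
    """
    last_chrom = None
    last_pos = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            yield line  # pass header lines through untouched
            continue
        fields = line.split("\t")
        chrom, pos = fields[0], int(fields[1])
        # Print when the chromosome changes or the gap exceeds min_dist.
        if chrom != last_chrom or pos - last_pos > min_dist:
            yield line
        # As described in the post, the tracking variables are updated in any
        # case, so each record is compared with the previous record in the
        # file, not with the last record that was kept.
        last_chrom, last_pos = chrom, pos


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python thin.py < input.vcf > thinned.vcf
    for kept in thin_vcf_lines(sys.stdin):
        sys.stdout.write(kept)
```

Because the variables are updated for every record, a run of closely spaced SNPs is dropped entirely except for its first member. Moving the update inside the `if` would instead compare each record against the last one kept.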


    • Rhewter
      Member
      • Sep 2014
      • 10

      #3
      Originally posted by dpryan
      You could do this with awk, assuming the VCF file is sorted by chromosome and position. The general idea is to keep track of two variables: the chromosome of the last entry and the position of the last entry. If the chromosome of an entry differs from that of the previous entry, or the distance between them is above 200 bp, then print $0 (the variables should be updated in any case). Have a go at writing such a script in awk (or python, or perl, or whatever else you know) and post a question when you run into problems.
      Hi...

      Thank you for your answer. I had thought of that possibility before, but wanted to know whether there was a ready-made option. I think your suggestion is the best option.

      Thank you!


      • SNPsaurus
        Registered Vendor
        • May 2013
        • 525

        #4
        vcftools has a --thin option

        --thin <int>

        Thin sites so that no two sites are within the specified distance from one another.
        Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


        • Rhewter
          Member
          • Sep 2014
          • 10

          #5
          Originally posted by SNPsaurus
          vcftools has a --thin option

          --thin <int>

          Thin sites so that no two sites are within the specified distance from one another.

          Hi SNPsaurus,

          I tried using this option, but it only generates a log file (reporting the number of variants kept for that distance) and not a new VCF. Is that expected?


          Thank you for your answer!


          • SNPsaurus
            Registered Vendor
            • May 2013
            • 525

            #6
            I think you need to add the --recode option, which tells vcftools to actually make a new vcf based on the filtering.
            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
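For reference, a small Python sketch that assembles the vcftools call discussed in this thread, combining `--thin` with `--recode`. The helper name `vcftools_thin_command` and the file names are hypothetical. `--recode-INFO-all` is included on the assumption that the INFO fields should be carried over into the new file; drop it if they are not needed. With `--out thinned`, vcftools is expected to write `thinned.recode.vcf`.

```python
import shlex


def vcftools_thin_command(vcf_path, min_dist, out_prefix):
    """Build the argument list for thinning a VCF and writing a new file."""
    return [
        "vcftools",
        "--vcf", vcf_path,         # uncompressed input VCF
        "--thin", str(min_dist),   # no two kept sites within this distance
        "--recode",                # actually write a filtered VCF
        "--recode-INFO-all",       # keep INFO fields (assumed to be wanted)
        "--out", out_prefix,       # output prefix
    ]


cmd = vcftools_thin_command("input.vcf", 200, "thinned")
print(shlex.join(cmd))
# To execute it, assuming vcftools is installed and on PATH:
#   import subprocess; subprocess.run(cmd, check=True)
```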


            • Rhewter
              Member
              • Sep 2014
              • 10

              #7
              Originally posted by SNPsaurus
              I think you need to add the --recode option, which tells vcftools to actually make a new vcf based on the filtering.

              That was exactly it. Thank you very much once again.


              • drpbt1692@gmail.com
                Junior Member
                • May 2018
                • 1

                #8
                Hi SNPsaurus,
                I have applied various filters with the VCFfilter tool, e.g. DP>10, AF>0.1, etc.
                This filter information is also shown at the top of the VCF file. However, when I used VCFtools with the minimum SNP distance parameter (--thin 10000), that information does not appear in the VCF file.
                Can you help me understand why?

                Regards
                Prakash Thakor
