  • getting meaningful allele out of vcf file

    I have generated a VCF file from a kinome capture experiment and finished all the filtering steps. I now have a 9.5 MB VCF file from GATK and I don't know what to do with it. Basically, I would like to generate a file for each chromosome with columns such as position, number of reads, how many times each allele is called, etc. I don't know which program to use. I don't really care about annotation at this point; I just want to see the allele calls in a readable text file. Please help.

  • #2
    You should check out bedtools.



    • #3
      I was looking into bedtools but I don't know which command to use.



      • #4
        The VCF file is a text file with that information in it (largely separated into columns, in fact). You can just open it in your favorite text viewer or editor (MS Word, Notepad, gvim, Preview.app). Frankly, the easiest way to split the calls by chromosome is probably just to use grep (assuming you're using Linux or a Mac). So "grep chrX file.vcf > chrX.vcf" for each chromosome. I'm sure someone can think of a short one-line command with awk and sed to avoid having to grep for each chromosome, but you probably have a small number of chromosomes, so this won't be very labor intensive. Note that the result of the grep command won't really be a valid VCF file, since you'll miss much of the header, but that won't matter if you just want to go through it by hand.
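For the columns asked about in the first post (position, number of reads, per-allele counts), here is a minimal Python sketch. It assumes the file is tab-separated with the standard VCF column order and that the genotype column carries GATK-style DP (read depth) and AD (per-allele depths) entries listed in FORMAT. The function names and the example record are made up for illustration, and only the first sample column is read.

```python
import csv


def vcf_to_rows(lines):
    """Yield (chrom, pos, ref, alt, depth, allele_depths) per variant record.

    depth is the sample's DP value and allele_depths its AD value, looked up
    by name in the FORMAT column; either is '' when the field is absent.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/meta lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        depth = allele_depths = ""
        if len(fields) >= 10:  # FORMAT column plus at least one sample
            sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
            depth = sample.get("DP", "")
            allele_depths = sample.get("AD", "")
        yield chrom, pos, ref, alt, depth, allele_depths


def write_table(vcf_path, out_path):
    """Write the extracted columns to a tab-separated text file."""
    with open(vcf_path) as vcf, open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["chrom", "pos", "ref", "alt", "depth", "allele_depths"])
        writer.writerows(vcf_to_rows(vcf))


# Made-up record, only to show the shape of the output.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n",
    "chr1\t12345\t.\tA\tG\t50.0\tPASS\t.\tGT:AD:DP\t0/1:12,9:21\n",
]
rows = list(vcf_to_rows(example))
```

Calling something like write_table("file.vcf", "calls.tsv") would then give a table that opens cleanly in a spreadsheet or text editor. In the AD column the first number is the count for the reference allele and the rest are for the alternate alleles, in the order they appear in ALT.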



        • #5
          Originally posted by dpryan (post #4 above)
          Thanks! I realize this is probably a stupid question, but if I type grep chr1, it pulls chr1, chr11, chr12, chr13, etc. What am I doing wrong?



          • #6
            Originally posted by shawpa
            Thanks! I get that this is probably a stupid question, but if I type grep chr1, it pulls chr1, 11, 12, 13 etc. What am I doing wrong?
            Use grep -w chr1



            • #7
              Originally posted by shawpa
              Thanks! I get that this is probably a stupid question, but if I type grep chr1, it pulls chr1, 11, 12, 13 etc. What am I doing wrong?
              Nothing, I should have foreseen that! Mea culpa. Plain grep matches "chr1" anywhere in a line, so chr11, chr12, etc. come along too. There's a "whole word" switch, -w, that you can give to grep (at least on my computer). So try "grep -w chrX file.vcf > chrX.vcf". That should work better!
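An alternative to running grep once per chromosome is to compare the first column exactly. That avoids the chr1/chr11 problem, and it also avoids picking up header lines that merely mention the chromosome name, which grep -w can still match. Here is a minimal Python sketch that assumes a tab-separated VCF small enough to hold in memory (fine for a file of a few MB). The function names are hypothetical, and the example records are made up.

```python
import os


def group_by_chrom(lines):
    """Return {chrom: [header lines..., record lines...]} for a VCF.

    The chromosome is the exact text of the first tab-separated column, so
    'chr1' and 'chr11' never get mixed. Every group starts with a copy of
    the header lines seen before its first record, so each group is written
    out with the header that a plain grep would drop.
    """
    header, groups = [], {}
    for line in lines:
        if line.startswith("#"):
            header.append(line)
        elif line.strip():
            chrom = line.split("\t", 1)[0]
            groups.setdefault(chrom, list(header)).append(line)
    return groups


def write_per_chrom(vcf_path, out_dir):
    """Write one <chrom>.vcf file per chromosome into an existing out_dir."""
    with open(vcf_path) as vcf:
        groups = group_by_chrom(vcf)
    for chrom, chrom_lines in groups.items():
        with open(os.path.join(out_dir, chrom + ".vcf"), "w") as out:
            out.writelines(chrom_lines)
    return sorted(groups)


# Made-up records, only to show that chr1 and chr11 stay separate.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr11\t200\t.\tC\tT\t50\tPASS\t.\n",
    "chr1\t300\t.\tG\tA\t50\tPASS\t.\n",
]
groups = group_by_chrom(example)
```

Something like write_per_chrom("file.vcf", "by_chrom") would then produce chr1.vcf, chr11.vcf and so on in one pass. Contig names containing characters that are awkward in file names would need extra handling that this sketch leaves out.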
