  • replacing specific positions in fasta from vcf/list

    I have a reference assembly (in fasta format) and a vcf file containing a list of specific sites. I'd like to edit the fasta file to change the bases at these positions to 'N'.

    Does anyone have any suggestions for a tool to accomplish this? I also have a trimmed down version of the vcf that just contains chrom# and position...

    Thanks in advance for any suggestions!

  • #2
    Sounds like a job for Python or Perl.

    You could read through the vcf file and gather the positions in a dictionary keyed by chromosome. Then read through the fasta file and change the base to N at every position that matches an entry in the dictionary.
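The approach described above can be sketched in Python roughly as follows. This is a minimal sketch, not a tested pipeline: the function names (`read_vcf_positions`, `mask_fasta`, `mask_files`) are made up for illustration, and it assumes the first two whitespace-separated columns of each non-header line are the chromosome name and a 1-based position (true of a VCF and of the trimmed chrom/position list), and that chromosome names match the first word of each fasta header. All positions are collected first, so a single masked fasta is written at the end.

```python
from collections import defaultdict


def read_vcf_positions(vcf_lines):
    """Collect 1-based positions per chromosome from VCF or chrom/pos lines."""
    positions = defaultdict(set)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split(None, 2)  # only the first two columns are needed
        positions[fields[0]].add(int(fields[1]))
    return positions


def mask_fasta(fasta_lines, positions):
    """Yield fasta lines (without newlines) with the listed positions set to 'N'."""
    chrom = None
    offset = 0  # number of bases already seen in the current record
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            chrom = parts[0] if parts else ""
            offset = 0
            yield line
            continue
        sites = positions.get(chrom)
        if sites:
            bases = list(line)
            for i in range(len(bases)):
                if offset + i + 1 in sites:  # convert to 1-based coordinate
                    bases[i] = "N"
            line = "".join(bases)
        offset += len(line)
        yield line


def mask_files(vcf_path, fasta_path, out_path):
    """Hypothetical wrapper: read both files and write one masked fasta."""
    with open(vcf_path) as vcf:
        positions = read_vcf_positions(vcf)
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for line in mask_fasta(fasta, positions):
            out.write(line + "\n")
```

Calling `mask_files` with the vcf, the reference fasta, and an output path would write the masked copy; the original line wrapping of the fasta is preserved. The per-base loop is simple rather than fast, so a large genome with many sites may take a while.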

    • #3
      thanks for the reply. yeah--seems to be the way to go, but i'm unfortunately not fluent enough in either language.

      i did find this example but couldn't get it to run properly (it output an entire new fasta for each individual position as it looped through the vcf instead of accumulating all the changes in the vcf before printing a single, mutated fasta). i suspect it's a trivial change to get it to work properly.

      at any rate, i managed to hack a solution by changing the 'alt' allele in my vcf to 'N', modifying (using sed) all the GT values to "1/1", then feeding this file into GATK's FastaAlternateReferenceMaker tool. clearly far from elegant, but i checked the positions in question in the output and it seemed to have worked.
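For anyone wanting to reproduce the vcf edit without sed, it could look something like this in Python. This is only a sketch under assumptions: the function name `force_alt_to_n` is made up, the input is taken to be a well-formed tab-delimited vcf of single-base sites, and GT is assumed to be the first key in the FORMAT column when it is present. It only rewrites the vcf lines; running the GATK tool on the result is a separate step not shown here.

```python
def force_alt_to_n(vcf_line):
    """Return a VCF line with ALT set to 'N' and every sample GT set to '1/1'."""
    if not vcf_line.strip() or vcf_line.startswith("#"):
        return vcf_line  # blank and header lines pass through unchanged
    fields = vcf_line.rstrip("\n").split("\t")
    fields[4] = "N"  # ALT is the fifth column
    # FORMAT is the ninth column; sample columns follow it
    if len(fields) > 9 and fields[8].split(":")[0] == "GT":
        for i in range(9, len(fields)):
            sample = fields[i].split(":")
            sample[0] = "1/1"  # homozygous for the (now 'N') alt allele
            fields[i] = ":".join(sample)
    return "\t".join(fields) + "\n"
```

Applying the function to each line of the vcf and writing the results to a new file gives the modified vcf described above.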

      • #4
        Clever solution!
