  • nmauri
    Junior Member
    • Nov 2015
    • 1

    how to get the ID of the sequence matched from a fasta file

    Hi Everyone!
    I am a beginner in bioinformatics and I need to solve the following problem:
    I have a protein FASTA file that looks like this:
    >id1
    AASEQUENCE1
    >id2
    AASEQUENCE2
    >id3
    AASEQUENCE3

    I need to search each sequence for a motif, for example AIKA, and return the IDs of the sequences that match.
    Thanks a lot
  • dariober
    Senior Member
    • May 2010
    • 311

    #2
    This has been asked before in different flavours. If you are OK with using shell/bash, try this:

    Code:
    awk -v RS=">" 'NR>1 {sub("\n", "\t"); gsub("\n", ""); print ">"$0}' test.fa \
    | awk -v FS="\t" '$2 ~ /AIKA/ {sub("\t", "\n"); print $0}'
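    The first awk linearizes the FASTA file (one record per line, with the header and the sequence separated by a tab). The second awk keeps the records whose sequence contains AIKA and prints them back in FASTA format. Note that this prints the full matching records rather than just the IDs, and the original line wrapping is lost.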
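
If only the IDs are wanted, as in the original question, a single awk pass can do it. The following is a minimal sketch, not a tested pipeline: the function name `fasta_ids_matching` is made up for illustration, the ID is assumed to be the header text up to the first space or tab, and the motif is matched as a literal substring rather than a regular expression.

```shell
# Print the IDs of FASTA records whose sequence contains a literal motif.
# Usage: fasta_ids_matching MOTIF FILE
# (hypothetical helper name; ID = header text up to the first space or tab)
fasta_ids_matching() {
    awk -v motif="$1" '
        /^>/ {
            # A new header: report the previous record if it matched, then reset.
            if (id != "" && index(seq, motif)) print id
            split(substr($0, 2), h, /[ \t]/)
            id = h[1]
            seq = ""
            next
        }
        { seq = seq $0 }   # join wrapped sequence lines into one string
        END {
            # The last record has no following header, so check it here.
            if (id != "" && index(seq, motif)) print id
        }
    ' "$2"
}
```

It would be called as `fasta_ids_matching AIKA test.fa`. Because wrapped sequence lines are joined before the search, a motif that spans a line break is still found.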
