how to get the ID of the sequence matched from a fasta file

nmauri

Junior Member

Join Date: Nov 2015

Posts: 1
#1

11-20-2015, 04:16 AM

Hi Everyone!
I am a beginner in bioinformatics and I need to solve the following problem:
I have a FASTA file of protein sequences that looks like this:
>id1
AASEQUENCE1
>id2
AASEQUENCE2
>id3
AASEQUENCE3

I need to search each sequence for a short motif, for example AIKA, and then return the IDs of the sequences that contain it.
Thanks a lot
dariober

Senior Member

Join Date: May 2010

Posts: 311
#2

11-20-2015, 06:28 AM

This has been asked before in different flavours. If you are ok using shell/bash, try this:

Code:

awk -v RS=">" 'NR>1 {sub("\n", "\t"); gsub("\n", ""); print ">"$0}' test.fa \
| awk -v FS="\t" '$2 ~ /AIKA/ {sub("\t", "\n"); print $0}'

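The first awk linearizes the FASTA file (one record per line, with the header and the joined sequence separated by a tab). The second awk keeps the records whose sequence contains AIKA and prints them back in FASTA format. The original line wrapping of the sequences is lost, though.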
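If only the IDs are wanted rather than the full records, the same idea can probably be done in a single awk pass. The following is a minimal sketch, not a definitive solution. The function name fasta_ids_matching is made up for illustration. It assumes the ID is the header text up to the first whitespace, and that the motif should be matched as a literal string (not a regular expression) with the same letter case as the sequences.

```shell
# Hypothetical helper: print the IDs of FASTA records whose sequence
# contains a given motif.
# Usage: fasta_ids_matching MOTIF FILE
fasta_ids_matching() {
    awk -v motif="$1" '
        # Print the ID of the record collected so far if its sequence
        # contains the motif. index() is a literal substring search.
        function report() {
            if (have_record && index(seq, motif) > 0) print id
        }
        /^>/ {
            report()               # finish the previous record, if any
            id = substr($1, 2)     # assumption: ID = header up to first whitespace
            seq = ""
            have_record = 1
            next
        }
        { seq = seq $0 }           # join wrapped sequence lines
        END { report() }           # do not forget the last record
    ' "$2"
}
```

With the file from the example above, this would be called as: fasta_ids_matching AIKA test.fa. Because the sequence lines are joined before searching, a motif that spans a line break in a wrapped sequence should still be found.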