02-23-2017, 06:41 AM   #1
SDPA_Pet
Senior Member
 
Location: US

Join Date: Apr 2013
Posts: 222
How can I do this kind of filtering?

Hi, I have a FASTA file with sequences like the one below. It is an annotated file: each sequence name includes the function name and the organism.

I want to do this kind of filtering.

1> Extract the sequence names (e.g. "mgm4510423.3|contig02227|RefSeq|73954f841ecd7c512c5428ed1b1a747e accession=[NP_559375.1],function=[carbamate kinase],organism=[Pyrobaculum aerophilum str. IM2]") to a text file. It would be better to separate the fields with commas, that is, to make three columns: ID, function and organism.

2> Once I have created that text file, I can choose the organisms that I want to keep, and then filter the FASTA file so that I get all the sequences for those particular organisms.

Is there any software or Unix command such as grep/awk that can do this?
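
For step 1, something along these lines might work. It is only a minimal Python sketch, and it assumes every header has exactly the layout of the single example in this post (an ID up to the first space, then accession=[...],function=[...],organism=[...]). The names parse_header and headers_to_csv are made up for illustration, as are the file names in the usage comment.

```python
import csv
import re

# Assumed header layout, based only on the one example header in this post:
#   >ID accession=[...],function=[...],organism=[...]
HEADER_RE = re.compile(
    r"^>(?P<id>\S+)\s+.*?function=\[(?P<function>.*?)\],"
    r"organism=\[(?P<organism>.*)\]\s*$"
)


def parse_header(line):
    """Return (id, function, organism), or None if the line does not match."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    return m.group("id"), m.group("function"), m.group("organism")


def headers_to_csv(fasta_path, csv_path):
    """Write one CSV row per matching FASTA header; return the row count."""
    rows = 0
    with open(fasta_path) as fasta, open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)  # quotes fields that contain commas
        writer.writerow(["ID", "function", "organism"])
        for line in fasta:
            if line.startswith(">"):
                fields = parse_header(line)
                if fields is not None:
                    writer.writerow(fields)
                    rows += 1
    return rows


# Hypothetical usage (file names are placeholders):
#   headers_to_csv("annotated.fasta", "headers.csv")
```

The csv module is used instead of a plain comma join because function names can themselves contain commas, and the writer quotes such fields so the file still has three columns. Headers that do not match the assumed layout are skipped silently.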

Code:
>mgm4510423.3|contig02227|RefSeq|73954f841ecd7c512c5428ed1b1a747e accession=[NP_559375.1],function=[carbamate kinase],organism=[Pyrobaculum aerophilum str. IM2]
AAGAAACGTCGACGTAGCCGCCAGAGTCGTGGCAGGGgTAATGCAGGGAGGCCACCAGGTGGTGGTGACGCACGGCAACGGGCCCCAGGTGGGCTACCTGGCGgAGTTGCaGAgaGACAACGGCACATTTCGGCTGGACGCCCTAAACGCCATGACGCaGGGgATGCTCGGCTACTTCCTTGTCTCTGCGCTTGATAAATACTTAGGCAGGGGGAGGGCCGCGGCTTTGGTGACCAGAGTCGAGGTGGACTGCGACGACCCGGCTTTTaaagaCCCGACcAAGTTCATAGGTCCCCTATACGGCAAGGAaCaGgCTGAGGCCCTCGCACAGAGGTACGGGTGGCAGTTTAGGCAAGACCCAAGAGGAGGCTGGCgtCGCGTCGTCGCGTCGCCTACGCCGCTCAGAAtcGTGGAGATAGAGGCCGTAAAGaGGTTGCTGgACGCGgGTTTCGTCGTTGTGGCGgCGGGCGGCGGCGGTaTACCGCTCTGCGGAGACAGAgaCGTAGAGGGGGTTATAGACAAGGACTTGGCCTCTTCTCTCCTCGCTGTGGAGCTCGGCGCGGACTTCTTCATGATGCTGACCGACATAGACGCCGTCTACCTAAACTACGGGAaGCCGAACCAGAGGAGGCTAGACAGCGTAGGGGCTGACGAGCTGGAGAGGTATTTCGCCGAtGGCcACTTCCCGCCGGGCTCCATGGGGCCGAAGGTGCAGGCCGCGATAAACTTCGTGAAacAAAcGGggaGAaGGGCGGCCATCGGGGCGCTGGAGGAGGGCTAtGACGtGTTCAGGGGAATAAAGGGGACCCAGGTgACGCCTTAGAGCTCGTTTATTGGCTTTTCGTATTCCTCCCTcTtCtGGAGGTCTCGgATCTTgACTACGCCGCGCTCCAGCTCTTTCTTGCCGATTATGATTAGGtACCGCGTGCCTATCTTCAAGATGTATTCAAAGGCCTcTTTtAGGCTTTTCTCGCCCAGCTCCACAGCCACGCTGAAGCCTGCGCTCCTCAGCTTCTtcGCAACTGCCACGGCCTGCGGGTACGCCTCgTCGTCGAAGATGTAGATGTAGTAGTCCAGCGGCTTCTCCACGTTGTGGAGCCCcACGgCCTCcATAAACcTCTCAACGCCGATGGCGAaCCCCAGCGCCGgCGtCtttACGCCGCTGTAGAGCT
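
For step 2, a possible approach is to read the FASTA file record by record and keep only the records whose organism is in a chosen list. The sketch below is in Python and again assumes the header ends with organism=[...] as in the example above. The function names and the file names in the usage comment are hypothetical.

```python
import re

# Assumes the organism annotation is the last field of the header,
# as in the example record in this post.
ORGANISM_RE = re.compile(r"organism=\[(.*)\]\s*$")


def filter_fasta(lines, wanted_organisms):
    """Yield the lines of every record whose organism is in wanted_organisms."""
    wanted = set(wanted_organisms)
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts here: decide whether to keep it.
            m = ORGANISM_RE.search(line)
            keep = m is not None and m.group(1) in wanted
        if keep:
            # Emits the header and all sequence lines that follow it.
            yield line


def filter_fasta_file(in_path, out_path, wanted_organisms):
    """Copy the records for the wanted organisms from in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta(src, wanted_organisms))


# Hypothetical usage (file names are placeholders):
#   filter_fasta_file("annotated.fasta", "filtered.fasta",
#                     ["Pyrobaculum aerophilum str. IM2"])
```

Matching is exact and case-sensitive, so the organism names have to be copied as they appear in the headers. Sequences wrapped over several lines are handled because the keep flag stays set until the next ">" line.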

Last edited by Brian Bushnell; 02-23-2017 at 08:45 AM.