I have two large sets of proteins (on the order of 100K each), and I want to find the proteins that are unique to each set.
Is there any tool for doing it fast?
Thanks
cat file1 file2 | sort | uniq -u
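The one-liner above prints lines that occur exactly once across both files, so it assumes one protein per line and no repeats within a file, and it does not say which set a line came from. A minimal sketch that labels each unique line with its set, under the same one-record-per-line assumption (the function name `unique_per_set` is made up for illustration):

```shell
# Hypothetical helper: print lines unique to each of two files,
# prefixed with [1] or [2]. Assumes one protein record per line
# (not multi-line FASTA).
unique_per_set() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # comm needs sorted input; -u also drops repeats within a file
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    # -23 keeps only lines found in the first file alone
    LC_ALL=C comm -23 "$a" "$b" | sed 's/^/[1] /'
    # -13 keeps only lines found in the second file alone
    LC_ALL=C comm -13 "$a" "$b" | sed 's/^/[2] /'
    rm -f "$a" "$b"
}
```

Called as `unique_per_set file1 file2`. For multi-line FASTA, each record would first have to be joined onto one line.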
    #!/usr/bin/perl
    use warnings;
    use strict;

    open(my $f1, "<", "file1.fasta") or die("Cannot open file1");
    open(my $f2, "<", "file2.fasta") or die("Cannot open file2");

    # Maps sequence => ID for every record read from file 1
    my %seenSequences = ();
    my $sequence = "";
    my $seqID = "";

    while(<$f1>){
      chomp;
      if(/^>(.*)$/){
        # New header: store the record that just ended
        $seenSequences{$sequence} = $seqID if $seqID ne "";
        $sequence = "";
        $seqID = $1;
      } else {
        $sequence .= $_;
      }
    }
    # Store the last record of file 1
    $seenSequences{$sequence} = $seqID if $seqID ne "";
    close($f1);

    $sequence = "";
    $seqID = "";
    while(<$f2>){
      chomp;
      if(/^>(.*)$/){
        if(($seqID ne "") && !exists($seenSequences{$sequence})){
          # Not seen in file 1: unique to file 2
          printf(">%s [2]\n%s\n", $seqID, $sequence);
        } else {
          # Shared: remove it so only file-1-only records remain
          delete($seenSequences{$sequence});
        }
        $seqID = $1;
        $sequence = "";
      } else {
        $sequence .= $_;
      }
    }
    # Handle the last record of file 2
    if(($seqID ne "") && !exists($seenSequences{$sequence})){
      printf(">%s [2]\n%s\n", $seqID, $sequence);
    } else {
      delete($seenSequences{$sequence});
    }
    close($f2);

    # Whatever is left in the hash was seen only in file 1
    while(my ($seq, $id) = each(%seenSequences)){
      printf(">%s [1]\n%s\n", $id, $seq);
    }
    ./141964.pl > out.fasta && head *.fasta
    ==> file1.fasta <==
    >1
    PRTEINEIN
    >3
    PRTEINTHREE

    ==> file2.fasta <==
    >1
    PRTEINEIN
    >2
    PRTEINNI

    ==> out.fasta <==
    >2 [2]
    PRTEINNI
    >3 [1]
    PRTEINTHREE