Hi all,
I have a FASTA file containing protein sequences for a large number of genes from D. melanogaster, and I need to split it into multiple FASTA files, one gene per file. What's the best way to go about this? Ideally each file would be named after the gene's UniProt accession.
For example, the sequences below should be split into two files: the first called O46197.fasta, the second called Q9VUQ5.fasta.
Many thanks!
>sp|O46197|A29AB_DROME Accessory gland protein Acp29AB OS=Drosophila melanogaster GN=Acp29AB PE=2 SV=2
MYASNLLYLLALWNLWDLSGGQQDIPNGKATLPSPQTPQNTIDQIGINQNYWFTYNALKQ
NETLAIIDTMEMRIASSLLEFKAQMEIQLQPLKIIMRHHASNIKASNNIKMRRFEKVGSR
HFHIEKNLMQTWFEAYVTCRKMNGHLANIQDEMELDGILALAPNNSYWIDISKLVENGGT
FVSTLTGREPFFVKWKSNQDTKKKNQCVYIYAKEMSYDECFEKKSFVCQADQWA
>sp|Q9VUQ5|AGO2_DROME Protein argonaute-2 OS=Drosophila melanogaster GN=AGO2 PE=1 SV=3
MGKKDKNKKGGQDSAAAPQPQQQQKQQQQRQQQPQQLQQPQQLQQPQQLQQPQQQQQQQP
HQQQQQSSRQQPSTSSGGSRASGFQQGGQQQKSQDAEGWTAQKKQGKQQVQGWTKQGQQG
GHQQGRQGQDGGYQQRPPGQQQGGHQQGRQGQEGGYQQRPPGQQQGGHQQGRQGQEGGYQ
QRPSGQQQGGHQQGRQGQEGGYQQRPPGQQQGGHQQGRQGQEGGYQQRPSGQQQGGHQQG
RQGQEGGYQQRPPGQQQGGHQQGRQGQEGGYQQRPPGQQQGGHEQGRQGQEGGYQQRPSG
QQQGGHQQGRQGQEGGYQQRPSGQQQGGHQQGRQGQEGGYQQRPSGQQQGGHQQGRQGQE
GGYQQRPPGQQPNQTQSQGQYQSRGPPQQQQAAPLPLPPQPAGSIKRGTIGKPGQVGINY
LDLDLSKMPSVAYHYDVKIMPERPKKFYRQAFEQFRVDQLGGAVLAYDGKASCYSVDKLP
LNSQNPEVTVTDRNGRTLRYTIEIKETGDSTIDLKSLTTYMNDRIFDKPMRAMQCVEVVL
ASPCHNKAIRVGRSFFKMSDPNNRHELDDGYEALVGLYQAFMLGDRPFLNVDISHKSFPI
SMPMIEYLERFSLKAKINNTTNLDYSRRFLEPFLRGINVVYTPPQSFQSAPRVYRVNGLS
RAPASSETFEHDGKKVTIASYFHSRNYPLKFPQLHCLNVGSSIKSILLPIELCSIEEGQA
LNRKDGATQVANMIKYAATSTNVRKRKIMNLLQYFQHNLDPTISRFGIRIANDFIVVSTR
VLSPPQVEYHSKRFTMVKNGSWRMDGMKFLEPKPKAHKCAVLYCDPRSGRKMNYTQLNDF
GNLIISQGKAVNISLDSDVTYRPFTDDERSLDTIFADLKRSQHDLAIVIIPQFRISYDTI
KQKAELQHGILTQCIKQFTVERKCNNQTIGNILLKINSKLNGINHKIKDDPRLPMMKNTM
YIGADVTHPSPDQREIPSVVGVAASHDPYGASYNMQYRLQRGALEEIEDMFSITLEHLRV
YKEYRNAYPDHIIYYRDGVSDGQFPKIKNEELRCIKQACDKVGCKPKICCVIVVKRHHTR
FFPSGDVTTSNKFNNVDPGTVVDRTIVHPNEMQFFMVSHQAIQGTAKPTRYNVIENTGNL
DIDLLQQLTYNLCHMFPRCNRSVSYPAPAYLAHLVAARGRVYLTGTNRFLDLKKEYAKRT
IVPEFMKKNPMYFV
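One possible approach, sketched in plain Python with no third-party dependencies: stream through the file, and each time a header line (starting with `>`) appears, open a new output file named after the accession taken from the header. This assumes UniProt-style headers of the form `>sp|ACCESSION|ENTRY_NAME ...` (or `>tr|...`); for any other header it falls back to the first word after `>`. The names `split_fasta` and `accession_from_header` are hypothetical, chosen for this example.

```python
import os
import sys


def accession_from_header(header: str) -> str:
    """Return the UniProt accession from a header like '>sp|O46197|A29AB_DROME ...'.

    Assumption: UniProt headers are 'db|Accession|EntryName'; for any other
    header, fall back to the first whitespace-separated word after '>'.
    """
    words = header.lstrip(">").split()
    if not words:
        raise ValueError(f"empty FASTA header: {header!r}")
    parts = words[0].split("|")
    if len(parts) >= 3 and parts[0] in ("sp", "tr"):
        return parts[1]
    return words[0]


def split_fasta(in_path: str, out_dir: str = ".") -> list:
    """Write each record of in_path to <accession>.fasta inside out_dir.

    Records are streamed line by line, so large files are never held in memory.
    Returns the list of output paths in the order they were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(in_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # New record: close the previous output file, open the next one.
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, accession_from_header(line) + ".fasta")
                    out = open(path, "w")
                    written.append(path)
                # Lines before the first header (e.g. blank lines) are skipped.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python split_fasta.py proteins.fasta [out_dir]
    target = sys.argv[2] if len(sys.argv) > 2 else "."
    for p in split_fasta(sys.argv[1], target):
        print(p)
```

Running it on the two records above should produce `O46197.fasta` and `Q9VUQ5.fasta`. Note that if the same accession appears twice, the later record overwrites the earlier file. If Biopython is already installed, `SeqIO.parse(path, "fasta")` is an alternative way to iterate over records; there `record.id` holds the full `sp|O46197|A29AB_DROME` token, so the accession still has to be split out of it.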