Hi all,
I read these forums often but this is my first post.
I've got a problem that I don't have the scripting skills to solve (nor the time to gain them at the moment).
What I want to do is combine two multi-FASTA files in a specific order based on the sequence IDs.
For example:
file 1
>seq1
TTTGGATTACAAAGTTATTTAAATCACATGT....
>seq2
GCCGTGCCATTTCAATTACAAATACATAATA....
file 2
>seq1_probe1
CTTTGTCCTTGTCCTTGGTGGCGG....
>seq1_probe2
ATTTCTTCTCATCCTCCTCCTCCTA....
>seq2_probe1
ACTAAAAACTCGTTGAAGAAATCC....
>seq2_probe2
AGGATATAACACACAGCCATCACC....
I need the combined file to look like:
>seq1
TTTGGATTACAAAGTTATTTAAATCACATGT....
>seq1_probe1
CTTTGTCCTTGTCCTTGGTGGCGG....
>seq1
TTTGGATTACAAAGTTATTTAAATCACATGT....
>seq1_probe2
ATTTCTTCTCATCCTCCTCCTCCTA....
>seq2
GCCGTGCCATTTCAATTACAAATACATAATA....
>seq2_probe1
ACTAAAAACTCGTTGAAGAAATCC....
>seq2
GCCGTGCCATTTCAATTACAAATACATAATA....
>seq2_probe2
AGGATATAACACACAGCCATCACC....
Note that only part of each file 2 sequence ID (the prefix before the underscore, e.g. seq1 in seq1_probe1) matches a file 1 sequence ID.
I'd prefer to use Perl as that is the language I'm learning, but any solution will suffice.
Thanks for reading.
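One possible approach, sketched in Python rather than Perl since any solution is acceptable: load file 1 into a lookup table keyed by sequence ID, then walk file 2 in order and print the matching file 1 record before each probe. This is only a sketch under a few assumptions that are not stated in the post: the file 1 ID is taken to be everything before the last underscore in the probe ID, each header line holds only the ID (no description after it), and probes with no matching file 1 sequence are skipped. The function names `read_fasta` and `interleave` are made up for illustration.

```python
import sys


def read_fasta(lines):
    """Yield (id, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif line and header is not None:
            # Multi-line sequences are joined into one string.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def interleave(targets, probes):
    """Return a list of (id, seq) records: each probe preceded by its target."""
    target_seqs = dict(targets)
    records = []
    for probe_id, probe_seq in probes:
        # Assumption: the target ID is the part before the last underscore,
        # e.g. "seq1_probe2" -> "seq1".
        target_id = probe_id.rsplit("_", 1)[0]
        if target_id in target_seqs:
            records.append((target_id, target_seqs[target_id]))
            records.append((probe_id, probe_seq))
        # Assumption: probes without a matching target are silently skipped.
    return records


def main(target_path, probe_path):
    with open(target_path) as t, open(probe_path) as p:
        records = interleave(read_fasta(t), read_fasta(p))
    for rec_id, seq in records:
        print(f">{rec_id}\n{seq}")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as combine_fasta.py, it could be run as `python combine_fasta.py file1.fa file2.fa > combined.fa`. The output follows the order of file 2, which matches the desired layout in the example above. Sequences are written on a single line each, so any line wrapping in the input is not preserved.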