Hi all,
I am fairly new to bioinformatics. I have a question about sorting data, and I hope you can guide me.
I have a single FASTA file like this:
>final_100_00001 hypothetical protein
GTGTATAAGAGACAGGGCGTTGTCCAGCGCGATCTTCTGGCCGTCGCGATCGAGGGCCTGCCCGAGCGCCGACAGCTTGAGCGACACCTCGAGCGGTCGCACCCCGTCGCATGCGATATCGCCGCGGCGGC
...
>final_106_02271 hypothetical protein
ATGCGCACTACGATCGACCTCGATGACGACATACTGCGGGCGTTGAAACGACGCCAGCGCGAGGAGCGCAAAACGTTAGGGCAGCTCGCCTCCGAATTGCTTGCGCAAGCTCTGG
>final_106_02272 putative ribonuclease VapC43
GTGAGCGAAACCTTTGACGTCGATGTTCTGGTCCATGCGACGCACCGAGCCAGCCCGTTTCACGATAAGGCGAAGACGCTCGTTGAGCGATTCCTGGCTGGGCCAGGGCTGGTATATCTA
>final_106_00001 Glycerol kinase
GTGTCCGACGCCATCCTAGGAGAGCAATTGGCCGAGTCCTCGGATTTCATAGCCGCCATCGACCAGGGCACCACCAGCACCCGCTGCATGATCTTCGATCACCACGGTGCCGAGG
>final_107_00002 RDD family protein
ATGTCGGAGGTGGTGACCGGCGACGCCGTGGTGCTCGACGTACAGATCGCCCAGTTGCCGGTGCGCGCGGTCAGCGCGGTCATCGATATCACCATAATATTCATCGG
>final_107_00003 hypothetical protein
ATGCTGCCGCCTGGCTACCCGGTTGAACCACCGCCCGTGGCGCCGGGATATGCGCCGGCCGGATATCCGCCCTACCCCGCTACACCACCCGGGTACGGCCCGCCGGGTTA
...
>final_120_02354 hypothetical protein
ATGACGATGGCTCGGGTGCGTCGCGGCACGGAACTGTTGTTGTCACCTCAGTCGCCGCCGGCCACCGGCGGGCTGATCGTGTTGACCGGTCTGCGGCTGTTGGCTGGGTTGATCTG
>final_120_02355 Soluble secreted antigen MPT53 precursor
GTGACTCATTCCCGTCTGATTGGCGCACTTACCGTAGTCGCAATTATCGTCACTGCATGTGGTTCGCAGCCGAAATCCCAGCCCGCAGTGGCACCTACCGGGGACGCGGCCGCT
>final_120_02356 AhpC/TSA family protein
GTGCCGACTCGCCGTCTGCAGGACATCAACGATCAACCGGTGGACGTCCCGGCTGCGACCGGAAGGACACACCTGCAGTTTCGGCGGTTCGCGGCCTGTCCGATCTGCCA
>final_120_02357 transcriptional activator FtrB
ATGGCAGATCGGTCGGTGCGCCCGCTGCGGCATCTCGTTCATGCGGTGACTGGGGGCCAACCGCCTTCCGAGGCCCAGGTCCGACAGGCAGCCTGGATTGCGCGGTGCGTCGG
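Each record here is a header line starting with `>`. The first word of the header is the ID (e.g. `final_106_00001`), the rest is the gene name (e.g. `Glycerol kinase`), and the sequence lines follow. Below is a minimal sketch of reading such a file into records. It assumes plain FASTA text with no special comment conventions. `FastaRecord`, `parse_fasta` and the file name `input.fasta` are hypothetical names used only for illustration.

```python
import io
from typing import Iterator, NamedTuple, Optional, TextIO


class FastaRecord(NamedTuple):
    seq_id: str     # first word of the header, e.g. "final_106_00001"
    gene_name: str  # rest of the header, e.g. "Glycerol kinase"
    sequence: str   # all sequence lines of the record joined together


def _make_record(header: str, chunks: list[str]) -> FastaRecord:
    # Split "final_106_00001 Glycerol kinase" into ID and gene name.
    parts = header.split(maxsplit=1)
    seq_id = parts[0] if parts else ""
    gene_name = parts[1] if len(parts) > 1 else ""
    return FastaRecord(seq_id, gene_name, "".join(chunks))


def parse_fasta(handle: TextIO) -> Iterator[FastaRecord]:
    """Yield one FastaRecord per '>' header line (hypothetical helper)."""
    header: Optional[str] = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # a sequence may span several lines
    if header is not None:
        yield _make_record(header, chunks)


if __name__ == "__main__":
    # Shortened example input; a real script might use open("input.fasta").
    text = ">final_106_00001 Glycerol kinase\nGTGTCC\nGACGCC\n"
    for rec in parse_fasta(io.StringIO(text)):
        print(rec.seq_id, "|", rec.gene_name, "|", rec.sequence)
        # prints: final_106_00001 | Glycerol kinase | GTGTCCGACGCC
```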
How can I sort the records by gene name (the text after the ID in each header), so that the output looks like the following?
>final_120_R1_02356 AhpC/TSA family protein
GTGCCGACTCGCCGTCTGCAGGACATCAACGATCAACCGGTGGACGTCCCGGCTGCGACCGGAAGGACACACCTGCAGTTTCGGCGGTTCGCGGCCTGTCCGATCTGCCA
>final_106_00001 Glycerol kinase
GTGTCCGACGCCATCCTAGGAGAGCAATTGGCCGAGTCCTCGGATTTCATAGCCGCCATCGACCAGGGCACCACCAGCACCCGCTGCATGATCTTCGATCACCACGGTGCCGAGG
>final_100_00001 hypothetical protein
GTGTATAAGAGACAGGGCGTTGTCCAGCGCGATCTTCTGGCCGTCGCGATCGAGGGCCTGCCCGAGCGCCGACAGCTTGAGCGACACCTCGAGCGGTCGCACCCCGTCGCATGCGATATCGCCGCGGCGGC
>final_106_02271 hypothetical protein
ATGCGCACTACGATCGACCTCGATGACGACATACTGCGGGCGTTGAAACGACGCCAGCGCGAGGAGCGCAAAACGTTAGGGCAGCTCGCCTCCGAATTGCTTGCGCAAGCTCTGG
>final_107_00003 hypothetical protein
ATGCTGCCGCCTGGCTACCCGGTTGAACCACCGCCCGTGGCGCCGGGATATGCGCCGGCCGGATATCCGCCCTACCCCGCTACACCACCCGGGTACGGCCCGCCGGGTTA
>final_120_R1_02354 hypothetical protein
ATGACGATGGCTCGGGTGCGTCGCGGCACGGAACTGTTGTTGTCACCTCAGTCGCCGCCGGCCACCGGCGGGCTGATCGTGTTGACCGGTCTGCGGCTGTTGGCTGGGTTGATCTG
>final_106_02272 putative ribonuclease VapC43
GTGAGCGAAACCTTTGACGTCGATGTTCTGGTCCATGCGACGCACCGAGCCAGCCCGTTTCACGATAAGGCGAAGACGCTCGTTGAGCGATTCCTGGCTGGGCCAGGGCTGGTATATCTA
>final_107_00002 RDD family protein
ATGTCGGAGGTGGTGACCGGCGACGCCGTGGTGCTCGACGTACAGATCGCCCAGTTGCCGGTGCGCGCGGTCAGCGCGGTCATCGATATCACCATAATATTCATCGG
>final_120_02355 Soluble secreted antigen MPT53 precursor
GTGACTCATTCCCGTCTGATTGGCGCACTTACCGTAGTCGCAATTATCGTCACTGCATGTGGTTCGCAGCCGAAATCCCAGCCCGCAGTGGCACCTACCGGGGACGCGGCCGCT
>final_120_02357 transcriptional activator FtrB
ATGGCAGATCGGTCGGTGCGCCCGCTGCGGCATCTCGTTCATGCGGTGACTGGGGGCCAACCGCCTTCCGAGGCCCAGGTCCGACAGGCAGCCTGGATTGCGCGGTGCGTCGG
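Judging from the desired output, the records are ordered alphabetically by gene name while ignoring case (`AhpC…`, `Glycerol…`, `hypothetical…`, `putative…`, `RDD…`, `Soluble…`, `transcriptional…`). Records with the same name, such as the four `hypothetical protein` entries, keep their original order. A plain case-sensitive sort would instead place `RDD…` and `Soluble…` ahead of `hypothetical…`.

Below is a minimal sketch in plain Python. It assumes the records are already held as `(header, sequence)` pairs with the leading `>` removed. `gene_name`, `sort_by_gene_name` and `to_fasta` are hypothetical helper names, not part of any library.

```python
def gene_name(header: str) -> str:
    """Return the header text after the first word (the ID)."""
    parts = header.split(maxsplit=1)
    return parts[1] if len(parts) > 1 else ""


def sort_by_gene_name(records: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (header, sequence) pairs alphabetically by gene name."""
    # casefold() makes the comparison case-insensitive, so "hypothetical"
    # lands between "Glycerol" and "RDD". sorted() is stable, so records
    # with the same gene name keep their input order.
    return sorted(records, key=lambda rec: gene_name(rec[0]).casefold())


def to_fasta(records: list[tuple[str, str]]) -> str:
    """Render (header, sequence) pairs back into FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


if __name__ == "__main__":
    # Headers from the question; sequences shortened for illustration.
    records = [
        ("final_100_00001 hypothetical protein", "GTGTAT"),
        ("final_106_00001 Glycerol kinase", "GTGTCC"),
        ("final_107_00002 RDD family protein", "ATGTCG"),
        ("final_120_02356 AhpC/TSA family protein", "GTGCCG"),
    ]
    print(to_fasta(sort_by_gene_name(records)), end="")
```

Sorting on a key function leaves the original headers untouched in the output. If Biopython is available, `SeqIO.parse("input.fasta", "fasta")` yields records whose `.description` is the full header line. The same key can therefore be applied to `rec.description`, and `SeqIO.write` can save the sorted records. The file name is a placeholder.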