Hi all,
I have this file:
TGAGGTAGTAGATTGTATAGTT 424866
TAGCTTATCAGACTGATGTTGA 359141
TAGCTTATCAGACTGATGTTGAC 276052
TGAGGTAGTAGGTTGTATAGTT 268735
ACAGTAGTCTGCACATTGGTT 209280
ACAGTAGTCTGCACATTGGTTA 178652
TAGCTTATCAGACTGATGTTG 166159
TGAGGTAGTAGGTTGTGTGGTT 105275
TGAGGTAGTAGGTTGTATGGTT 102447
AGCAGCATTGTACAGGGCTATGA 91296
TGAGGTAGTAGGTTGTGTGGTTT 63300
TGAGGTAGTAGTTTGTACAGTT 61604
TGAGGTAGTAGATTGTATAGT 61492
TAGCACCATCTGAAATCGGTTA 60637
TTCAAGTAATCCAGGATAGGCT 52300
TGAGGTAGTAGATTGTATAGTTA 50905
TGAGGTAGTAGGTTGTATAGT 48150
TACAGTAGTCTGCACATTGGTT 47534
TCTACAGTCCGACGATC 45803
................
Each line is a sequence followed by the number of times it occurs. I would like to convert this file to FASTA format, decollapsing the sequences (one record per occurrence) and naming the records like this:
>Sample1_0
TGAGGTAGTAGATTGTATAGTT
>Sample1_1
TGAGGTAGTAGATTGTATAGTT
>Sample1_2
TGAGGTAGTAGATTGTATAGTT
.....
and so on, 424866 times in total, ending with:
>Sample1_424865
TGAGGTAGTAGATTGTATAGTT
then
>Sample1_424866
TAGCTTATCAGACTGATGTTGA (the second sequence)
The same goes for the remaining sequences, with the numbering continuing in series. Is there a script for this purpose?
Thanks in advance,
Giorgio
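One possible approach, as a minimal Python sketch rather than an established tool: read each "sequence count" line and write the sequence out count times, with a single counter running across the whole file. The function name `decollapse` and the default prefix `Sample1` are assumptions made for illustration, and lines that do not look like "sequence count" (such as the row of dots above) are simply skipped.

```python
import io


def decollapse(lines, out, sample="Sample1"):
    """Write one FASTA record per occurrence; return the number of records written."""
    n = 0  # running record index, shared by all sequences
    for line in lines:
        parts = line.split()
        # Skip blank lines and anything that is not "<sequence> <count>"
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        seq, count = parts[0], int(parts[1])
        for _ in range(count):
            out.write(f">{sample}_{n}\n{seq}\n")
            n += 1
    return n


# Small in-memory demonstration with made-up data
demo = io.StringIO("ACGT 2\nTTGA 1\n")
result = io.StringIO()
total = decollapse(demo, result)
print(result.getvalue(), end="")
```

For a real file, the same function could be called with open file handles, for example `decollapse(open("collapsed.txt"), open("expanded.fasta", "w"))`, where both file names are hypothetical. Because records are written one at a time, the expanded output is never held in memory, although the resulting FASTA file will be large for counts like those shown.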