SEQanswers




Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
What tools can convert sequence file from tabular format to fasta format? | yangjianhunt | Bioinformatics | 5 | 03-26-2014 01:48 PM
Convert maf (multiple alignment file) to FASTA | Avro1986 | Bioinformatics | 4 | 12-17-2012 11:12 AM
Convert WIG file into Fasta file | kumardeep | Bioinformatics | 3 | 08-23-2012 04:56 AM
Text file editing_perl-GFF | bioman1 | Bioinformatics | 0 | 07-05-2012 08:47 AM
How to convert diploid abi file into two fasta sequences? | ymc | Bioinformatics | 1 | 04-28-2011 06:24 PM

03-28-2013, 09:53 AM   #1
Giorgio C
Member
 
Location: ITALY

Join Date: Oct 2010
Posts: 89
Convert a text file to FASTA with decollapsing

Hi all,

I have this file:

TGAGGTAGTAGATTGTATAGTT 424866
TAGCTTATCAGACTGATGTTGA 359141
TAGCTTATCAGACTGATGTTGAC 276052
TGAGGTAGTAGGTTGTATAGTT 268735
ACAGTAGTCTGCACATTGGTT 209280
ACAGTAGTCTGCACATTGGTTA 178652
TAGCTTATCAGACTGATGTTG 166159
TGAGGTAGTAGGTTGTGTGGTT 105275
TGAGGTAGTAGGTTGTATGGTT 102447
AGCAGCATTGTACAGGGCTATGA 91296
TGAGGTAGTAGGTTGTGTGGTTT 63300
TGAGGTAGTAGTTTGTACAGTT 61604
TGAGGTAGTAGATTGTATAGT 61492
TAGCACCATCTGAAATCGGTTA 60637
TTCAAGTAATCCAGGATAGGCT 52300
TGAGGTAGTAGATTGTATAGTTA 50905
TGAGGTAGTAGGTTGTATAGT 48150
TACAGTAGTCTGCACATTGGTT 47534
TCTACAGTCCGACGATC 45803
................

Each line is a sequence followed by its number of occurrences. I would like to convert the file to FASTA format, decollapsing the sequences (writing each one out as many times as it occurs) and naming them like this:

>Sample1_0
TGAGGTAGTAGATTGTATAGTT
>Sample1_1
TGAGGTAGTAGATTGTATAGTT
>Sample1_2
TGAGGTAGTAGATTGTATAGTT
.....
(424866 times in total for the first sequence)
>Sample1_424865
TGAGGTAGTAGATTGTATAGTT

then
>Sample1_424866
TAGCTTATCAGACTGATGTTGA (the second sequence)

The same goes for the other sequences, in order. Is there a script for this?

Thanks in advance,
Giorgio
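The naming scheme in the question follows a simple rule: with 0-based indices and one running counter, a sequence that occurs c times after s earlier occurrences gets indices s through s + c - 1. The sketch below (the function name `index_ranges` is hypothetical, not from the thread) prints that range for each input line, which is handy for checking where a given sequence will land before generating a very large FASTA file.

```shell
# Hypothetical helper: for each "SEQUENCE COUNT" line, print the sequence
# followed by the first and last 0-based index it would receive when all
# records share one running counter.
index_ranges() {
  awk 'NF >= 2 {
         first = n + 0          # "+ 0" forces a number, so the first line prints 0
         n += $2                # running total of occurrences so far
         print $1, first, n - 1
       }' "$@"
}

# Example: index_ranges file.txt
```

Applied to the first two lines of the file above, this should print `0 424865` for the first sequence and `424866 784006` for the second.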
03-28-2013, 12:07 PM   #2
vivek_
PhD Student
 
Location: Denmark

Join Date: Jul 2012
Posts: 164

Code:
awk '{for(i=0;i<=($2-1);i++) print ">Sample"NR"_"i"\n"$1}' file.txt
This might work!
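Note that this one-liner builds each header from `NR` (the input line number), so the prefix becomes Sample1, Sample2, ... for successive input lines and the index restarts at 0 for every sequence. The headers are still unique, but if you want the single running counter shown in the question (Sample1_0, Sample1_1, ... across all sequences), a variant along these lines should do it. The function name `decollapse` and the `PREFIX` variable are assumptions for illustration.

```shell
# Hypothetical helper (name and PREFIX variable are assumptions, not from the thread).
# Reads "SEQUENCE COUNT" lines from the given files, or stdin, and writes
# each sequence COUNT times as FASTA, numbering all records with one
# running counter: Sample1_0, Sample1_1, ... across every input line.
decollapse() {
  awk -v prefix="${PREFIX:-Sample1_}" '
    NF >= 2 {                          # skip blank or malformed lines
      for (i = 0; i < $2; i++) {       # one record per occurrence
        printf ">%s%d\n%s\n", prefix, n++, $1   # n starts at 0 and never resets
      }
    }' "$@"
}

# Example: decollapse file.txt > file.fasta
```

Because `n` is never reset, the second sequence of the file above would start at `Sample1_424866`. Setting `PREFIX` (e.g. `PREFIX=Sample2_ decollapse other.txt`) changes the name prefix.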
03-28-2013, 12:49 PM   #3
Giorgio C
Member
 
Location: ITALY

Join Date: Oct 2010
Posts: 89

Thanks vivek, it works great!
03-29-2013, 10:48 AM   #4
maasha
Senior Member
 
Location: Denmark

Join Date: Apr 2009
Posts: 153

This can also be done with Biopieces (www.biopieces.org):

Code:
read_tab -i in.tab -k SEQ,COUNT | duplicate_record -k COUNT | add_ident -k SEQ_NAME -p Sample1_ | write_fasta -x
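Whichever tool is used, the output is easy to sanity-check: each FASTA record starts with exactly one `>` header line, so the number of headers should equal the sum of the counts in the second column of the input. A minimal sketch, assuming headers are the only lines starting with `>` (true for plain FASTA); the function name `check_counts` is hypothetical.

```shell
# Hypothetical sanity check (function name is an assumption): the number of
# FASTA headers in the output should equal the sum of column 2 in the input.
# Usage: check_counts in.tab out.fasta
check_counts() {
  local expected actual
  expected=$(awk 'NF >= 2 { s += $2 } END { print s + 0 }' "$1")
  actual=$(grep -c '^>' "$2")
  if [ "$expected" -eq "$actual" ]; then
    echo "OK: $actual records"
  else
    echo "MISMATCH: expected $expected records, found $actual" >&2
    return 1
  fi
}
```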
All times are GMT -8.