
**dariober** · 08-01-2012, 12:16 AM

Hi,

This Python script should do it. Say your tab-separated file is tabseq.tsv:

Code:

A       B       3233    223322  TAGGGCCTTAGGAAGCCTAA
C       D       3234    222334  AGGTAACCGATAGAGGTCCA

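Column 5 is the sequence; one or more of the other columns are used as the header. On the command line, the first number after the file name is the sequence column and the remaining numbers are the header columns, joined with underscores in the order given: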

Code:

python tab2fasta.py tabseq.tsv 5 1 2 4  > tabseq.fa

Output (tabseq.fa) will be:

Code:

>A_B_223322
TAGGGCCTTAGGAAGCCTAA
>C_D_222334
AGGTAACCGATAGAGGTCCA

Here's the code for tab2fasta.py:

Code:

#!/usr/bin/env python

docstring = """
DESCRIPTION
    Convert tabular to FASTA

USAGE:
    python tab2fasta.py <tab-file> <sequence column> <header column 1> <header column 2> <header column n>  > <outfile>
"""

import sys

if len(sys.argv) < 4:
    sys.exit('\nThree or more arguments required%s' % docstring)

# Column numbers on the command line are 1-based; convert to 0-based indexes.
seqix = int(sys.argv[2]) - 1
headerix = [int(x) - 1 for x in sys.argv[3:]]

with open(sys.argv[1]) as infile:
    for line in infile:
        # Strip only the line ending so empty first/last columns are kept.
        fields = line.rstrip('\r\n').split('\t')
        if fields == ['']:
            continue  # skip blank lines
        header = '>' + '_'.join(fields[i] for i in headerix)
        print(header)
        print(fields[seqix])

I've done minimal testing so make sure it does what you want!
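One quick way to check the conversion is to run the same logic on the two example rows and compare against the expected output. The sketch below is only an illustration: `tab_to_fasta` is a hypothetical helper name (it is not part of tab2fasta.py), and it takes 1-based column numbers like the command line does.

```python
def tab_to_fasta(lines, seq_col, header_cols, sep="_"):
    """Yield FASTA lines from tab-separated rows.

    seq_col and header_cols are 1-based column numbers.
    tab_to_fasta is a hypothetical helper, not part of the original script.
    """
    for line in lines:
        # Remove only the line ending, then split on tabs.
        fields = line.rstrip("\r\n").split("\t")
        if fields == [""]:
            continue  # skip blank lines
        yield ">" + sep.join(fields[i - 1] for i in header_cols)
        yield fields[seq_col - 1]


# The two example rows from tabseq.tsv, written with explicit tabs.
rows = [
    "A\tB\t3233\t223322\tTAGGGCCTTAGGAAGCCTAA\n",
    "C\tD\t3234\t222334\tAGGTAACCGATAGAGGTCCA\n",
]

fasta = list(tab_to_fasta(rows, 5, [1, 2, 4]))
print("\n".join(fasta))
```

If the printed lines match the expected tabseq.fa shown above, the column arguments are doing what you intended.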

Good luck
Dario

**essvee** · 08-01-2012, 05:40 AM

or if your file is tabseq.tsv:

Code:

A       B       3233    223322  TAGGGCCTTAGGAAGCCTAA
C       D       3234    222334  AGGTAACCGATAGAGGTCCA

you can use awk to do this easily:

Code:

awk '{print ">"$1"_"$2"_"$3"_"$4"\n"$5}' tabseq.tsv > seqs.fa

$1, $2, etc. are the column numbers. You can rearrange these into whichever order you'd like. For example, for the other format:

Code:

TAGGAACCATTAGCCAACAA  88889
GATTAGGCCCAAATGCAAAG  7799

you could do:

Code:

awk '{print ">"$2"\n"$1}' tabseq.tsv > seqs.fa
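For anyone who would rather stay in Python, here is a rough equivalent of that last awk one-liner. It is a sketch, not a drop-in replacement: `two_col_to_fasta` is a made-up name, and it relies on `str.split()` with no arguments, which splits on any run of whitespace much like awk's default field splitting. Unlike the awk command, it skips lines with fewer than two fields.

```python
def two_col_to_fasta(lines):
    """Yield FASTA lines from rows of the form: <sequence> <name>.

    two_col_to_fasta is a hypothetical helper used for illustration only.
    """
    for line in lines:
        # split() with no arguments splits on runs of spaces and/or tabs.
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        seq, name = fields[0], fields[1]
        yield ">" + name
        yield seq


rows = [
    "TAGGAACCATTAGCCAACAA  88889\n",
    "GATTAGGCCCAAATGCAAAG  7799\n",
]

records = list(two_col_to_fasta(rows))
print("\n".join(records))
```

Keep in mind that whitespace splitting, in awk or in Python, breaks if any column itself contains spaces; in that case split on tabs only.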

**yangjianhunt** · 08-01-2012, 05:41 AM

You are a tremendous help!

Hi Dario,

I cannot thank you enough.
I will test the code and modify it if necessary.
Yesterday I was watching the MIT open course on beginner programming; they use Python as the example language. It's going to take at least a month to learn programming that way. I'd like to learn it, but I want to get the immediate problem solved!

Regards,
Jian

**yangjianhunt** · 08-01-2012, 05:46 AM

Thanks, essvee!

Wow, the awk solution is so simple and elegant!
I will try these as well.

I've used awk a few times, but only through Google. I never tried to fully understand the awk language. It's great for parsing!

Thank you thank you thank you!

Jian

**musta1234** · 03-26-2014, 01:48 PM

This is awesome... thanks for the awk and Python scripts.

Mustapha

What tools can convert sequence file from tabular format to fasta format?
