
**dariober** · 08-01-2012, 12:16 AM

Hi,

This python script should do it. Say your tab separated file is tabseq.tsv:

Code:

A       B       3233    223322  TAGGGCCTTAGGAAGCCTAA
C       D       3234    222334  AGGTAACCGATAGAGGTCCA

Column 5 is the sequence; one or more of the other columns are used to build the header.

Code:

python tab2fasta.py tabseq.tsv 5 1 2 4  > tabseq.fa

Output (tabseq.fa) will be:

Code:

>A_B_223322
TAGGGCCTTAGGAAGCCTAA
>C_D_222334
AGGTAACCGATAGAGGTCCA

Here's the code for tab2fasta.py:

Code:

#!/usr/local/bin/python

docstring= """
DESCRIPTION
    Convert tabular to FASTA

USAGE:
    python tab2fasta.py <tab-file> <sequence column> <header column 1> <header column 2> <header column n>  > <outfile>
"""

import sys
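if len(sys.argv) < 4:
    sys.exit('\nThree or more arguments required%s' %(docstring))

infile= open(sys.argv[1])
# Column numbers on the command line are 1-based; convert to 0-based indexes
seqix= int(sys.argv[2]) - 1
headerix= sys.argv[3:]
headerix= [(int(x) - 1) for x in headerix]

for line in infile:
    # Remove only the line ending: strip() would also eat leading/trailing tabs
    # and shift the columns when the first or last field is empty
    line= line.rstrip('\r\n').split('\t')
    header= '>' + '_'.join([line[i] for i in headerix])
    print(header)
    print(line[seqix])

infile.close()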

I've done minimal testing so make sure it does what you want!
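A quick way to check the behaviour on the example above is to wrap the same logic in a function and run it on an in-memory file. This is only a sketch: the function name `tab_to_fasta` and the use of `io.StringIO` are assumptions made for illustration and are not part of tab2fasta.py.

```python
import io

def tab_to_fasta(lines, seq_col, header_cols):
    """Yield FASTA lines from tab-separated rows (column numbers are 1-based)."""
    seqix = seq_col - 1
    headerix = [c - 1 for c in header_cols]
    for line in lines:
        # Remove only the line ending so empty first/last fields keep their position
        fields = line.rstrip('\r\n').split('\t')
        if fields == ['']:
            continue  # skip blank lines
        yield '>' + '_'.join(fields[i] for i in headerix)
        yield fields[seqix]

# The two example rows from tabseq.tsv, held in memory instead of on disk
example = io.StringIO(
    "A\tB\t3233\t223322\tTAGGGCCTTAGGAAGCCTAA\n"
    "C\tD\t3234\t222334\tAGGTAACCGATAGAGGTCCA\n"
)
result = list(tab_to_fasta(example, 5, [1, 2, 4]))
print('\n'.join(result))
```

If the printed lines match the expected tabseq.fa shown earlier, the column handling is doing what it should for this input.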

Good luck
Dario

**essvee** · 08-01-2012, 05:40 AM

or if your file is tabseq.tsv:

Code:

A       B       3233    223322  TAGGGCCTTAGGAAGCCTAA
C       D       3234    222334  AGGTAACCGATAGAGGTCCA

you can use awk to do this easily:

Code:

awk '{print ">"$1"_"$2"_"$3"_"$4"\n"$5}' tabseq.tsv > seqs.fa

$1, $2, etc. are the column numbers; you can rearrange them in whichever order you like. For example, for the other format:

Code:

TAGGAACCATTAGCCAACAA  88889
GATTAGGCCCAAATGCAAAG  7799

you could do:

Code:

awk '{print ">"$2"\n"$1}' tabseq.tsv > seqs.fa
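For comparison, here is roughly the same thing in Python, as a sketch. By default awk splits fields on runs of spaces or tabs, and `str.split()` with no argument behaves the same way, so the columns line up as they do in the awk one-liner. The function name `two_col_to_fasta` is made up for this example.

```python
def two_col_to_fasta(lines):
    """Turn whitespace-separated 'sequence id' rows into FASTA lines."""
    out = []
    for line in lines:
        fields = line.split()  # splits on any run of whitespace, like awk's default
        if len(fields) < 2:
            continue  # a choice made here: skip blank or incomplete rows
        seq, name = fields[0], fields[1]
        out.append('>' + name)  # column 2 becomes the header
        out.append(seq)         # column 1 is the sequence
    return out

# The two rows of the second example format
rows = [
    "TAGGAACCATTAGCCAACAA  88889",
    "GATTAGGCCCAAATGCAAAG  7799",
]
fasta = two_col_to_fasta(rows)
print('\n'.join(fasta))
```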

**yangjianhunt** · 08-01-2012, 05:41 AM

You are a tremendous help!

Hi Dario,

I cannot thank you enough.
I will test the code and modify it if necessary.
Yesterday I was watching the MIT OpenCourseWare course on beginner programming, which uses Python as the example language. It's going to take at least a month to learn programming that way. I'd like to learn it, but I want to get the immediate problem solved first!

Regards,
Jian

**yangjianhunt** · 08-01-2012, 05:46 AM

Thanks, essvee!

Wow, the awk solution is so simple and elegant!
I will try these as well.

I've used awk a few times, but only by googling. I never tried to fully understand the awk language. It's great for parsing!

Thank you thank you thank you!

Jian

**musta1234** · 03-26-2014, 01:48 PM

This is awesome.... thanks for the awk and python scripts

Mustapha

What tools can convert sequence file from tabular format to fasta format?
