SEQanswers

Old 12-03-2012, 03:01 PM   #1
cdlam
Junior Member
 
Location: East Coast

Join Date: Oct 2012
Posts: 7
Concatenating Sequences Within a Single FASTA File

Hi all,
I need to concatenate a bunch of sequences in a FASTA file. I have a file of extracted introns and would like to essentially splice them all together for use in a program. Is there any way to do either of the following using Perl (preferably) or Python (if necessary):

1. Join all the introns of a single gene, preserving the FASTA heading for that gene.

>Gene 1 Intron 1
GTACGCC....CTGATAGAG
>Gene 1 Intron 2
GTCCAGGAC.....CTGAGTAAG

Becomes
>Gene 1 Intron 1
GTACGCC....CTGATAGAGGTCCAGGAC.....CTGAGTAAG

or
2. Join a number of introns together (regardless of which genes they came from) under a non-specific FASTA-formatted heading?

Basically I want to splice together a bunch of intron sequences like they were exons so that I can run them through a program that doesn't like how short they are. The first way would be the most biologically relevant and useful for my purposes, but if it can't be done I can live with it haha. Any help would be greatly appreciated. Thanks a lot!
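For reference, option 2 above (everything under one generic heading) might be sketched in Python roughly as follows. This is only a minimal sketch: the function name `concat_all` and the default header text `concatenated_introns` are made up for illustration, and it assumes every line that does not start with '>' is sequence data.

```python
def concat_all(fasta_lines, header="concatenated_introns"):
    """Join every sequence in a FASTA stream under one generic header.

    `header` is a hypothetical placeholder name, not something the
    downstream program is known to require.
    """
    parts = []
    for raw in fasta_lines:
        line = raw.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and drop all original headers
        parts.append(line)
    return ">" + header + "\n" + "".join(parts) + "\n"


# Small in-memory example; a real run would pass an open file handle instead.
example = [">Gene 1 Intron 1", "GTACGCC", ">Gene 1 Intron 2", "GTCCAGG"]
print(concat_all(example), end="")
```

Because an open file iterates line by line, something like `concat_all(open("introns.fa"))` should work the same way, with `introns.fa` standing in for whatever the real file is called.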
Old 12-04-2012, 07:51 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

I don't think there is a program out there that will do what you want. In part this is because FASTA headers are not standardized: all a FASTA record strictly needs is the '>'. So it would be hard to create a general-purpose program that combines FASTA sequences based on some regular yet arbitrary criteria. That said, it would be trivial (for a programmer, at least) to create a one-shot specific program that does the combining.

Now that I think of it, such a program would be a good one to give to a beginning programmer: straightforward, yet not so trivial as to be boring.
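To give an idea of what such a one-shot program could look like, here is a minimal Python sketch for option 1 (join all introns of a gene and keep that gene's first header). It assumes every header looks like the example in the question, i.e. "Gene <id> Intron <n>" with an optional space after the '>'. The regular expression and the function name `join_introns_by_gene` are illustrative guesses, and real headers will almost certainly need a different pattern.

```python
import re

# Assumed header layout: ">Gene <id> Intron <n>" (a space after '>' is tolerated).
GENE_RE = re.compile(r"^>\s*(Gene\s+\S+)")


def join_introns_by_gene(fasta_lines):
    """Return (first_header, joined_sequence) pairs, one per gene, in input order."""
    genes = {}      # gene key -> [first header seen, list of sequence chunks]
    current = None  # gene key of the record currently being read
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            match = GENE_RE.match(line)
            if match is None:
                raise ValueError("unexpected header: " + line)
            current = " ".join(match.group(1).split())  # normalise spacing
            genes.setdefault(current, [line, []])       # keep only the first header
        elif current is None:
            raise ValueError("sequence data before the first header")
        else:
            genes[current][1].append(line)
    return [(header, "".join(chunks)) for header, chunks in genes.values()]


example = [
    "> Gene 1 Intron 1", "GTACG",
    ">Gene 1 Intron 2", "GTC", "CAG",
    ">Gene 2 Intron 1", "GTAAG",
]
for header, sequence in join_introns_by_gene(example):
    print(header)
    print(sequence)
```

Introns are joined in the order they appear in the file, so this relies on the input already listing each gene's introns in the right order. It also relies on plain dicts preserving insertion order, which holds for Python 3.7 and later.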

Last edited by westerman; 12-04-2012 at 07:54 AM. Reason: Added words, "one-shot specific"
Old 12-04-2012, 08:36 AM   #3
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

A little fiddly but you can use the text manipulation tool in Galaxy.

Convert your FASTA files to tabular format, then use the "cut columns" and "paste files side by side" tools to get your sequences into the same file. You can then merge the columns and convert back to FASTA.
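Outside Galaxy, the FASTA-to-tabular round trip described here can be imitated with a few lines of Python. The sketch below is a stand-in, not Galaxy's own tool: the function names are invented, and the layout is assumed to be one record per line with the header text, a tab, then the sequence.

```python
def fasta_to_tabular(fasta_lines):
    """Turn FASTA lines into 'name<TAB>sequence' strings, one per record."""
    rows = []
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            rows.append([line[1:].strip(), ""])  # start a new record
        elif rows:
            rows[-1][1] += line                  # multi-line sequences are joined
    return [name + "\t" + seq for name, seq in rows]


def tabular_to_fasta(tab_lines):
    """Inverse step: 'name<TAB>sequence' strings back to FASTA lines."""
    out = []
    for line in tab_lines:
        name, seq = line.rstrip("\n").split("\t", 1)
        out.append(">" + name)
        out.append(seq)
    return out


table = fasta_to_tabular([">Gene 1 Intron 1", "GTA", "CGC", ">Gene 1 Intron 2", "GTC"])
print("\n".join(table))
print("\n".join(tabular_to_fasta(table)))
```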
Old 12-04-2012, 08:54 AM   #4
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Jackie: I'll agree that conversion to tabular is a good first step -- and this can be done via the command line as well using the FastX tools -- but I don't see how a merger could be done automatically. Let's say after the conversion I have

Gene 1 Intron 1 <tab> GTACGCC....CTGATAGAG
Gene 1 Intron 2 <tab> GTCCAGGAC.....CTGAGTAAG
Gene 2 Intron 1 <tab> sequence
Gene 3 Intron 1 <tab> sequence
Gene 3 Intron 2 <tab> sequence
Gene 3 Intron 3 <tab> sequence
etc.

How to pull all of the Gene1, Gene2, Gene3, etc. sequences together without some programming? I'm not familiar enough with Galaxy to follow your "cut column and paste files" suggestion but if it is anything like the normal Unix 'cut' and 'paste' commands then I just don't see how to do the cut'n'paste automatically.
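With "some programming", the grouping step on the tabular layout above is short. The following Python sketch assumes the gene is identified by the first two words of the name column ("Gene 1", "Gene 2", ...), which is only true for headers shaped like the example; `merge_tabular` is a hypothetical name.

```python
from collections import defaultdict


def merge_tabular(lines):
    """Group 'name<TAB>sequence' lines by gene and join each gene's sequences."""
    merged = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        name, seq = line.split("\t", 1)
        gene = " ".join(name.split()[:2])  # assumed key: first two words, e.g. "Gene 1"
        merged[gene].append(seq.strip())
    return {gene: "".join(seqs) for gene, seqs in merged.items()}


rows = [
    "Gene 1 Intron 1\tAAA",
    "Gene 1 Intron 2\tCCC",
    "Gene 2 Intron 1\tGGG",
]
for gene, sequence in merge_tabular(rows).items():
    print(">" + gene)
    print(sequence)
```

The key extraction on the `gene = ...` line is the part that has to change for every different header style, which is why a general-purpose tool is hard to write.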

Perhaps if the tabular file becomes:

Gene<tab>1<tab>Intron<tab>1<tab>Sequence
Gene<tab>1<tab>Intron<tab>2<tab>Sequence
etc.

Then maybe cut'n'paste would work.
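If the file really were split into five columns like that, the grouping key would simply be the first two fields. As a rough Python sketch (assuming rows of the same gene are adjacent, the fourth column is an integer intron number, and using the made-up name `merge_split_columns`):

```python
from itertools import groupby


def merge_split_columns(lines):
    """Merge 'Gene<TAB>id<TAB>Intron<TAB>n<TAB>sequence' rows into one FASTA record per gene."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    fasta = []
    # groupby only merges neighbouring rows, so each gene's rows must be adjacent.
    for (label, gene_id), group in groupby(rows, key=lambda row: (row[0], row[1])):
        introns = sorted(group, key=lambda row: int(row[3]))  # order by intron number
        fasta.append(">" + label + " " + gene_id)
        fasta.append("".join(row[4] for row in introns))
    return "\n".join(fasta) + "\n"


rows = [
    "Gene\t1\tIntron\t2\tCCC",
    "Gene\t1\tIntron\t1\tAAA",
    "Gene\t2\tIntron\t1\tGGG",
]
print(merge_split_columns(rows), end="")
```

Sorting within each group means the introns end up in numeric order even if the file lists them out of order.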
All times are GMT -8.