**westerman** · 12-04-2012, 07:51 AM

I don't think that there is a program out there that will do what you want. In part this is because FastA headers are not standardized. Basically all you need for FastA is the '>'. So it would be hard to create a general purpose program that would combine FastA sequences together based on some regular yet arbitrary criteria. That said it would be trivial (for a programmer, at least) to create a one-shot specific program that would do the combining.

Now that I think of it, such a program would be a good one to give to a beginning programmer. Straightforward, yet not so trivial as to be boring.
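For what it's worth, here is a minimal sketch of what such a one-shot program might look like in Python. It assumes headers of the form `>Gene 1 Intron 1` (an assumed layout; real headers will differ), and the names `read_fasta`, `concatenate_by_gene` and `GENE_KEY` are made up for illustration. Sequences are joined in the order they appear in the file, so the input would need to be in intron order already.

```python
import re

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Assumption: the grouping key is the leading "Gene <number>" part of the header.
GENE_KEY = re.compile(r"^(Gene\s*\d+)")

def concatenate_by_gene(lines):
    """Return {gene key: concatenated sequence}, in first-seen order."""
    merged = {}
    for header, seq in read_fasta(lines):
        match = GENE_KEY.match(header)
        # Headers that do not match the assumed pattern are kept as-is.
        key = match.group(1) if match else header
        merged[key] = merged.get(key, "") + seq
    return merged

if __name__ == "__main__":
    example = [
        ">Gene 1 Intron 1", "GTAC", "GCC",
        ">Gene 1 Intron 2", "GTCC",
        ">Gene 2 Intron 1", "AAAA",
    ]
    for key, seq in concatenate_by_gene(example).items():
        print(f">{key}\n{seq}")
```

The only part specific to a given file is the regular expression; everything else is generic FASTA reading, which is why a one-shot script is easy but a general-purpose tool is not.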

**JackieBadger** · 12-04-2012, 08:36 AM

A little fiddly but you can use the text manipulation tool in Galaxy.

Convert your FASTA files to tabular format, then use cut column and paste files side by side to get your sequences into the same file. You can then merge the columns and convert back to FASTA.

**westerman** · 12-04-2012, 08:54 AM

Jackie: I'll agree that conversion to tabular is a good first step -- and this can be done via the command line as well using the FastX tools -- but I don't see how a merger could be done automatically. Let's say after the conversion I have

Gene 1 Intron 1 <tab> GTACGCC....CTGATAGAG
Gene 1 Intron 2 <tab> GTCCAGGAC.....CTGAGTAAG
Gene 2 Intron 1 <tab> sequence
Gene 3 Intron 1 <tab> sequence
Gene 3 Intron 2 <tab> sequence
Gene 3 Intron 3 <tab> sequence
etc.

How to pull all of the Gene1, Gene2, Gene3, etc. sequences together without some programming? I'm not familiar enough with Galaxy to follow your "cut column and paste files" suggestion but if it is anything like the normal Unix 'cut' and 'paste' commands then I just don't see how to do the cut'n'paste automatically.

Perhaps if the tabular file becomes:

Gene<tab>1<tab>Intron<tab>1<tab>Sequence
Gene<tab>1<tab>Intron<tab>2<tab>Sequence
etc.

Then maybe cut'n'paste would work.
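With the gene number in its own column, the grouping step itself is short if a little scripting is acceptable. A rough Python sketch, assuming exactly the five tab-separated fields shown above and numeric gene and intron fields (the function name `merge_tabular` is hypothetical):

```python
from itertools import groupby

def merge_tabular(rows):
    """Merge 'Gene<tab>N<tab>Intron<tab>M<tab>SEQ' rows into one FASTA record per gene."""
    parsed = [line.rstrip("\n").split("\t") for line in rows if line.strip()]
    # Sort by gene number, then intron number, so that groupby sees each
    # gene as one contiguous run with its introns in order.
    parsed.sort(key=lambda fields: (int(fields[1]), int(fields[3])))
    records = []
    for gene, group in groupby(parsed, key=lambda fields: fields[1]):
        sequence = "".join(fields[4] for fields in group)
        records.append(f">Gene {gene}\n{sequence}")
    return records

if __name__ == "__main__":
    rows = [
        "Gene\t1\tIntron\t2\tCC",
        "Gene\t1\tIntron\t1\tAA",
        "Gene\t2\tIntron\t1\tGG",
    ]
    print("\n".join(merge_tabular(rows)))
```

Sorting on the numeric columns first means the rows do not have to arrive in order, which a plain cut'n'paste would not handle.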

Concatenating Sequences Within a single Fasta File
