**rhinoceros** · 01-30-2015, 12:35 AM

First of all, you should not have sequence data in a Word file. That is just asking for trouble. Second, you can easily edit the FASTA headers on the *nix command line, provided your sequences are in a plain text file. So put your sequences into a proper text file and then post example headers and what you want them to look like.

**dariober** · 01-30-2015, 12:45 AM

Originally posted by aniben:

I have a word file with 222 pages worth of fasta genes...

Do you mean an MS Word document? If so, save it as a text file. When asked, use "LF only" as the newline character. As an aside, don't keep sequence or any other sort of data in Word documents; you are going to make your bioinformatic life really complicated with that!
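If the text export ends up with Windows-style line endings anyway, they can be stripped afterwards. A minimal sketch, assuming the file uses CRLF line endings; `to_lf` and the file names in the usage comment are placeholders, not part of any tool mentioned in this thread:

```shell
# Hypothetical helper: delete every carriage return so lines end in LF only.
# Assumes CRLF (Windows) input; CR-only files would need a different fix.
to_lf() {
    tr -d '\r'
}

# Usage (placeholder file names):
#   to_lf < seqs_from_word.txt > seq.fa.txt
```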

Originally posted by aniben:

The phylogenetic program I plan to use doesn't like a long FASTA header all that much, so I have to change it to something less than 10 characters.

Are you sure only 10 characters are allowed? That sounds a bit weird... Anyway, if you really want to cut sequence names short you could use this crude way:

Code:

awk '{if($0 ~ /^>/){x= substr($0, 1, 10); print x} else {print $0}}' seq.fa.txt > shortnames.fa

Replace 10 with the desired number of characters to keep (the count includes the leading ">", so 10 leaves 9 characters of the name); seq.fa.txt is the text file saved as above.

**Brian Bushnell** · 01-30-2015, 01:20 AM

With BBTools:

bbrename.sh in=old.fasta out=new.fasta

That will rename the reads as 1, 2, 3, 4, ... up to the number of sequences in the file.

You can also give a custom prefix if you want. The input has to be in text format, not .doc.

**aniben** · 01-30-2015, 08:27 AM

Ok, thanks. Converting it to a text file won't be all that much of a problem.

**GenoMax** · 01-30-2015, 08:45 AM

As you make these changes, keep in mind that those 10 characters have to be unique (and logical enough for you to understand the trees afterwards); otherwise the phylogenetic program may not like it.
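One way to check the uniqueness point before running the phylogenetic program is to count how often each shortened header occurs. This is only a sketch: `dup_short_names` is a made-up helper name, and it assumes the same truncation as the `substr($0, 1, 10)` call earlier in the thread, where the width includes the leading ">".

```shell
# Hypothetical helper: print "count name" for every truncated header that
# occurs more than once. The width (default 10) counts the leading ">" too.
dup_short_names() {
    awk -v width="${1:-10}" '
        /^>/ { count[substr($0, 1, width)]++ }
        END  { for (name in count) if (count[name] > 1) print count[name], name }
    '
}

# Usage: dup_short_names 10 < seq.fa.txt
# No output means every truncated name is unique.
```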

How many sequences do you have in that file? Are they all unique? It may be easier to name them 1, 2, 3, ... based on Brian's suggestion and then convert back to the real names once you complete the analysis (photoshop?).
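Converting back is easier if the original names are saved while renumbering. The sketch below is an awk alternative, not how bbrename.sh works internally; `number_headers`, `restore_headers` and the mapping file are hypothetical names. It writes a tab-separated table of number and original header, which can later be used to put the names back into a FASTA file. Applying the table to tree labels depends on the tree format and is not shown.

```shell
# Hypothetical helper: replace each header with a running number and record
# "number<TAB>original header" in the mapping file given as $1.
number_headers() {
    awk -v map="$1" '
        /^>/ { n++; printf "%s\t%s\n", n, substr($0, 2) > map; print ">" n; next }
        { print }
    '
}

# Hypothetical helper: swap the numbers back for the original headers,
# reading the mapping file ($1) first and the FASTA from standard input.
restore_headers() {
    awk 'BEGIN { FS = "\t" }
        NR == FNR { name[$1] = substr($0, length($1) + 2); next }
        /^>/      { print ">" name[substr($0, 2)]; next }
        { print }
    ' "$1" -
}

# Usage (placeholder file names):
#   number_headers names.tsv < seq.fa.txt > numbered.fa
#   restore_headers names.tsv < numbered.fa > restored.fa
```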

**GenoMax** · 01-30-2015, 08:50 AM

Originally posted by dariober:

Are you sure only 10 characters are allowed?

Some established phylogenetic programs (e.g. PHYLIP) do have this requirement.

Fasta name changes
