I have a Word file with 222 pages' worth of FASTA genes... that adds up. The phylogenetic program I plan to use doesn't handle long FASTA headers well, so I have to change each one to something less than 10 characters. Considering the large number of genes in the document, I would like to know a quicker way to change all these names than manually editing each one. I'm aware that writing a program can solve this, but I only have introductory experience in Python. Any advice? Thanks!
First of all, you should not have sequence data in a Word file. That is just asking for trouble. Second, you can easily edit the FASTA headers on the *nix command line, provided that your sequences are in a plain text file. So put your seqs into a proper text file and then post example headers and what you want them to look like.
-
Originally posted by aniben:
I have a word file with 222 pages worth of fasta genes...
The phylogenetic program I plan to use doesn't like a long FASTA header all that much, so I have to change it to something less than 10 characters.
Code:
# Cut header lines to 10 characters (the leading ">" counts); print sequence lines unchanged
awk '/^>/ { print substr($0, 1, 10); next } { print }' seq.fa.txt > shortnames.fa
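For anyone more comfortable in Python than awk, here is a rough sketch of the same truncation, plus a check for headers that become identical once shortened. The function names and the example records are made up for illustration, and the 10-character width counts the leading ">" just as the awk command does.

```python
from collections import Counter


def truncate_headers(lines, width=10):
    """Cut each FASTA header line to `width` characters (the '>' counts)."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # Header lines start with '>'; sequence lines pass through unchanged.
        out.append(line[:width] if line.startswith(">") else line)
    return out


def duplicated_headers(lines):
    """Return header lines that occur more than once, sorted."""
    counts = Counter(line for line in lines if line.startswith(">"))
    return sorted(header for header, n in counts.items() if n > 1)


# Hypothetical records, only to show the behaviour.
example = [
    ">Homo_sapiens_cytochrome_b",
    "ATGACC",
    ">Homo_sapiens_COX1",
    "ATGTTC",
    ">Pan_troglodytes_cytb",
    "ATGACT",
]

short = truncate_headers(example)
dups = duplicated_headers(short)
print(short)
print("Colliding headers:", dups)
```

In this example the two Homo sapiens headers collapse to the same short name, which is the situation to look out for before running the phylogenetic program.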
-
As you make these changes, keep in mind that those 10 characters have to be unique (and logical enough for you to understand the trees afterwards); otherwise the phylogenetic program may not like it.
How many sequences do you have in that file? Are they all unique? It may be easier to name them 1, 2, 3, ... based on Brian's suggestion and then convert back to the real names once you complete the analysis (photoshop?)
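The numbering idea could be sketched in Python roughly as follows. This is only an illustration: the function names, the "s" prefix, and the example records are assumptions, and the restore step assumes the program writes its tree as plain text (for example Newick) in which the short names appear as whole tokens. Original names that contain spaces, colons, or parentheses may need quoting or cleaning before being put back into a Newick tree.

```python
import re


def number_headers(lines, prefix="s"):
    """Replace each FASTA header with a short unique ID; return lines and ID -> name map."""
    mapping = {}
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            short = f"{prefix}{len(mapping) + 1}"  # s1, s2, s3, ...
            mapping[short] = line[1:]              # remember the original name
            out.append(">" + short)
        else:
            out.append(line)
    return out, mapping


def restore_names(tree_text, mapping):
    """Swap the short IDs in a text tree back to the original names."""
    if not mapping:
        return tree_text
    # Match whole tokens only, so that s1 is not replaced inside s10.
    ids = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(i) for i in ids) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], tree_text)


# Hypothetical records and tree, only to show the round trip.
records = [">Homo_sapiens_cytb", "ATGACC", ">Pan_troglodytes_cytb", "ATGACT"]
renamed, mapping = number_headers(records)
tree = "(s1:0.1,s2:0.2);"
restored = restore_names(tree, mapping)
print(renamed)
print(restored)
```

Saving the mapping to a file alongside the renamed FASTA keeps the real names recoverable after the analysis, whatever tool is used to draw the tree.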