
**rhinoceros** · 09-21-2013, 02:09 AM

AFAIK, GenBank identifiers are not linked to geographical metadata, so unless you have the said data in e.g. a two column table (gi - location), no.
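If such a two-column table does exist, the renaming can be sketched in awk. This is only an illustration under assumptions that are not in the thread: the table is tab-separated (identifier, then location), the identifier matches the first whitespace-delimited word of each header, and `rename_by_table` and the demo file names are hypothetical.

```shell
# Sketch: relabel FASTA headers from a tab-separated table (identifier <TAB> location).
# Assumption: the table key equals the first whitespace-delimited word of the header.
rename_by_table() {
    # $1 = mapping table, $2 = FASTA file
    awk -F'\t' '
        NR == FNR { loc[$1] = $2; next }          # first file: load identifier -> location
        /^>/ {
            split(substr($0, 2), parts, /[ \t]/)  # drop ">" and take the first word
            id = parts[1]
            if (id in loc) { print ">" loc[id]; next }  # known identifier: swap the header
        }
        { print }                                 # sequences and unknown headers pass through
    ' "$1" "$2"
}

# Demo on throwaway files (made-up identifiers and locations).
tmp=$(mktemp -d)
printf 'gi1\tYellowstone\ngi2\tIceland\n' > "$tmp/map.tsv"
printf '>gi1 some description\nACGT\n>gi3 not in table\nGGCC\n' > "$tmp/in.fa"
rename_by_table "$tmp/map.tsv" "$tmp/in.fa"
rm -r "$tmp"
```

Note that several isolates from the same locality would end up with identical headers, so it may be safer to keep the identifier as part of the new name.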

**JamieHeather** · 09-21-2013, 10:18 AM

Not sure if I understand, but if you already have separate fasta files that need renaming, something like this would do it:

Code:

sed 's/>.*/>Yellowstone/' INFILE.fa > OUTFILE.fa

**nickv** · 09-21-2013, 11:14 AM

Thank you. To be more clear, I have 1 .fasta file with multiple sequence alignments. I want to rename all the headers with the names of certain geographical localities, depending on the isolates. Now that I am writing this, it doesn't seem possible.

Kindly,
Nick

**JamieHeather** · 09-21-2013, 01:04 PM

Do you have the geographical data in some format?

**guptavipin142** · 05-14-2014, 10:34 PM

This can be a real headache.
I am facing the same problem: I have a 7.7 GB FQ file and want to rename its headers completely.
Please suggest a way to do this.

**JamieHeather** · 05-15-2014, 12:30 AM

From what to what?

**guptavipin142** · 05-16-2014, 02:34 AM

I have Illumina reads with names like
>FCC1047ACXX:1:1101:1991:2224#GTTCGACA/1 1 1
>FCC1047ACXX:1:1101:1991:2224#GTTCGACA/2 1 1
I want to rename all of these to "sequence 1".
I am aligning these reads to a genome using MUMmer, but it reports the error
Duplicate read....ignored.

Please suggest a fix.
Thanks in advance.

**JamieHeather** · 05-16-2014, 02:43 AM

Are you sure you have a fastq file? IDs for a fastq should start with an '@' character. Yours appear to start with a '>', which is the format for fasta files.
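A quick way to check is to look at the first character of the file. The sketch below assumes an uncompressed text file; `guess_seq_format` is a hypothetical helper name, and it only inspects the first line.

```shell
# Sketch: guess FASTA vs FASTQ from the first character of the first line.
# Assumption: the file is plain text (not gzipped) and starts with a header line.
guess_seq_format() {
    IFS= read -r first_line < "$1"
    case "$first_line" in
        '>'*) echo fasta ;;
        '@'*) echo fastq ;;
        *)    echo unknown ;;
    esac
}

# Demo on a throwaway file.
tmp=$(mktemp)
printf '>read1\nACGT\n' > "$tmp"
guess_seq_format "$tmp"
rm "$tmp"
```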

I'm assuming you want to change each line to be numbered sequentially, not change them all to "sequence 1", as then they would all presumably count as duplicate reads?

These are all do-able using relatively simple commands (particularly sed and/or awk), but we just need to know exactly what it is you're trying to do to what before we can suggest some. Maybe if you give us a sample of what your data looks like now, and how you want it to look?
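As a rough idea of what the sequential numbering would look like in awk, here is a minimal sketch. It assumes fasta input on stdin, and the function name `number_headers`, the default prefix and the underscore separator are illustrative choices only.

```shell
# Sketch: replace every FASTA header with <prefix>_<running number>.
number_headers() {
    # $1 = prefix (defaults to "sequence"); FASTA is read from stdin
    awk -v prefix="${1:-sequence}" '
        /^>/ { n++; print ">" prefix "_" n; next }  # header: count it and rewrite it
        { print }                                   # sequence lines are left untouched
    '
}

# Demo with two made-up records.
printf '>readA extra text\nAGAGCAGATC\n>readB extra text\nGAAATCAGGA\n' | number_headers
```

An underscore is used instead of a space because many tools appear to treat only the text before the first whitespace as the read ID, in which case "sequence 1" and "sequence 2" would both collapse to "sequence".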

**guptavipin142** · 05-20-2014, 01:39 AM

Yes Jamie, you are right.
I have fasta sequences, and they look like this:

>FCC1047ACXX:1:1101:1991:2224#GTTCGACA/1 1 1
AGAGCAGATCCTAACAATCCCTGGAATACCCCTATATTT
>FCC1047ACXX:1:1101:1991:2224#GTTCGACA/2 1 1
GAAATCAGGAAAATGGAGAATGTTAATAGATTTTAGAGAA

and I want to rename them like this:

> sequence 1
AGAGCAGATCCTAACAATCCCTGGAATACCCCTATATTT

> sequence 2
GAAATCAGGAAAATGGAGAATGTTAATAGATTTTAGAGAA

**Brian Bushnell** · 05-20-2014, 09:12 AM

You can do that with BBTools.

bbrename.sh in=reads.fasta out=renamed.fasta prefix=sequence

But note that your reads are paired and interleaved, so I suggest not renaming them "sequence_1" then "sequence_2", but rather "sequence_1 /1" and "sequence_1 /2", then "sequence_2 /1" and "sequence_2 /2", etc., which will keep the pairing information for downstream programs to use. To do that, you would just tell the tool that the reads are interleaved, like this:

bbrename.sh in=reads.fasta out=renamed.fasta prefix=sequence int=t
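For anyone without BBTools at hand, the pair-aware naming scheme described above can be approximated in plain awk. This is a sketch of the naming pattern only, not how bbrename.sh works internally; it assumes strictly interleaved fasta input (mate 1 immediately followed by mate 2), and `number_interleaved` is a hypothetical name.

```shell
# Sketch: name interleaved pairs as <prefix>_<pair> /1 and <prefix>_<pair> /2.
# Assumption: headers alternate strictly between mate 1 and mate 2.
number_interleaved() {
    # $1 = prefix (defaults to "sequence"); FASTA is read from stdin
    awk -v prefix="${1:-sequence}" '
        /^>/ {
            h++                           # running count of headers seen
            pair = int((h + 1) / 2)       # headers 1,2 -> pair 1; headers 3,4 -> pair 2
            mate = (h % 2 == 1) ? 1 : 2   # odd header is mate 1, even header is mate 2
            print ">" prefix "_" pair " /" mate
            next
        }
        { print }                         # sequence lines are left untouched
    '
}

# Demo with two made-up pairs.
printf '>a/1\nAC\n>a/2\nGT\n>b/1\nCC\n>b/2\nGG\n' | number_interleaved
```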

**guptavipin142** · 05-20-2014, 10:12 PM

Hello Brian,

Can you also share the link for BBTools?

Thanks

**Brian Bushnell** · 05-20-2014, 10:49 PM

Certainly; it's here:

BBMap

https://sourceforge.net/projects/bbmap/

Download BBMap for free. BBMap short read aligner, and other bioinformatic tools. This package includes BBMap, a short read aligner, as well as various other bioinformatic tools. It is written in pure Java, can run on any platform, and has no dependencies other than Java being installed (compiled for Java 6 and higher).


Completely renaming Fasta headers
