**maubp** · 08-05-2015, 03:38 AM

I would try renaming all the contigs in the input FASTA file before calling Prokka,

e.g. NODE_1_length_41061_cov_17.678381 --> contig000001
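A minimal Python sketch of that renaming, assuming a plain FASTA file read line by line. The function name `rename_contigs`, the `contig` prefix, and the six-digit padding are illustrative choices taken from the example above, not anything Prokka itself requires. It also returns a mapping so the original names can be recovered later.

```python
def rename_contigs(lines, prefix="contig", width=6):
    """Replace every FASTA header with prefix + zero-padded counter.

    Returns (new_lines, mapping), where mapping goes from the new ID
    to the original header text (without the leading '>').
    """
    new_lines = []
    mapping = {}
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            new_id = f"{prefix}{count:0{width}d}"
            mapping[new_id] = line[1:]
            new_lines.append(">" + new_id)
        else:
            # Sequence lines are passed through unchanged.
            new_lines.append(line)
    return new_lines, mapping


if __name__ == "__main__":
    example = [
        ">NODE_1_length_41061_cov_17.678381",
        "ACGT",
        ">NODE_2_length_100_cov_3.5",  # hypothetical second contig
        "GGCC",
    ]
    renamed, mapping = rename_contigs(example)
    print("\n".join(renamed))
```

Keeping the mapping (for example, written out as a two-column text file) makes it possible to trace annotated contigs back to the assembler's original names.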

**Carola Berger** · 08-06-2015, 05:31 AM

I also got an answer from the developer team.

So maybe this helps other people who run into the same problem in the future:

1) Try using the "--compliant" option (and do NOT use --centre)

2) Or try "--compliant --prefix XX"

3) Or try "--compliant --prefix XX --centre XX"

**feralBiologist** · 06-07-2016, 08:39 AM

I tried using "--compliant" and "--centre XX" alone and in combination and it didn't work.

**Brian Bushnell** · 06-07-2016, 08:42 AM

Try downloading BBMap and running this command:

rename.sh in=contigs.fa out=renamed.fa prefix=contig

**Markiyan** · 06-09-2016, 12:19 AM

We use [strain#]_AS[AS#]_CO[contig#]

It is extremely important to have a naming scheme for the contigs/scaffolds that is both descriptive and consistent, for all downstream analysis.

Unfortunately, NCBI names are a bit too long and usually have whitespace before the contig#, which makes them unsuitable as a human- and machine-readable fasta_ID...

In our case we usually use the following contig names:

>[strain_name]_AS[AS#]_CO[contig#] {some optional description/orig name/etc}*

like:

>NRRL2338_AS1006_CO1

In the case of scaffolds we put SC instead of CO:

>NRRL2338_AS1006_SC1

So if you have assembled DH10B and the multi-FASTA of your assembly #5 has:
>contig0001
...
>contig0012

you do search and replace
>contig00
with
>DH10B_AS5_CO
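The same substitution can be sketched in Python with a regular expression. The helper name `apply_scheme` and its parameters are hypothetical. Note one difference from the literal search and replace: this variant strips all leading zeros from the contig number, so `contig0001` becomes `DH10B_AS5_CO1`, in the style of `NRRL2338_AS1006_CO1`.

```python
import re

# Matches headers such as ">contig0001" or ">contig0012 some description".
# Leading zeros are dropped; the remaining digits are captured.
HEADER = re.compile(r"^>contig0*(\d+)(.*)$")


def apply_scheme(line, strain, assembly, kind="CO"):
    """Rewrite one FASTA line to >[strain]_AS[assembly]_[kind][number].

    Lines that are not '>contigNNNN' headers are returned unchanged
    (apart from a stripped trailing newline). Use kind="SC" for scaffolds.
    """
    line = line.rstrip("\n")
    m = HEADER.match(line)
    if m is None:
        return line
    number, rest = m.group(1), m.group(2)
    return f">{strain}_AS{assembly}_{kind}{number}{rest}"


if __name__ == "__main__":
    for header in [">contig0001", ">contig0012", "ACGTACGT"]:
        print(apply_scheme(header, "DH10B", 5))
```

Anything after the contig number (an optional description or the original name) is carried over untouched.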

PS: in the case of long strain names they may need a bit of shortening.

Contig renaming happens after assembly selection/polishing.

For data downloaded from NCBI/EMBL, we do the renaming in Kate or a similar text editor that supports regular expressions. You can also use Perl one-liners (search for "perl search and replace") if you are familiar with Perl regular expressions.

Being consistent in sequence naming has a huge impact in all types of downstream analysis (blast, mapping, annotation).

Prokka - contig name too long even after --centre command