SEQanswers

Old 11-11-2010, 12:41 PM   #1
Bacteria Genomes
Junior Member
 
Location: Ithaca, NY

Join Date: Jul 2009
Posts: 8
Default RAST annotation --> Artemis

Hello!
I used RAST to annotate my bacterial genomes and am now having trouble with the output files in Artemis. Artemis seems to struggle with the fact that I have multiple contigs: it puts all the genes on the first contig, so they end up on top of each other. If I reduce the annotation file to just the genes on the first contig, it still doesn't place the genes in the correct reading frames, so there seem to be multiple problems. I have tried all the different output file formats, but they all have problems.

Anyone know how to fix this?

Thanks!
Old 11-11-2010, 02:17 PM   #2
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356
Default

One option is to join the GenBank entries together with the "union" tool in EMBOSS.

Also the annotation pipeline we developed locally will produce Artemis-ready files http://xbase.ac.uk/annotation/
Old 11-12-2010, 12:01 AM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

What file format are you feeding into Artemis? GenBank? GFF3?
Old 11-12-2010, 01:18 PM   #4
Bacteria Genomes
Junior Member
 
Location: Ithaca, NY

Join Date: Jul 2009
Posts: 8
Default

I had this problem with GFF3 and GTF files. I have also opened the GenBank and EMBL files from RAST in Artemis, but they only show the first contig listed (which for some reason RAST decided should be contig 11...). These files also seem to lack some of the annotation info that the GFF3 and GTF files have.
Old 11-14-2010, 06:46 PM   #5
Bacteria Genomes
Junior Member
 
Location: Ithaca, NY

Join Date: Jul 2009
Posts: 8
Default

Quote:
Originally Posted by nickloman View Post
One option is to join the GenBank entries together with the "union" tool in EMBOSS.
I looked into this and as far as I could tell it only works for fasta sequences.
Any other suggestions for a similar solution?
Old 11-15-2010, 01:35 AM   #6
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by Bacteria Genomes View Post
I looked into this and as far as I could tell it only works for fasta sequences.
No, it just defaults to FASTA output. Try something like this:

Code:
union -sequence cat_three.gb -sformat genbank -outseq union_three.gb -osformat genbank -auto
For more info,

Code:
tfm union
OLD: That was the good news. The bad news is that it didn't keep the features (I have EMBOSS:6.3.1), which is probably an enhancement request...
http://lists.open-bio.org/pipermail/...er/004012.html

Would a short Biopython script to merge multiple GenBank records into a single record with the concatenated sequence and all the features be of interest?
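
For illustration, the core of such a merge is a coordinate shift: each contig's features are moved right by the total length of the contigs placed before it. The sketch below is plain Python, not Biopython and not how union is implemented. The tuple layouts, the example contig names, and the function names contig_offsets and shift_features are all made up for this example, and it only handles simple start..end locations.

```python
from typing import Dict, List, Tuple

# Hypothetical minimal types for illustration only:
# a contig is (name, length_bp); a feature is (contig_name, start, end, label)
# with 1-based inclusive coordinates, as in GenBank locations.
Contig = Tuple[str, int]
Feature = Tuple[str, int, int, str]


def contig_offsets(contigs: List[Contig]) -> Dict[str, int]:
    """Return, for each contig, the number of bases that precede it after joining."""
    offsets = {}
    running = 0
    for name, length in contigs:
        offsets[name] = running
        running += length
    return offsets


def shift_features(contigs: List[Contig], features: List[Feature]) -> List[Tuple[int, int, str]]:
    """Translate per-contig coordinates into coordinates on the joined sequence."""
    offsets = contig_offsets(contigs)
    merged = []
    for contig, start, end, label in features:
        off = offsets[contig]
        merged.append((start + off, end + off, label))
    return merged


if __name__ == "__main__":
    # Made-up contigs and features, in the order they appear in the file.
    contigs = [("contig11", 1000), ("contig1", 500)]
    features = [("contig11", 1, 558, "geneA"), ("contig1", 10, 90, "geneB")]
    for row in shift_features(contigs, features):
        print(row)
```

Without the shift, every feature keeps its per-contig coordinates, which is consistent with all genes piling up at the start of the sequence.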

NEW: You must also explicitly ask union to keep the features, see below.

Last edited by maubp; 11-15-2010 at 02:50 AM. Reason: correction (thanks Nick)
Old 11-15-2010, 02:40 AM   #7
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356
Default

There's a toggle if you want to keep the features, I think it is "-feature Y"

See this script for a worked example:
https://github.com/nickloman/xbase/b...ke-art-file.py
Old 11-15-2010, 02:49 AM   #8
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by nickloman View Post
There's a toggle if you want to keep the features, I think it is "-feature Y"
You are right, thanks. e.g.
Code:
union -sequence cat_three.gb -sformat genbank -outseq union_three.gb -osformat genbank -feature Y -auto
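
If the call needs to be made from a script, one possible wrapper is sketched here. It uses only the options already shown in the command above; the function names build_union_command and run_union are hypothetical, and the wrapper simply does nothing if union is not found on the PATH.

```python
import shutil
import subprocess
from typing import List


def build_union_command(infile: str, outfile: str) -> List[str]:
    """Assemble the union call quoted in this thread as an argument list."""
    return [
        "union",
        "-sequence", infile,
        "-sformat", "genbank",
        "-outseq", outfile,
        "-osformat", "genbank",
        "-feature", "Y",
        "-auto",
    ]


def run_union(infile: str, outfile: str) -> bool:
    """Run union if it is installed; return True only if it ran and exited with 0."""
    if shutil.which("union") is None:
        return False  # EMBOSS not on PATH, nothing was run
    result = subprocess.run(build_union_command(infile, outfile))
    return result.returncode == 0
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.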
Old 11-15-2010, 08:51 AM   #9
Bacteria Genomes
Junior Member
 
Location: Ithaca, NY

Join Date: Jul 2009
Posts: 8
Default

That got around the problem! Thanks for your help!
Old 08-16-2011, 01:33 PM   #10
zhangju
Member
 
Location: Winnipeg

Join Date: May 2011
Posts: 18
Default

I would like to confirm the command to run the make-art-file.py script.

<python make-art-file.py small1.art small2.art>

small1.art and small2.art are two GenBank files to be concatenated so that Artemis can load them at once.

Is this command right?
Old 08-17-2011, 08:10 AM   #11
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356
Default

Not exactly.

Concatenate your two GenBank files into a single file, e.g.

cat seq1.gb seq2.gb > both.gb

Then run make-art-file.py with input from the shell, e.g.

python make-art-file.py < both.gb

Hope that helps
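
The concatenation step can also be done from Python, which makes it easy to check how many records ended up in the combined file. This is a minimal sketch, assuming each GenBank record starts with a LOCUS line; the function name concatenate_genbank is hypothetical and is not part of make-art-file.py. It copies bytes as cat does, except that it adds a newline when a file does not end with one.

```python
from pathlib import Path
from typing import Iterable


def concatenate_genbank(paths: Iterable[str], out_path: str) -> int:
    """Write the input files back to back and return the number of LOCUS lines seen."""
    records = 0
    with open(out_path, "wb") as out:
        for path in paths:
            data = Path(path).read_bytes()
            # Count record headers; bytes.splitlines() handles \n, \r\n and \r.
            records += sum(1 for line in data.splitlines() if line.startswith(b"LOCUS"))
            out.write(data)
            if data and not data.endswith(b"\n"):
                # Keep the next file's LOCUS line from being glued onto this file's last line.
                out.write(b"\n")
    return records
```

For two single-record files, `concatenate_genbank(["seq1.gb", "seq2.gb"], "both.gb")` would be expected to return 2.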
Old 08-17-2011, 08:27 AM   #12
zhangju
Member
 
Location: Winnipeg

Join Date: May 2011
Posts: 18
Default

I got lots of SyntaxError messages at "stdin". I have copied part of the error report and part of the input GenBank file below. Could you take a look?

(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** NameError: name 'TLFRSINLIPFYSIMEYISGSPAITNALAFANVAGNIVIFIPLGIYLPLFKNDKRAIT' is not defined
(Pdb) *** NameError: name 'NLLFILIVSLFVEITQGLLGIGASDIDDVILNCLGGWIGILGYKFSLFILRDEKIVHT' is not defined
(Pdb) *** SyntaxError: EOL while scanning string literal (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
(Pdb) *** NameError: name 'IDGHGEDGDEEFISIQDCARKLIEYIDTNCNSQVFAMGGLSIGAQIVTEVLSQREKIT' is not defined
(Pdb) *** NameError: name 'DYAIIESVLVYPIRGTTALTVPVYKLFYGLIKKKWFAGMQAKTLCVPLDMFEQYYQDS' is not defined
(Pdb) *** NameError: name 'LKISRQSLINITLSNGNYNLNECIADTKTKVLIIVGENEVGIMKKSARLLHDKIPGSV' is not defined

Input file

LOCUS Ct_contig00020 105381 bp DNA linear 27-JUL-2011
DEFINITION Clostridium termitidis : Ct_contig00020
ACCESSION unknown
KEYWORDS .
COMMENT -
FEATURES Location/Qualifiers
source 1..105381
/mol_type="genomic DNA"
/db_xref="taxon:29371"
/organism="Clostridium termitidis"
gene 1..558
/locus_tag="Ct_00004390"
/gene_calling_method="Prodigal"
/note="IMG gene_oid=2504589426"
CDS 1..558
/locus_tag="or0446"
/translation="GIMNKRERIKTAFLYGAFICYILLLMKILLLSRISILGLFNNER
TLFRSINLIPFYSIMEYISGSPAITNALAFANVAGNIVIFIPLGIYLPLFKNDKRAIT
NLLFILIVSLFVEITQGLLGIGASDIDDVILNCLGGWIGILGYKFSLFILRDEKIVHT
AITILSVIIGLPVTLYFLFIIKMRF*"
/product="Glycopeptide antibiotics resistance protein"
/note="IMG gene_oid=2504589426"
gene 660..1406
/locus_tag="Ct_00004400"
/gene_calling_method="Prodigal"
/note="IMG gene_oid=2504589427"
CDS 660..1406
/locus_tag="or0447"
/translation="MIFKETPNKQMPTIILLHGGGLSSWSLNSIVEQLQSDFHIITPI
IDGHGEDGDEEFISIQDCARKLIEYIDTNCNSQVFAMGGLSIGAQIVTEVLSQREKIT
DYAIIESVLVYPIRGTTALTVPVYKLFYGLIKKKWFAGMQAKTLCVPLDMFEQYYQDS
LKISRQSLINITLSNGNYNLNECIADTKTKVLIIVGENEVGIMKKSARLLHDKIPGSV
LYTAPGMKHGELSLKYPLKYVDLLKSFFCK*"
/product="hypothetical protein"
/note="IMG gene_oid=2504589427"
Old 08-17-2011, 08:30 AM   #13
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356
Default

What platform are you running the script / EMBOSS on? It looks a bit like the line endings of the file might be wrong. On UNIX, lines should end with the newline character \n.
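
One way to check is to count the line endings directly. This is only a sketch; the function names count_line_endings and to_unix_newlines are hypothetical. The data is handled as bytes so that nothing is converted on the way in.

```python
from typing import Dict


def count_line_endings(data: bytes) -> Dict[str, int]:
    """Count CRLF (Windows), lone LF (UNIX) and lone CR (old Mac) line endings."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # every CRLF also contains one \n
        "CR": data.count(b"\r") - crlf,  # every CRLF also contains one \r
    }


def to_unix_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

For example, reading both.gb with `open("both.gb", "rb").read()` and passing the result to count_line_endings would show a non-zero CRLF count if the file was saved with Windows line endings.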
Old 08-17-2011, 08:35 AM   #14
zhangju
Member
 
Location: Winnipeg

Join Date: May 2011
Posts: 18
Default

I am using Red Hat Linux. Is there a way I can attach the original file I am using for your reference? Since I copied from a text editor, the "\n"s may not show up.
Old 08-17-2011, 08:46 AM   #15
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by zhangju View Post
Input file
Code:
LOCUS       Ct_contig00020        105381 bp    DNA     linear   27-JUL-2011
DEFINITION  Clostridium termitidis : Ct_contig00020
ACCESSION   unknown
KEYWORDS    .
COMMENT     -
FEATURES             Location/Qualifiers
     source          1..105381
                     /mol_type="genomic DNA"
                     /db_xref="taxon:29371"
                     /organism="Clostridium termitidis"
     gene            1..558
                     /locus_tag="Ct_00004390"
                     /gene_calling_method="Prodigal"
                     /note="IMG gene_oid=2504589426"
     CDS             1..558
                     /locus_tag="or0446"
                     /translation="GIMNKRERIKTAFLYGAFICYILLLMKILLLSRISILGLFNNER
                     TLFRSINLIPFYSIMEYISGSPAITNALAFANVAGNIVIFIPLGIYLPLFKNDKRAIT
                     NLLFILIVSLFVEITQGLLGIGASDIDDVILNCLGGWIGILGYKFSLFILRDEKIVHT
                     AITILSVIIGLPVTLYFLFIIKMRF*"
                     /product="Glycopeptide antibiotics resistance protein"
                     /note="IMG gene_oid=2504589426"
     gene            660..1406
                     /locus_tag="Ct_00004400"
                     /gene_calling_method="Prodigal"
                     /note="IMG gene_oid=2504589427"
     CDS             660..1406
                     /locus_tag="or0447"
                     /translation="MIFKETPNKQMPTIILLHGGGLSSWSLNSIVEQLQSDFHIITPI
                     IDGHGEDGDEEFISIQDCARKLIEYIDTNCNSQVFAMGGLSIGAQIVTEVLSQREKIT
                     DYAIIESVLVYPIRGTTALTVPVYKLFYGLIKKKWFAGMQAKTLCVPLDMFEQYYQDS
                     LKISRQSLINITLSNGNYNLNECIADTKTKVLIIVGENEVGIMKKSARLLHDKIPGSV
                     LYTAPGMKHGELSLKYPLKYVDLLKSFFCK*"
                     /product="hypothetical protein"
                     /note="IMG gene_oid=2504589427"
Using the [ code ] text [ /code ] tags on the forum makes this kind of thing easier to read.

The stop codon is not normally included in the amino acid translation string (although it should be included within the CDS and gene co-ordinates). That probably isn't the problem though given the error messages.
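
To make the stop codon point concrete, here is a small sketch that drops a trailing "*" and checks that a CDS span covers the residues plus one stop codon. The function names are hypothetical, and the length check assumes a simple, complete start..end CDS with no joins or partial ends.

```python
def strip_terminal_stop(translation: str) -> str:
    """Remove a single trailing '*' (stop symbol) from a protein translation."""
    return translation[:-1] if translation.endswith("*") else translation


def matches_cds_length(start: int, end: int, translation: str) -> bool:
    """True if the span (1-based, inclusive) equals 3 bases per residue plus one stop codon."""
    residues = len(strip_terminal_stop(translation))
    return (end - start + 1) == 3 * (residues + 1)
```

With a made-up three-residue protein, `matches_cds_length(1, 12, "MKV*")` holds because 12 bases cover three residues and the stop codon.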
Old 08-25-2011, 12:37 PM   #16
zhangju
Member
 
Location: Winnipeg

Join Date: May 2011
Posts: 18
Default

Can anyone provide working example GenBank files so that I can compare their format with my files?