  • Bacteria Genomes
    Junior Member
    • Jul 2009
    • 8

    RAST annotation --> Artemis

    Hello!
    I used RAST to annotate my bacterial genomes and am now having trouble loading the output files in Artemis. Artemis does not seem to handle multiple contigs: it puts all the genes on the first contig, so they end up on top of each other. If I reduce the annotation file to just the genes on the first contig, the genes are still not placed in the correct reading frames, so there seem to be several problems. I have tried all the different output file formats, but they all have problems.

    Anyone know how to fix this?

    Thanks!
  • nickloman
    Senior Member
    • Jul 2009
    • 355

    #2
    One option is to join the GenBank entries together with the "union" tool in EMBOSS.

    Also, the annotation pipeline we developed locally will produce Artemis-ready files: http://xbase.ac.uk/annotation/

    Comment

    • maubp
      Peter (Biopython etc)
      • Jul 2009
      • 1544

      #3
      What file format are you feeding into Artemis? GenBank? GFF3?

      Comment

      • Bacteria Genomes
        Junior Member
        • Jul 2009
        • 8

        #4
        I had this problem with the GFF3 and GTF files. I have also opened the GenBank and EMBL files from RAST in Artemis, but they only show the first contig listed (which for some reason RAST decided should be contig 11...), and in Artemis these files seem to lack some of the annotation info that the GFF3 and GTF files have.

        Comment

        • Bacteria Genomes
          Junior Member
          • Jul 2009
          • 8

          #5
          Originally posted by nickloman View Post
          One option is to join the GenBank entries together with the "union" tool in EMBOSS.
          I looked into this and as far as I could tell it only works for fasta sequences.
          Any other suggestions for a similar solution?

          Comment

          • maubp
            Peter (Biopython etc)
            • Jul 2009
            • 1544

            #6
            Originally posted by Bacteria Genomes View Post
            I looked into this and as far as I could tell it only works for fasta sequences.
            No, it just defaults to output of fasta sequence. Try something like this:

            Code:
            union -sequence cat_three.gb -sformat genbank -outseq union_three.gb -osformat genbank -auto
            For more info,

            Code:
            tfm union
            OLD: That was the good news. The bad news is that it didn't keep the features (I have EMBOSS:6.3.1), which is probably an enhancement request...


            Would a short Biopython script to merge multiple GenBank records into a single record with the concatenated sequence and all the features be of interest?

            NEW: You must also explicitly ask union to keep the features, see below.
            Last edited by maubp; 11-15-2010, 03:50 AM. Reason: correction (thanks Nick)

            Comment

            • nickloman
              Senior Member
              • Jul 2009
              • 355

              #7
              There's a toggle if you want to keep the features, I think it is "-feature Y"

              See this script for a worked example:
              XBASE - a database and web framework for bacterial genomics and next-generation sequencing - nickloman/xbase

              Comment

              • maubp
                Peter (Biopython etc)
                • Jul 2009
                • 1544

                #8
                Originally posted by nickloman View Post
                There's a toggle if you want to keep the features, I think it is "-feature Y"
                You are right, thanks. e.g.
                Code:
                union -sequence cat_three.gb -sformat genbank -outseq union_three.gb -osformat genbank -feature Y -auto
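For anyone wondering what joining contigs "with features" involves, here is a minimal Python sketch of the coordinate arithmetic, not of how union itself is implemented. The `Feature` tuple, the `shift_features` function and the contig lengths are made up for illustration; the only idea shown is that each feature is shifted by the total length of the contigs placed before it.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    """A simplified feature: 1-based inclusive coordinates on its own contig."""
    kind: str
    start: int
    end: int


def shift_features(contigs: List[Tuple[int, List[Feature]]]) -> List[Feature]:
    """Place features from several contigs onto one concatenated sequence.

    Each contig is given as (length_in_bp, features). Every feature is
    shifted by the summed length of all contigs that come before its own.
    """
    merged = []
    offset = 0
    for length, features in contigs:
        for f in features:
            merged.append(Feature(f.kind, f.start + offset, f.end + offset))
        offset += length  # the next contig starts after this one ends
    return merged


# Two hypothetical contigs of 1000 bp and 500 bp.
contigs = [
    (1000, [Feature("CDS", 1, 558), Feature("CDS", 660, 900)]),
    (500, [Feature("CDS", 10, 300)]),
]

for feature in shift_features(contigs):
    print(feature.kind, feature.start, feature.end)
```

If all features are drawn at their original per-contig coordinates instead, genes from different contigs land on top of each other, which matches the symptom described in the first post.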

                Comment

                • Bacteria Genomes
                  Junior Member
                  • Jul 2009
                  • 8

                  #9
                  That got around the problem! Thanks for your help!

                  Comment

                  • zhangju
                    Member
                    • May 2011
                    • 18

                    #10
                    I would like to confirm the command to run the make-art-file.py script.

                    <python make-art-file.py small1.art small2.art>

                    small1.art and small2.art are two GenBank files to be concatenated so that Artemis can load them at once.

                    Is this command right?

                    Comment

                    • nickloman
                      Senior Member
                      • Jul 2009
                      • 355

                      #11
                      Not exactly.

                      Concatenate your two GenBank files into a single file, e.g.

                      cat seq1.gb seq2.gb > both.gb

                      Then run make-art-file.py with input from the shell, e.g.

                      python make-art-file.py < both.gb

                      Hope that helps
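A quick sanity check on the concatenated file can be sketched in Python as below. This is only an illustration: `count_records` is a hypothetical helper, and it assumes each GenBank record starts with a LOCUS line and ends with a line containing only `//`.

```python
def count_records(genbank_text: str) -> tuple:
    """Return (number of LOCUS lines, number of '//' terminator lines)."""
    lines = genbank_text.splitlines()
    locus = sum(1 for line in lines if line.startswith("LOCUS"))
    terminators = sum(1 for line in lines if line.strip() == "//")
    return locus, terminators


# A made-up two-record file, heavily abbreviated.
sample = (
    "LOCUS       contig_a\n"
    "ORIGIN\n"
    "//\n"
    "LOCUS       contig_b\n"
    "ORIGIN\n"
    "//\n"
)

print(count_records(sample))
```

For a real file, read it first, e.g. `count_records(open("both.gb").read())`. If the two numbers differ, or are lower than the number of files you concatenated, one of the inputs is probably truncated or missing its terminator.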

                      Comment

                      • zhangju
                        Member
                        • May 2011
                        • 18

                        #12
                        I got lots of SyntaxError messages at "<stdin>". I have copied part of the error report and part of the input GenBank file below. Could you take a look?

                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** NameError: name 'TLFRSINLIPFYSIMEYISGSPAITNALAFANVAGNIVIFIPLGIYLPLFKNDKRAIT' is not defined
                        (Pdb) *** NameError: name 'NLLFILIVSLFVEITQGLLGIGASDIDDVILNCLGGWIGILGYKFSLFILRDEKIVHT' is not defined
                        (Pdb) *** SyntaxError: EOL while scanning string literal (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** NameError: name 'IDGHGEDGDEEFISIQDCARKLIEYIDTNCNSQVFAMGGLSIGAQIVTEVLSQREKIT' is not defined
                        (Pdb) *** NameError: name 'DYAIIESVLVYPIRGTTALTVPVYKLFYGLIKKKWFAGMQAKTLCVPLDMFEQYYQDS' is not defined
                        (Pdb) *** NameError: name 'LKISRQSLINITLSNGNYNLNECIADTKTKVLIIVGENEVGIMKKSARLLHDKIPGSV' is not defined

                        Input file

                        LOCUS Ct_contig00020 105381 bp DNA linear 27-JUL-2011
                        DEFINITION Clostridium termitidis : Ct_contig00020
                        ACCESSION unknown
                        KEYWORDS .
                        COMMENT -
                        FEATURES Location/Qualifiers
                        source 1..105381
                        /mol_type="genomic DNA"
                        /db_xref="taxon:29371"
                        /organism="Clostridium termitidis"
                        gene 1..558
                        /locus_tag="Ct_00004390"
                        /gene_calling_method="Prodigal"
                        /note="IMG gene_oid=2504589426"
                        CDS 1..558
                        /locus_tag="or0446"
                        /translation="GIMNKRERIKTAFLYGAFICYILLLMKILLLSRISILGLFNNER
                        TLFRSINLIPFYSIMEYISGSPAITNALAFANVAGNIVIFIPLGIYLPLFKNDKRAIT
                        NLLFILIVSLFVEITQGLLGIGASDIDDVILNCLGGWIGILGYKFSLFILRDEKIVHT
                        AITILSVIIGLPVTLYFLFIIKMRF*"
                        /product="Glycopeptide antibiotics resistance protein"
                        /note="IMG gene_oid=2504589426"
                        gene 660..1406
                        /locus_tag="Ct_00004400"
                        /gene_calling_method="Prodigal"
                        /note="IMG gene_oid=2504589427"
                        CDS 660..1406
                        /locus_tag="or0447"
                        /translation="MIFKETPNKQMPTIILLHGGGLSSWSLNSIVEQLQSDFHIITPI
                        IDGHGEDGDEEFISIQDCARKLIEYIDTNCNSQVFAMGGLSIGAQIVTEVLSQREKIT
                        DYAIIESVLVYPIRGTTALTVPVYKLFYGLIKKKWFAGMQAKTLCVPLDMFEQYYQDS
                        LKISRQSLINITLSNGNYNLNECIADTKTKVLIIVGENEVGIMKKSARLLHDKIPGSV
                        LYTAPGMKHGELSLKYPLKYVDLLKSFFCK*"
                        /product="hypothetical protein"
                        /note="IMG gene_oid=2504589427"

                        Comment

                        • nickloman
                          Senior Member
                          • Jul 2009
                          • 355

                          #13
                          What platform are you running the script / EMBOSS on? It looks a bit like the line format of the file might be wrong. If on UNIX, lines should end with the newline character \n.
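One way to check the line endings is a small Python sketch like the one below. The function names are made up for illustration, and it works on raw bytes so that Python does not silently translate the newlines.

```python
def count_line_endings(data: bytes) -> dict:
    """Count Unix (LF), Windows (CRLF) and old Mac (CR) line endings."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF not preceded by CR
    cr = data.count(b"\r") - crlf   # CR not followed by LF
    return {"LF": lf, "CRLF": crlf, "CR": cr}


def to_unix_newlines(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# A made-up snippet with Windows line endings.
sample = b"LOCUS       demo\r\nDEFINITION  example\r\n//\r\n"

print(count_line_endings(sample))
print(count_line_endings(to_unix_newlines(sample)))
```

For a real file, read it in binary mode, e.g. `open("both.gb", "rb").read()`. Any non-zero CRLF or CR count on a Linux machine suggests the file came from another platform.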

                          Comment

                          • zhangju
                            Member
                            • May 2011
                            • 18

                            #14
                            I am using Red Hat Linux. Is there a way I can attach the original file I am using for your reference? Since I copied it from a text editor, the "\n"s may not show up.

                            Comment

                            • maubp
                              Peter (Biopython etc)
                              • Jul 2009
                              • 1544

                              #15
                              Originally posted by zhangju View Post
                              Input file
                              Code:
                              LOCUS       Ct_contig00020        105381 bp    DNA     linear   27-JUL-2011
                              DEFINITION  Clostridium termitidis : Ct_contig00020
                              ACCESSION   unknown
                              KEYWORDS    .
                              COMMENT     -
                              FEATURES             Location/Qualifiers
                                   source          1..105381
                                                   /mol_type="genomic DNA"
                                                   /db_xref="taxon:29371"
                                                   /organism="Clostridium termitidis"
                                   gene            1..558
                                                   /locus_tag="Ct_00004390"
                                                   /gene_calling_method="Prodigal"
                                                   /note="IMG gene_oid=2504589426"
                                   CDS             1..558
                                                   /locus_tag="or0446"
                                                   /translation="GIMNKRERIKTAFLYGAFICYILLLMKILLLSRISILGLFNNER
                                                   TLFRSINLIPFYSIMEYISGSPAITNALAFANVAGNIVIFIPLGIYLPLFKNDKRAIT
                                                   NLLFILIVSLFVEITQGLLGIGASDIDDVILNCLGGWIGILGYKFSLFILRDEKIVHT
                                                   AITILSVIIGLPVTLYFLFIIKMRF*"
                                                   /product="Glycopeptide antibiotics resistance protein"
                                                   /note="IMG gene_oid=2504589426"
                                   gene            660..1406
                                                   /locus_tag="Ct_00004400"
                                                   /gene_calling_method="Prodigal"
                                                   /note="IMG gene_oid=2504589427"
                                   CDS             660..1406
                                                   /locus_tag="or0447"
                                                   /translation="MIFKETPNKQMPTIILLHGGGLSSWSLNSIVEQLQSDFHIITPI
                                                   IDGHGEDGDEEFISIQDCARKLIEYIDTNCNSQVFAMGGLSIGAQIVTEVLSQREKIT
                                                   DYAIIESVLVYPIRGTTALTVPVYKLFYGLIKKKWFAGMQAKTLCVPLDMFEQYYQDS
                                                   LKISRQSLINITLSNGNYNLNECIADTKTKVLIIVGENEVGIMKKSARLLHDKIPGSV
                                                   LYTAPGMKHGELSLKYPLKYVDLLKSFFCK*"
                                                   /product="hypothetical protein"
                                                   /note="IMG gene_oid=2504589427"
                              Using the [ code ] text [ /code ] tags on the forum makes this kind of thing easier to read.

                              The stop codon is not normally included in the amino acid translation string (although it should be included within the CDS and gene co-ordinates). That probably isn't the problem though given the error messages.
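If you do want to drop the trailing "*" from the /translation qualifiers, a rough Python sketch could look like this. It is plain text processing with a regular expression, not a proper GenBank parser; `strip_terminal_stop` is a hypothetical helper, and the record below is a made-up toy with a deliberately short translation.

```python
import re

# Match a /translation qualifier (possibly spanning several lines) whose
# last character before the closing quote is "*".
_TRAILING_STOP = re.compile(r'(/translation="[^"]*?)\*"')


def strip_terminal_stop(genbank_text: str) -> str:
    """Remove a terminal '*' from every /translation qualifier."""
    return _TRAILING_STOP.sub(r'\1"', genbank_text)


toy_record = (
    '     CDS             1..15\n'
    '                     /locus_tag="toy_0001"\n'
    '                     /translation="MKTA\n'
    '                     *"\n'
    '                     /product="hypothetical protein"\n'
)

print(strip_terminal_stop(toy_record))
```

Translations without a terminal "*" are left unchanged, and the CDS and gene coordinates are not touched, so they still include the stop codon.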

                              Comment
