
  • RAST annotation --> Artemis

    Hello!
    I used RAST to annotate my bacterial genomes and am now having trouble loading the output files in Artemis. Because I have multiple contigs, Artemis puts all the genes on the first contig, so they end up on top of each other. Even if I reduce the annotation file to just the genes on the first contig, the genes are not placed in the correct reading frames, so there seem to be multiple problems. I have tried all the different output file formats, but they all have problems.

    Anyone know how to fix this?

    Thanks!

  • #2
    One option is to join the GenBank entries together with the "union" tool in EMBOSS.

    Also the annotation pipeline we developed locally will produce Artemis-ready files http://xbase.ac.uk/annotation/



    • #3
      What file format are you feeding into Artemis? GenBank? GFF3?



      • #4
        I had this problem with GFF3 and GTF files. I have also opened the GenBank and EMBL files from RAST in Artemis, but they only show the first contig listed (which for some reason RAST decided should be contig 11...), and these files seem to lack some of the annotation info that the GFF3 and GTF files have.



        • #5
          Originally posted by nickloman:
          One option is to join the GenBank entries together with the "union" tool in EMBOSS.
          I looked into this and as far as I could tell it only works for fasta sequences.
          Any other suggestions for a similar solution?



          • #6
            Originally posted by Bacteria Genomes:
            I looked into this and as far as I could tell it only works for fasta sequences.
            No, it just defaults to output of fasta sequence. Try something like this:

            Code:
            union -sequence cat_three.gb -sformat genbank -outseq union_three.gb -osformat genbank -auto
            For more info,

            Code:
            tfm union
            OLD: That was the good news. The bad news is that it didn't keep the features (I have EMBOSS:6.3.1), which is probably an enhancement request...


            Would a short Biopython script to merge multiple GenBank records into a single record with the concatenated sequence and all the features be of interest?

            NEW: You must also explicitly ask union to keep the features, see below.
            Last edited by maubp; 11-15-2010, 03:50 AM. Reason: correction (thanks Nick)



            • #7
              There's a toggle if you want to keep the features, I think it is "-feature Y"

              See this script for a worked example:
              XBASE - a database and web framework for bacterial genomics and next-generation sequencing - nickloman/xbase



              • #8
                Originally posted by nickloman:
                There's a toggle if you want to keep the features, I think it is "-feature Y"
                You are right, thanks. e.g.
                Code:
                union -sequence cat_three.gb -sformat genbank -outseq union_three.gb -osformat genbank -feature Y -auto
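
For intuition about why the features have to travel with the sequence, here is a minimal Python sketch of what joining annotated contigs involves. It is not how EMBOSS implements `union`; the `Contig` tuple and `merge_contigs` function are hypothetical names, and features are reduced to simple (start, end, label) triples with 1-based inclusive coordinates. Each contig's features are shifted by the combined length of the contigs before it, which is the step that stops genes from different contigs piling up at the same positions.

```python
from typing import List, NamedTuple, Tuple

# Hypothetical, simplified feature: (start, end, label), 1-based inclusive.
Feature = Tuple[int, int, str]


class Contig(NamedTuple):
    name: str
    sequence: str
    features: List[Feature]


def merge_contigs(contigs: List[Contig]) -> Contig:
    """Concatenate contigs, shifting each feature by the length that precedes it."""
    merged_sequence = []
    merged_features: List[Feature] = []
    offset = 0  # total length of all contigs already appended
    for contig in contigs:
        for start, end, label in contig.features:
            merged_features.append((start + offset, end + offset, label))
        merged_sequence.append(contig.sequence)
        offset += len(contig.sequence)
    return Contig("merged", "".join(merged_sequence), merged_features)


if __name__ == "__main__":
    first = Contig("contig1", "ATGAAATAG", [(1, 9, "geneA")])
    second = Contig("contig2", "ATGCCCTGA", [(1, 9, "geneB")])
    print(merge_contigs([first, second]).features)
```

Real GenBank locations can also be on the reverse strand or split across several ranges, which this sketch ignores.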



                • #9
                  That got around the problem! Thanks for your help!



                  • #10
                    I would like to confirm the command for running the make-art-file.py script.

                    <python make-art-file.py small1.art small2.art>

                    small1.art and small2.art are two GenBank files to be concatenated so that Artemis can load them at once.

                    Is this command right?



                    • #11
                      Not exactly.

                      Concatenate your two GenBank files into a single file, e.g.

                      cat seq1.gb seq2.gb > both.gb

                      Then run make-art-file.py with input from the shell, e.g.

                      python make-art-file.py < both.gb

                      Hope that helps



                      • #12
                        I got lots of SyntaxError messages at "stdin". I have copied part of the error report and part of the input GenBank file below. Could you take a look?

                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** NameError: name 'TLFRSINLIPFYSIMEYISGSPAITNALAFANVAGNIVIFIPLGIYLPLFKNDKRAIT' is not defined
                        (Pdb) *** NameError: name 'NLLFILIVSLFVEITQGLLGIGASDIDDVILNCLGGWIGILGYKFSLFILRDEKIVHT' is not defined
                        (Pdb) *** SyntaxError: EOL while scanning string literal (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** SyntaxError: invalid syntax (<stdin>, line 1)
                        (Pdb) *** NameError: name 'IDGHGEDGDEEFISIQDCARKLIEYIDTNCNSQVFAMGGLSIGAQIVTEVLSQREKIT' is not defined
                        (Pdb) *** NameError: name 'DYAIIESVLVYPIRGTTALTVPVYKLFYGLIKKKWFAGMQAKTLCVPLDMFEQYYQDS' is not defined
                        (Pdb) *** NameError: name 'LKISRQSLINITLSNGNYNLNECIADTKTKVLIIVGENEVGIMKKSARLLHDKIPGSV' is not defined

                        Input file

                        LOCUS Ct_contig00020 105381 bp DNA linear 27-JUL-2011
                        DEFINITION Clostridium termitidis : Ct_contig00020
                        ACCESSION unknown
                        KEYWORDS .
                        COMMENT -
                        FEATURES Location/Qualifiers
                        source 1..105381
                        /mol_type="genomic DNA"
                        /db_xref="taxon:29371"
                        /organism="Clostridium termitidis"
                        gene 1..558
                        /locus_tag="Ct_00004390"
                        /gene_calling_method="Prodigal"
                        /note="IMG gene_oid=2504589426"
                        CDS 1..558
                        /locus_tag="or0446"
                        /translation="GIMNKRERIKTAFLYGAFICYILLLMKILLLSRISILGLFNNER
                        TLFRSINLIPFYSIMEYISGSPAITNALAFANVAGNIVIFIPLGIYLPLFKNDKRAIT
                        NLLFILIVSLFVEITQGLLGIGASDIDDVILNCLGGWIGILGYKFSLFILRDEKIVHT
                        AITILSVIIGLPVTLYFLFIIKMRF*"
                        /product="Glycopeptide antibiotics resistance protein"
                        /note="IMG gene_oid=2504589426"
                        gene 660..1406
                        /locus_tag="Ct_00004400"
                        /gene_calling_method="Prodigal"
                        /note="IMG gene_oid=2504589427"
                        CDS 660..1406
                        /locus_tag="or0447"
                        /translation="MIFKETPNKQMPTIILLHGGGLSSWSLNSIVEQLQSDFHIITPI
                        IDGHGEDGDEEFISIQDCARKLIEYIDTNCNSQVFAMGGLSIGAQIVTEVLSQREKIT
                        DYAIIESVLVYPIRGTTALTVPVYKLFYGLIKKKWFAGMQAKTLCVPLDMFEQYYQDS
                        LKISRQSLINITLSNGNYNLNECIADTKTKVLIIVGENEVGIMKKSARLLHDKIPGSV
                        LYTAPGMKHGELSLKYPLKYVDLLKSFFCK*"
                        /product="hypothetical protein"
                        /note="IMG gene_oid=2504589427"



                        • #13
                          What platform are you running the script / EMBOSS on? It looks a bit like the line format of the file might be wrong. If on UNIX, lines should end with the newline character \n.
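
One way to check the line endings is a small Python sketch like the one below. The function names are hypothetical and it only uses the standard library: it counts Windows (CRLF), UNIX (LF) and old Mac (CR) line endings in the raw bytes of a file, and can convert everything to UNIX newlines.

```python
import sys
from collections import Counter


def count_line_endings(data: bytes) -> Counter:
    """Count CRLF, lone LF and lone CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    counts = Counter()
    counts["CRLF"] = crlf
    counts["LF"] = data.count(b"\n") - crlf  # LF not preceded by CR
    counts["CR"] = data.count(b"\r") - crlf  # CR not followed by LF
    return counts


def to_unix(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file name is only an example): python check_endings.py both.gb
    with open(sys.argv[1], "rb") as handle:
        print(dict(count_line_endings(handle.read())))
```

Opening the file in binary mode matters here, because text mode would translate the line endings before they can be counted.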



                          • #14
                            I am using Red Hat Linux. Is there a way I can attach the original file I am using for your reference? Since I copied it from a text editor, the "\n"s may not show up.



                            • #15
                              Originally posted by zhangju:
                              Input file
                              Code:
                              LOCUS       Ct_contig00020        105381 bp    DNA     linear   27-JUL-2011
                              DEFINITION  Clostridium termitidis : Ct_contig00020
                              ACCESSION   unknown
                              KEYWORDS    .
                              COMMENT     -
                              FEATURES             Location/Qualifiers
                                   source          1..105381
                                                   /mol_type="genomic DNA"
                                                   /db_xref="taxon:29371"
                                                   /organism="Clostridium termitidis"
                                   gene            1..558
                                                   /locus_tag="Ct_00004390"
                                                   /gene_calling_method="Prodigal"
                                                   /note="IMG gene_oid=2504589426"
                                   CDS             1..558
                                                   /locus_tag="or0446"
                                                   /translation="GIMNKRERIKTAFLYGAFICYILLLMKILLLSRISILGLFNNER
                                                   TLFRSINLIPFYSIMEYISGSPAITNALAFANVAGNIVIFIPLGIYLPLFKNDKRAIT
                                                   NLLFILIVSLFVEITQGLLGIGASDIDDVILNCLGGWIGILGYKFSLFILRDEKIVHT
                                                   AITILSVIIGLPVTLYFLFIIKMRF*"
                                                   /product="Glycopeptide antibiotics resistance protein"
                                                   /note="IMG gene_oid=2504589426"
                                   gene            660..1406
                                                   /locus_tag="Ct_00004400"
                                                   /gene_calling_method="Prodigal"
                                                   /note="IMG gene_oid=2504589427"
                                   CDS             660..1406
                                                   /locus_tag="or0447"
                                                   /translation="MIFKETPNKQMPTIILLHGGGLSSWSLNSIVEQLQSDFHIITPI
                                                   IDGHGEDGDEEFISIQDCARKLIEYIDTNCNSQVFAMGGLSIGAQIVTEVLSQREKIT
                                                   DYAIIESVLVYPIRGTTALTVPVYKLFYGLIKKKWFAGMQAKTLCVPLDMFEQYYQDS
                                                   LKISRQSLINITLSNGNYNLNECIADTKTKVLIIVGENEVGIMKKSARLLHDKIPGSV
                                                   LYTAPGMKHGELSLKYPLKYVDLLKSFFCK*"
                                                   /product="hypothetical protein"
                                                   /note="IMG gene_oid=2504589427"
                              Using the [ code ] text [ /code ] tags on the forum makes this kind of thing easier to read.

                              The stop codon is not normally included in the amino acid translation string (although it should be included within the CDS and gene co-ordinates). That probably isn't the problem though given the error messages.
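
The point about the stop codon can be sketched in Python as follows. Both helper names are hypothetical, and the length check assumes a complete CDS with a simple start..end location that includes the stop codon (no joins, no partial genes).

```python
def strip_stop(translation: str) -> str:
    """Remove a single trailing '*' (stop symbol) from a protein string."""
    return translation[:-1] if translation.endswith("*") else translation


def expected_protein_length(start: int, end: int) -> int:
    """Residues encoded by a CDS at start..end (1-based, inclusive).

    Assumes the co-ordinates include the stop codon, which is not
    counted as a residue.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return length // 3 - 1


if __name__ == "__main__":
    print(strip_stop("IIKMRF*"))
    print(expected_protein_length(1, 558))
```

For the first CDS in the quoted file (1..558), `expected_protein_length` gives 185 residues once the stop codon is excluded.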
