
  • Annotation for contigs from de novo assembly

    Hi,

    I want to annotate my contigs from a de novo assembly. I ran BLASTX and got hits for only 10-20% of them (e-value cutoff 1e-5), so none of my differentially expressed contigs (genes) have an annotation. At the very least I want to know what kind of genes these are, e.g. signaling, transmembrane, etc.

    Thanks a lot!
    Victoria

  • #2
    I'd give Prokka a try:



    • #3
      Provided Victoria is working with a prokaryotic genome

      NCBI has a eukaryotic annotation pipeline: http://www.ncbi.nlm.nih.gov/genome/a...n_euk/process/ and a prokaryotic one: https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ If I recall right, you will have to make the sequence public though at some point in time if you use these.

      Other eukaryotic options (have not used myself):

      Pasa: http://pasa.sourceforge.net/
      Maker: http://www.gmod.org/wiki/MAKER



      • #4
        I think Blast2GO would also be useful



        • #5
          I've also had good experience with Blast2GO, it doesn't require installation and is quite easy to handle. Also, they updated the quite ugly colours of their pie charts



          • #6
            Hi,

            Thank you for your replies. As I understand it, Blast2GO (see the link below) works from BLAST results, so it would not annotate any more contigs than my BLASTX run already did. Is that correct?



            The organism I want to annotate is the protist Oxyrrhis marina.

            Thank you!
            Victoria



            • #7
              RAST annotation.
              Krishna



              • #8
                Hi Victoria, I guess you could search several databases to increase your chances of getting an annotation. Which databases have you used? I don't have experience with protists, but in general a good start would be to compare against GenBank and UniProt's Swiss-Prot and TrEMBL protein databases. Have you tried a less stringent e-value cutoff? Also try downloading annotated sequences from related species and comparing against them directly. This reference may help you:

                Background: Anopheles funestus is one of the primary vectors of human malaria, which causes a million deaths each year in sub-Saharan Africa. Few scientific resources are available to facilitate studies of this mosquito species and relatively little is known about its basic biology and evolution, making development and implementation of novel disease control efforts more difficult. The An. funestus genome has not been sequenced, so in order to facilitate genome-scale experimental biology, we have sequenced the adult female transcriptome of An. funestus from a newly founded colony in Burkina Faso, West Africa, using the Illumina GAIIx next generation sequencing platform.

                Methodology/Principal Findings: We assembled short Illumina reads de novo using a novel approach involving iterative de novo assemblies and “target-based” contig clustering. We then selected a conservative set of 15,527 contigs through comparisons to four Dipteran transcriptomes as well as multiple functional and conserved protein domain databases. Comparison to the Anopheles gambiae immune system identified 339 contigs as putative immune genes, thus identifying a large portion of the immune system that can form the basis for subsequent studies of this important malaria vector. We identified 5,434 1∶1 orthologues between An. funestus and An. gambiae and found that among these 1∶1 orthologues, the protein sequence of those with putative immune function were significantly more diverged than the transcriptome as a whole. Short read alignments to the contig set revealed almost 367,000 genetic polymorphisms segregating in the An. funestus colony and demonstrated the utility of the assembled transcriptome for use in RNA-seq based measurements of gene expression.

                Conclusions/Significance: We developed a pipeline that makes de novo transcriptome sequencing possible in virtually any organism at a very reasonable cost ($6,300 in sequencing costs in our case). We anticipate that our approach could be used to develop genomic resources in a diversity of systems for which full genome sequence is currently unavailable. Our An. funestus contig set and analytical results provide a valuable resource for future studies in this non-model, but epidemiologically critical, vector insect.


                Dave
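
To see how much a looser e-value cutoff would actually change the annotated fraction, one option is to rerun the search once with a permissive cutoff and then count hits at several thresholds. The sketch below is only an illustration: it assumes the BLASTX results were saved in the standard 12-column tabular format (`-outfmt 6`, e-value in column 11), and the function names and toy rows are made up for the example.

```python
import csv

def best_evalues(blast_tab_lines):
    """Map each query ID to its smallest e-value.

    Assumes standard BLAST tabular rows (outfmt 6): query ID in
    column 1 and e-value in column 11.
    """
    best = {}
    for row in csv.reader(blast_tab_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, evalue = row[0], float(row[10])
        if query not in best or evalue < best[query]:
            best[query] = evalue
    return best

def annotated_fraction(best, total_contigs, cutoff):
    """Fraction of all contigs whose best hit has e-value <= cutoff."""
    hits = sum(1 for e in best.values() if e <= cutoff)
    return hits / total_contigs

# Hypothetical toy rows; real input would be the lines of the BLASTX output file.
toy_rows = [
    "c1\tP1\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t200",
    "c1\tP2\t40.0\t90\t54\t0\t1\t270\t1\t90\t1e-3\t45",
    "c2\tP3\t35.0\t80\t52\t0\t1\t240\t1\t80\t1e-4\t50",
    "c3\tP4\t30.0\t40\t28\t0\t1\t120\t1\t40\t0.5\t25",
]
best = best_evalues(toy_rows)
for cutoff in (1e-5, 1e-3):
    print(cutoff, annotated_fraction(best, total_contigs=4, cutoff=cutoff))
```

Hits that only appear at looser cutoffs are weaker evidence, so they are probably best treated as hints for follow-up rather than as firm annotations.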



                • #9
                  You can try the Trinotate pipeline. It combines several tools (TransDecoder to get plausible ORFs, Pfam, HMMER, SignalP, TMHMM, RNAmmer) to produce a fairly complete annotation report. The website gives a lot of detail on how to use it.



                  • #10
                    Run a gene prediction tool (e.g. Prodigal) over the contigs, feed the predicted proteins into InterProScan, and check whether you get anything interesting for your analysis.

                    It would also be good to know how long the contigs are.
                    Trying to annotate contigs that are considerably shorter than 900 bp will not be of much use.
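
A quick way to check the contig lengths before annotating is to split the assembly at a length threshold. This is a minimal sketch, assuming the contigs are in plain FASTA; the 900 bp default is simply the figure mentioned above, and the function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def split_by_length(records, min_len=900):
    """Return (long, short) lists of (name, length) split at min_len."""
    long_contigs, short_contigs = [], []
    for name, seq in records:
        target = long_contigs if len(seq) >= min_len else short_contigs
        target.append((name, len(seq)))
    return long_contigs, short_contigs

# Hypothetical toy input; real input would be the lines of the contig FASTA file.
toy_fasta = [">contig_a", "ACGT" * 300, ">contig_b", "ACGT" * 10]
long_contigs, short_contigs = split_by_length(read_fasta(toy_fasta))
print(len(long_contigs), "contigs at or above threshold,", len(short_contigs), "below")
```

If most of the differentially expressed contigs fall in the short group, that alone may explain a large share of the missing annotations.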
