  • RNA-seq assembly

    Hi

    I have RNA-seq data (Illumina platform) without a reference sequence, so the only option I have is a de novo assembly, followed by gene prediction or megablast to identify the content of my mRNA.

    However, if the gene content is unknown, is there any software available to identify the unknown genes, or any pipeline that I can use?

    Say you have hypothetical proteins: how do I determine that something is a hypothetical protein, and what its function is (any software)?

    Thanks

  • #2
    To annotate an assembly, http://www.blast2go.org/ may help you.

    • #3
      Thanks, I appreciate your reply.

      If I am not mistaken, blast2go can only annotate genes that are already available in the database. If the genes or hypothetical proteins are not in the database, what do I need to do to predict a new or novel gene? Thanks

      • #4
        Just to make sure that I got it right: you have an assembled transcriptome and you want to annotate it (?). I guess that for this you will always have to rely on other databases. I don't know of anything that would be able to tell you ab initio what kind of sequence would produce what kind of protein.

        In other words: you have to rely on existing knowledge. However, there's quite a lot around. For example, blast2go does more or less the following:
        1. Uses blast (blastx in the case of transcripts) to search for similar sequences that have been described somewhere, somehow (some may have experimental evidence, others are only based on predictions). In this step you will not only find the ones that are identical to known transcripts; it will also find cases where there is only some similarity.
        2. Annotation then via GO, InterProScan, KEGG etc. (InterPro runs, I think, only on the ones which have a GO annotation; I did not finish it due to the rather slow processing)
        3. Some Statistics

        Using blast2go you will be able to annotate quite a few of your transcripts. Nonetheless, you will definitely have others which are not similar to any of the known ones (to be exact: they may be similar to a certain extent, but less than the threshold you chose for blastx).
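
To make the "annotated vs. left over" split concrete, here is a minimal sketch in Python. It assumes you ran blastx yourself with the default 12-column tabular output (`-outfmt 6`), where column 11 is the e-value. The function name, the demo IDs, and the `1e-5` cutoff are made up for illustration; this is not what blast2go does internally.

```python
import csv
import io


def split_by_blast_hits(all_ids, blast_tab_text, max_evalue=1e-5):
    """Return (annotated, unannotated) for a list of transcript IDs.

    blast_tab_text: BLAST tabular output (-outfmt 6, default 12 columns).
    max_evalue: hypothetical cutoff, choose one that suits your data.
    """
    best = {}  # query id -> (evalue, subject id) of the best hit under the cutoff
    for row in csv.reader(io.StringIO(blast_tab_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= max_evalue and (query not in best or evalue < best[query][0]):
            best[query] = (evalue, subject)
    annotated = {q: best[q][1] for q in all_ids if q in best}
    unannotated = [q for q in all_ids if q not in best]
    return annotated, unannotated


# Made-up demo data: t1 has a strong hit, t2 only a weak one, t3 no hit at all.
demo = (
    "t1\tsp|P1\t90.0\t100\t10\t0\t1\t300\t1\t100\t2e-50\t200\n"
    "t2\tsp|P2\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t25\n"
)
annotated, unannotated = split_by_blast_hits(["t1", "t2", "t3"], demo)
print(annotated)
print(unannotated)
```

The `unannotated` list is what you would carry forward into the more sensitive searches discussed further down the thread.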

        Now, if I got it right, you would like to do something with the remaining, unannotated transcripts (?). Hm, I'm not really an expert for this. But I guess that "gene prediction" is not really what you need (these programs annotate a genome sequence, with the help of the transcripts you provide from your assembly, but you don't have a genome sequence...). Well, there may be some programs which check transcripts directly. It would be nice to know if you find something.

        Another possibility would be to search for protein domains (InterProScan etc., but this time on the sequences which were left out by blast2go). However, as far as I know, you need protein sequences to do so. That means you need to translate your transcripts into proteins (if the library is not strand-specific: six proteins, three frames from each strand). Just keep in mind:
        1. the domain scanners are again based on "similarity to known things"
        2. translating transcripts into proteins can be quite error-prone (imagine you had some intronic reads, e.g. from unspliced pre-mRNA or antisense transcripts: they will be incorporated into your transcript, and during in silico translation they will mix up your protein sequence quite badly)
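
The six-frame translation mentioned above (three frames from each strand) can be sketched in plain Python as follows. This assumes the standard genetic code (NCBI translation table 1); the function names and the toy sequence are made up for illustration, and in practice you would more likely use an existing tool or library.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; '*' marks a stop, 'X' any codon with a non-ACGT base."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(transcript):
    """Return a dict mapping frame labels ('+1'..'+3', '-1'..'-3') to protein strings."""
    seq = transcript.upper().replace("U", "T")
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


# Toy example: a tiny made-up transcript.
for frame, protein in six_frame_translations("ATGGCCTAA").items():
    print(frame, protein)
```

A common next step is to keep, per frame, only the longest stretch between stop codons (for example `max(protein.split("*"), key=len)`), but as point 2 warns, retained introns or antisense reads can still give misleading proteins.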


        In summary:
        I don't know of a "good" way of dealing with unknown transcripts which are not similar to anything that is known [well, there are some, but not on the computer: you would have to go to the bench]

        • #5
          Hi

          I really appreciate the detailed answer to my questions ^_^ That is exactly what I want to know: how to deal with the unknown transcripts.

          Well, I have not done anything on the project yet. But I am wondering what I should do if I have something different from the known databases...

          Please share any additional info with me ^_^ Have a nice day

          • #6
            was a pleasure

            "Well I have not done anything on the project yet. But I would kind of assuming if I have something different from the known database then what should I do..."
            I guess if it is totally different you'll have a hard time. Well, in principle you could translate into protein and do some crazy stuff, maybe via the structure... but I think this is anything but easy...

            Well, if you just have a few of them (or could filter down to a few based on whatever criteria):
            1. back to the lab: try to get/confirm the transcript (meaning: clone and sequence it the old way)
            2. still in the lab - use other methods to characterize it...
            3. some years later: either ... or ...



            have a nice day - and in case you find a solution, let me know

            all the best

            • #7
              WOW, that seems very challenging, with a lot of work to be done if that happens!!!
              Will see what else I can do with it....

              Anyway, I much appreciate the sharing... THANKS!

              • #8
                try hmmscan vs pfam

                maybe try a blast-based annotation first (as recommended above) and then, with your remaining (and low-confidence) transcripts, try a more sensitive HMM-based annotation.

                first identify the coding regions and translate them (e.g. using prodigal or similar), then run hmmscan vs Pfam. novel proteins will likely have conserved domains, so even if they don't have "full-length" hits to known proteins, the domains themselves are informative.
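
As a rough sketch of what to do with the hmmscan results: assuming you ran something like `hmmscan --domtblout hits.domtbl Pfam-A.hmm proteins.faa` (the file names are placeholders, and the Pfam HMM file has to be prepared with `hmmpress` first), the per-domain table can be grouped by protein in a few lines of Python. The function name, the `1e-3` cutoff, and the demo rows below are made up for illustration.

```python
from collections import defaultdict


def domains_per_protein(domtbl_text, max_ievalue=1e-3):
    """Group Pfam domain hits by query protein from hmmscan --domtblout text.

    Columns used (0-based): 0 domain name, 1 Pfam accession, 3 query name,
    12 independent E-value of the domain, 17/18 alignment start/end on the query.
    max_ievalue is a hypothetical cutoff, adjust it to your needs.
    """
    hits = defaultdict(list)
    for line in domtbl_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        f = line.split(maxsplit=22)  # the free-text description stays in one piece
        i_evalue = float(f[12])
        if i_evalue <= max_ievalue:
            hits[f[3]].append((f[0], f[1], int(f[17]), int(f[18]), i_evalue))
    return dict(hits)


# Made-up demo rows in domtblout column order (one strong and one weak domain hit).
strong = ["Pkinase", "PF00069.28", "264", "t7_frame+1", "-", "410",
          "1.2e-40", "140.1", "0.0", "1", "1", "3.0e-44", "1.5e-40", "139.8", "0.0",
          "1", "264", "35", "290", "35", "290", "0.95", "Protein kinase domain"]
weak = ["zf-C2H2", "PF00096.29", "23", "t9_frame-2", "-", "120",
        "0.8", "10.2", "0.1", "1", "1", "0.3", "0.5", "9.9", "0.1",
        "1", "23", "50", "72", "50", "72", "0.80", "Zinc finger, C2H2 type"]
demo = "# made-up header line\n" + " ".join(strong) + "\n" + " ".join(weak) + "\n"

print(domains_per_protein(demo))
```

Proteins that show up here but had no blast hit are the ones where the domain content is the only functional clue you have.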

                • #9
                  By the way: besides the protein-similarity searches via blastx and domain scanners (I forgot to note that blast2go only tries to annotate protein-coding transcripts, as GO terms are only associated with proteins), I would also search for similarities at the nucleotide level (normal blast/blat; I don't know of any software that wraps everything, so if anyone knows of one, that would be interesting). I believe you will be able to annotate some of the transcripts that had no protein(-domain) similarity (some of them could also be rather interesting biologically).

                  All the best (writing on the phone is tricky, sorry for mistakes...)
