  • RNA-Seq annotation

    Hello,

    Is there any software that takes a reference genome and RNA-Seq reads as input and outputs a genomic annotation derived from those reads?

    Thanks

  • #2
    By that, do you mean a program that aligns RNA-seq reads against a reference genome (returning the aligned position and any mismatches), or do you want to add a particular annotation (presumably the associated gene or transcript) to already-aligned reads? I assume you want the former, in which case you can use TopHat/Bowtie, Novoalign, BWA, etc.


    • #3
      If you are new to NGS analysis, I would also recommend a commercial platform such as CLC or DNAnexus for RNA-Seq analysis. CLC is a local workbench with yearly license fees (very reasonable for academics and non-profits), while DNAnexus is cloud-based with a pay-per-GB model.

      Service providers are also an option if you want consulting work done.

      If you have the informatics chops, DESeq, the Tuxedo suite, and several other packages can align reads and help you annotate and evaluate expression data.
      Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio


      • #4
        Originally posted by dpryan
        By that, do you mean a program that aligns RNA-seq reads against a reference genome (returning the aligned position and any mismatches), or do you want to add a particular annotation (presumably the associated gene or transcript) to already-aligned reads? I assume you want the former, in which case you can use TopHat/Bowtie, Novoalign, BWA, etc.
        Thanks for the reply, dpryan.
        Actually, I already have the reads aligned. I need to add annotations to the aligned reads, including ORFs, splice variants, and SNPs, and how all of these affect the protein.

        That's it.

        Thanks


        • #5
          Originally posted by jjohnson
          If you are new to NGS analysis, I would also recommend a commercial platform such as CLC or DNAnexus for RNA-Seq analysis. CLC is a local workbench with yearly license fees (very reasonable for academics and non-profits), while DNAnexus is cloud-based with a pay-per-GB model.

          Service providers are also an option if you want consulting work done.

          If you have the informatics chops, DESeq, the Tuxedo suite, and several other packages can align reads and help you annotate and evaluate expression data.
          Hi jjohnson,

          Thanks for the reply.
          Can these tools annotate aligned reads? The information I need is ORFs, SNPs, splice variants, etc.

          Thanks again


          • #6
            So, you want to know for every single one of your millions of reads whether it sits on a SNP, an ORF, a splice variant, etc., giving you a huge list with millions of lines. Are you sure you know what you want to do next with that? There is a reason why this is not how it is usually done.


            • #7
              Originally posted by Simon Anders
              So, you want to know for every single one of your millions of reads whether it sits on a SNP, an ORF, a splice variant, etc., giving you a huge list with millions of lines. Are you sure you know what you want to do next with that? There is a reason why this is not how it is usually done.
              Simon, thanks for joining the conversation.

              The main aim is to find differences among 4 species, each of which has its own RNA reads aligned to the genome. These 4 species are subjected to different stresses, and I need to know the effect of these stresses on expression. Is that the right way? My assumption is that by knowing which gene each RNA was expressed from, I can figure out which genes are involved in tolerating the stress.

              Thanks


              • #8
                Sounds like what you actually want to do is look at differential expression between the groups. Search the forum; there is likely a plethora of threads on that subject.
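To make "differential expression" a bit more concrete, here is a minimal Python sketch that scales raw per-gene read counts to counts per million (CPM) and computes a log2 fold change between two conditions. The gene names, counts, and the pseudocount of 1 are made up for illustration. This is only the descriptive part of the idea: it is not a statistical test, and it is not what DESeq or similar packages do internally, since those model replicate variance.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts for one sample to counts per million (CPM)."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}

def log2_fold_change(control, stressed, pseudocount=1.0):
    """Per-gene log2(stressed / control) computed on CPM values.

    The pseudocount (an arbitrary choice here) avoids division by zero
    for genes with no reads in one condition.
    """
    cpm_c = counts_per_million(control)
    cpm_s = counts_per_million(stressed)
    return {
        gene: math.log2((cpm_s[gene] + pseudocount) / (cpm_c[gene] + pseudocount))
        for gene in control
    }

# Toy counts; gene names and numbers are hypothetical.
control = {"geneA": 500, "geneB": 300, "geneC": 200}
stressed = {"geneA": 250, "geneB": 300, "geneC": 450}

lfc = log2_fold_change(control, stressed)
for gene, value in sorted(lfc.items()):
    print(f"{gene}\t{value:+.2f}")
```

A negative value means lower relative expression under stress, a positive value means higher. With real data you would want biological replicates and a dedicated package to decide which changes are significant.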


                • #9
                  Originally posted by mhadidi2002
                  Simon, thanks for joining the conversation.

                  The main aim is to find differences among 4 species, each of which has its own RNA reads aligned to the genome. These 4 species are subjected to different stresses, and I need to know the effect of these stresses on expression. Is that the right way? My assumption is that by knowing which gene each RNA was expressed from, I can figure out which genes are involved in tolerating the stress.

                  Thanks
                  Could you give some more details about the species and how you did the alignments? You say you aligned them to the genome. Do all four have a sequenced genome, or was there just a single genome you aligned them to? Also, do you know if there are annotation files for this genome or these genomes, such as a GFF or GTF file?

                  I have done a similar experiment with 3 different plant species, for which we did not have an annotated genome. To get at the question of differential expression, we aligned to the sequenced genome of a closely related species and then used that as our reference.
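For anyone unsure what such an annotation file contains, the following is a rough Python sketch that pulls gene records out of GFF3-style lines. It assumes the standard nine tab-separated columns and that each gene carries an `ID=` attribute; the example lines and identifiers are invented, and a real file from a genome database may use different feature types or attribute keys.

```python
def parse_gff_genes(lines, feature_type="gene"):
    """Yield (seqid, start, end, strand, gene_id) for each matching feature.

    GFF coordinates are 1-based and inclusive.
    """
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        seqid, _source, _type, start, end, _score, strand, _phase, attrs = fields
        # Column 9 holds semicolon-separated key=value pairs in GFF3.
        attributes = dict(
            pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
        )
        yield seqid, int(start), int(end), strand, attributes.get("ID", "")

# Hypothetical example lines, not taken from any real annotation.
example = [
    "##gff-version 3",
    "chr1\texample\tgene\t100\t500\t.\t+\t.\tID=gene1;Name=hypothetical1",
    "chr1\texample\tmRNA\t100\t500\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\texample\tgene\t900\t1500\t.\t-\t.\tID=gene2",
]
genes = list(parse_gff_genes(example))
print(genes)
```

In practice you would pass an open file handle instead of a list, and change `feature_type` (for example to `exon`) depending on what the annotation provides.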


                  • #10
                    Originally posted by chadn737
                    Could you give some more details about the species and how you did the alignments? You say you aligned them to the genome. Do all four have a sequenced genome, or was there just a single genome you aligned them to? Also, do you know if there are annotation files for this genome or these genomes, such as a GFF or GTF file?

                    I have done a similar experiment with 3 different plant species, for which we did not have an annotated genome. To get at the question of differential expression, we aligned to the sequenced genome of a closely related species and then used that as our reference.
                    Hello Chadn737,

                    My work is similar to yours. I have 4 species of the same plant. Their genomes aren't sequenced, but there is a sequenced genome for 1 closely related plant, which serves as the reference.

                    I aligned the 4 species against the closely related species; the output is in BED and BAM files. I need to reannotate that reference genome based on these 4 species. I have an annotation for the reference genome, but in GFF format.

                    Do you have any ideas?

                    Thanks for the discussion.


                    • #11
                      Originally posted by mhadidi2002
                      Hello Chadn737,

                      My work is similar to yours. I have 4 species of the same plant. Their genomes aren't sequenced, but there is a sequenced genome for 1 closely related plant, which serves as the reference.

                      I aligned the 4 species against the closely related species; the output is in BED and BAM files. I need to reannotate that reference genome based on these 4 species. I have an annotation for the reference genome, but in GFF format.

                      Do you have any ideas?

                      Thanks for the discussion.
                      This is exactly the situation we have worked with.

                      Your problem of "annotation" is fairly straightforward. What you really want is a list of gene names and the number of reads mapping to each, for downstream analysis like differential expression, right?

                      There are a couple of options for this. The approach I prefer is htseq-count, which was written by Simon Anders, but there are other approaches using bedtools and the like.

                      If you don't mind me asking, what genome did you align to? If it is Arabidopsis, Maize, or another well annotated plant species, then there will be a lot more tools available for downstream analysis once you find your differentially expressed genes.
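The "gene names plus read counts" idea can be sketched in a few lines of Python. This is a toy illustration only, not how htseq-count or bedtools are implemented: it assumes genes are given as GFF-style intervals (1-based, inclusive) and aligned reads as BED-style intervals (0-based, half-open), ignores strand and splicing, and uses a slow linear scan. The `__ambiguous` and `__no_feature` labels are borrowed from the names htseq-count uses for similar categories; all gene names and coordinates are invented.

```python
from collections import Counter

def count_reads_per_gene(genes, reads):
    """Count reads that overlap exactly one gene.

    genes: list of (seqid, start, end, gene_id), 1-based inclusive (GFF style).
    reads: list of (seqid, start, end), 0-based half-open (BED style).
    """
    counts = Counter({gene_id: 0 for _, _, _, gene_id in genes})
    for seqid, bed_start, bed_end in reads:
        # Convert BED coordinates to 1-based inclusive to match the genes.
        start, end = bed_start + 1, bed_end
        hits = {
            gene_id
            for g_seqid, g_start, g_end, gene_id in genes
            if g_seqid == seqid and start <= g_end and end >= g_start
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif hits:
            counts["__ambiguous"] += 1   # read overlaps more than one gene
        else:
            counts["__no_feature"] += 1  # read overlaps no annotated gene
    return counts

# Hypothetical genes and reads for illustration.
genes = [("chr1", 100, 500, "gene1"), ("chr1", 450, 900, "gene2")]
reads = [
    ("chr1", 120, 170),    # inside gene1 only
    ("chr1", 600, 650),    # inside gene2 only
    ("chr1", 460, 510),    # overlaps both genes
    ("chr1", 2000, 2050),  # overlaps no gene
    ("chr2", 120, 170),    # different sequence
]
counts = count_reads_per_gene(genes, reads)
for name, n in sorted(counts.items()):
    print(f"{name}\t{n}")
```

The resulting table of gene IDs and counts, one per sample, is the kind of input that differential expression packages expect. For millions of reads, a dedicated tool or an interval index is a far better choice than this nested loop.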
