  • Methods for comparing RNA sequencing libraries?

    Hi everyone,

    I am going to explain briefly what I'd like to do. I want to see which genes are differentially expressed between plants sterile and fertile. To do so, I will make a massive sequencing (454) of RNA from anthers from each line (fertile and sterile). My question is: what is the protocol to follow to compare the two libraries (fertile and sterile). So, which software I need to use and the steps to follow.

    Thanks very much for your attention

  • #2
    I'm sure others will comment, but first, if your main interest is differential expression, 454 seems to me to be the wrong technology. A couple hundred thousand reads will likely not give you the resolution necessary...unless you're planning on making concatenated SAGE libraries.



    • #3
      Thanks ECO,

      Which method would you advise for high-resolution differential expression? Could it work if I do a subtractive hybridization in both directions and then massively parallel sequencing of the product?



      • #4
        Solexa might be ..
        See if this is of any use:

        BioTechniques mappability article



        RNA-seq (U of Chicago)
        --
        bioinfosm



        • #5
          Illumina sequencing is preferred for differential expression analysis IF you have a reference genome or transcriptome to map your reads to. I have had to explain to a few P.I.s that we really can't do expression analysis using the Illumina on their favorite non-model plant when the sum total of transcriptome information is the 500 EST sequences they generated 10 years ago.

          If you have a reference and you have access to an Illumina then go with that, using the RNA-seq method referenced above by bioinfosm. You could try the Illumina DGEx method, but this is a lot of work both in the sample prep and data analysis. It also requires a very well annotated transcriptome to make the correct assignments.

          A consideration in plants though is that they may contain large families of very closely related genes. Short reads from the Illumina may not be unambiguously assignable to a single family member. (Of course this is only an issue if you are interested in examining differential expression within a gene family.) In this case the longer reads from the 454 may be more useful, in particular when the sequencing is targeted at the 3'-UTR (http://www.plantphysiol.org/cgi/content/full/146/1/32). This may also be a protocol to consider if you only have access to a 454 instrument.

          You did not say what species of plant you are working with, but if it is a completely novel species then all of the above protocols would be hindered by the lack of a reference. In situations like this we have done whole transcriptome shotgun sequencing from multiple libraries (e.g. different tissues, developmental stages, mutant vs. wt, etc.) and then assembled putative transcripts from the reads. For putative transcripts with a sufficient number of reads assigned you can get some differential expression information. Putative gene product IDs for the transcript assemblies may be assigned using BLAST. This method really only provides expression information for moderately to highly expressed genes.
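For a rough idea of what "a sufficient number of reads" can mean when comparing two unreplicated libraries, one option is Fisher's exact test on a 2x2 table: reads from the transcript versus all other reads, in each library. This test is not named anywhere in the thread, so treat the choice as an assumption. Below is a minimal Python sketch; the function names and the example counts are hypothetical.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_pmf(a: int, row1: int, col1: int, total: int) -> float:
    """P(a reads of the transcript fall in library A) under the null hypothesis."""
    return math.exp(
        log_comb(col1, a) + log_comb(total - col1, row1 - a) - log_comb(total, row1)
    )


def fisher_exact_two_sided(gene_a: int, total_a: int, gene_b: int, total_b: int) -> float:
    """Two-sided Fisher's exact test p-value.

    gene_a / gene_b: reads assigned to the transcript in library A / B.
    total_a / total_b: all assigned reads in library A / B.
    """
    col1 = gene_a + gene_b          # reads from this transcript, both libraries
    total = total_a + total_b       # all reads, both libraries
    observed = hypergeom_pmf(gene_a, total_a, col1, total)
    lowest = max(0, col1 - total_b)
    highest = min(col1, total_a)
    p_value = 0.0
    for a in range(lowest, highest + 1):
        p = hypergeom_pmf(a, total_a, col1, total)
        # Sum every table at least as unlikely as the observed one
        # (small relative tolerance guards against floating-point ties).
        if p <= observed * (1 + 1e-7):
            p_value += p
    return min(p_value, 1.0)


# Hypothetical counts: 50 vs 5 reads for one transcript, 100,000 reads per library.
print(fisher_exact_two_sided(50, 100_000, 5, 100_000))
# The same fold change at a tenth of the counts is much less convincing.
print(fisher_exact_two_sided(5, 100_000, 1, 100_000))
```

Without biological replicates a test like this only reflects sampling noise, not plant-to-plant variation, so a small p-value is best read as a reason to follow up rather than as proof.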



          • #6
            Thanks to bioinfosm and kmcarr,

            I work with wheat (hexaploid wheat); I think that information is relevant for suggesting the best method for a differential expression study. Although there is some information about the transcriptome (1034368 ESTs), I don't know if it is enough.

            On the other hand, I only have access to 454 technology.

            Is it possible to estimate the coverage of a 454 library?

            Thanks again



            • #7
              Just a quick note,

              I have some software that I've been using to do this type of analysis based off of alignments to the transcriptome using the exonerate aligner and Solexa sequences (It was about a year ago, so it was still Solexa back then, and exonerate was pretty much "state of the art" for a while...). It generates an excel format output that can be used to compare gene expression, and is relatively easy to do statistics on.

              I imagine that the same process could be used with 454 (albeit, the resolving power and the statistics would be dramatically limited compared to Illumina reads simply because of the sampling depth - 4 to 8M reads per lane for Illumina, and I'm not sure what you'd get with 454).
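To put rough numbers on the sampling-depth point, here is a small Python sketch that treats read counts for one transcript as Poisson-distributed. The Poisson model, the 400,000-read figure for 454, the transcript abundance and the 10-read threshold are all assumptions for illustration; only the 4M reads per Illumina lane comes from the post above.

```python
import math


def expected_reads(total_reads: int, transcript_fraction: float) -> float:
    """Expected reads from a transcript making up `transcript_fraction` of the library."""
    return total_reads * transcript_fraction


def prob_at_least(k: int, mean: float) -> float:
    """Poisson probability of seeing at least k reads when `mean` are expected."""
    below_k = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


# Hypothetical transcript present at 10 copies per million transcripts.
fraction = 10 / 1_000_000

depths = [
    ("454 run (assumed 400,000 reads)", 400_000),
    ("Illumina lane (4,000,000 reads)", 4_000_000),
]
for label, depth in depths:
    mean = expected_reads(depth, fraction)
    # Chance of collecting at least 10 reads for this transcript.
    print(f"{label}: expect {mean:.1f} reads, P(>=10) = {prob_at_least(10, mean):.3f}")
```

The same transcript that is comfortably sampled at Illumina depth may yield only a handful of reads at 454 depth, which is where the statistics start to fall apart.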

              The issue, however, is that it sounds like you don't have a transcript database/fasta/etc for wheat that can be used for aligning against (exactly as kmcarr pointed out), which means that doing anything with high-throughput genomics will be nearly impossible.

              With those two hurdles, it might make more sense to try to use your 454 machine to try assembling a transcriptome reference as your first step, before trying to do comparative genomics. You may not get the depth you need (I don't know how deep transcriptome sampling you can do with 454 at this point), but you'd definitely obtain a reference that would allow you to start working on this problem.

              Hopefully my comments aren't completely off-base!

              Anthony
              The more you know, the more you know you don't know. —Aristotle



              • #8
                Anthony, your basic idea is sound and it is what we have done a number of times with non-model plants. However, as melano mentioned, there are already > 1 million EST sequences for T. aestivum which should provide adequate coverage of the transcriptome. And the work of assembling a putative transcriptome has already been done by JCVI (née TIGR). Check out their Plant Transcript Assemblies site (http://plantta.tigr.org/). Melano, you can download the wheat assemblies at ftp://ftp.tigr.org/pub/data/plantta/Triticum_aestivum. The wheat assembly was done a little over two years ago when there were ~840,000 ESTs (plus a few fl-cDNA and mRNAs) but I don't think the additional ESTs would make a significant difference. The last release contains ~62,000 assemblies plus ~350,000 singleton ESTs.

                Note that the assemblies are simply shotgun assemblies of ESTs. There is no attempt to identify ORFs; the assemblies are error prone, including mis-calls and indels; some assemblies are chimeric, and there is a significant amount of redundancy in the data set (i.e. multiple assemblies apparently representing the same transcript.) Given all of that, at least they give you something to align to. The assemblies are annotated by BLAST vs the UniRef database.

                Melano, as I described above, you can either do shotgun cDNA sequencing or use the targeted 3'-UTR approach. The advantage of the 3'-UTR approach is that each "read" you generate will be a "count" of a transcript in your sample, whereas with shotgun sequencing you could be generating multiple reads from a single transcript, which does not provide you with any additional information. The 3'-UTR sequencing may also be better at distinguishing closely related transcripts. To identify a read, its sequence must be represented in your reference set. The shotgun sequencing approach provides a better chance of identifying reads (you're not limiting the sequence you gather to one part of the transcript). Given the size of the EST set though I think the 3'-UTRs should be well represented.

                You mentioned doing a subtractive hybridization above; what did you mean by this? Normally one would never do any sort of normalization on a sample to be used in a differential expression experiment. However if you know that there are certain transcripts which a) are very highly expressed and b) you are certain that you don't care about them, then it may be o.k. to try to remove them because they will "waste" a lot of read capacity. I don't know about anthers but in an experiment we did with Arabidopsis leaves we found that >50% of the reads were from the 10 most abundant transcripts (all photosynthesis related genes obviously). It really hurts to have >50% of your data be essentially worthless.
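If you want to check how much of a pilot library is taken up by a few abundant transcripts, something like the following Python sketch would do it. It assumes you already have one transcript ID per assigned read; the function name and the toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable


def top_n_fraction(read_assignments: Iterable[str], n: int = 10) -> float:
    """Fraction of all assigned reads that come from the n most abundant transcripts.

    read_assignments: one transcript ID per read (hypothetical input format).
    """
    counts = Counter(read_assignments)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Toy example: one dominant transcript among three.
reads = ["rbcS"] * 6 + ["cab"] * 3 + ["rare_gene"] * 1
print(top_n_fraction(reads, n=1))
print(top_n_fraction(reads, n=2))
```

A high fraction for a small n is the situation described above, where removing a few known, uninteresting transcripts might be worth considering.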

                Once you have your reads you can align them to the TIGR-TA reference using your favorite aligner (exonerate, BLAT, megablast) and count reads. Complications will be reads aligning equally well to more than one assembly and, if you use the shotgun approach, you will have to normalize the counts to the cDNA length (of course you don't actually know the true cDNA length for most of the transcripts). 454 will generate nowhere near as many reads as Illumina: ~300,000 - 400,000 for a whole picotiter plate vs. ~32,000,000 - 48,000,000 for a whole flow cell. For moderately to highly expressed genes you should be able to measure differential expression with some degree of confidence. For genes with very low levels of expression, or if the difference in expression between your samples is small, you may not be able to make statistically confident determinations.
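The counting and length-normalization steps could look roughly like this in Python. This is only a sketch under assumptions: the input is a hypothetical dict of best hits already parsed from the aligner output, reads that hit several assemblies equally well are split evenly between them (one possible policy; discarding them is another), and the assembly length stands in for the unknown true cDNA length.

```python
from collections import defaultdict
from typing import Dict, List


def count_reads(best_hits: Dict[str, List[str]]) -> Dict[str, float]:
    """Count reads per assembly.

    best_hits maps each read ID to the assemblies it aligns to equally well.
    A read with several equally good hits contributes an equal share to each.
    """
    counts: Dict[str, float] = defaultdict(float)
    for assemblies in best_hits.values():
        if not assemblies:
            continue  # unaligned read
        share = 1.0 / len(assemblies)
        for assembly in assemblies:
            counts[assembly] += share
    return dict(counts)


def reads_per_kb_per_million(
    counts: Dict[str, float], lengths_bp: Dict[str, int]
) -> Dict[str, float]:
    """Normalize counts by assembly length (kb) and library size (millions of reads).

    Only needed for shotgun libraries; for 3'-UTR targeted reads the raw
    counts are already one read per transcript.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        assembly: count / (lengths_bp[assembly] / 1000) / (total / 1_000_000)
        for assembly, count in counts.items()
    }


# Hypothetical assembly IDs and lengths.
hits = {"read1": ["TA1"], "read2": ["TA1", "TA2"], "read3": []}
lengths = {"TA1": 1500, "TA2": 500}
counts = count_reads(hits)
print(counts)
print(reads_per_kb_per_million(counts, lengths))
```

Running the same steps on the fertile and sterile libraries gives two tables that can be compared assembly by assembly.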

                I hope this is enough information to get you started.

                Kevin
                Last edited by kmcarr; 09-03-2008, 10:42 AM. Reason: Clarity



                • #9
                  Originally posted by kmcarr:
                  Note that the assemblies are simply shotgun assemblies of ESTs. There is no attempt to identify ORFs; the assemblies are error prone, including mis-calls and indels; some assemblies are chimeric, and there is a significant amount of redundancy in the data set (i.e. multiple assemblies apparently representing the same transcript.) Given all of that, at least they give you something to align to. The assemblies are annotated by BLAST vs the UniRef database.
                  Basically, we're saying the same thing, and using the same methods - with the single difference that I'm suggesting it might be worth building a transcriptome which isn't error-prone and would have more confidence than a transcriptome built from ESTs.

                  Anyhow, for almost all 2nd gen sequencing, you either assemble your own reference, or you use a reference alignment - and the quality of the reference is a major factor in the success of your experiment, thus my suggestion was simply a way to bootstrap using the available tools. As long as melano is stuck using 454, he's not going to get the depth he needs for this to work, so this thread may be a moot point anyhow.

                  Anthony



                  • #10
                    I don't think the reference will be a problem here. There's a large EST database derived from different wheat genotypes available. There's no need to do transcriptome sequencing again. The transcriptome assembled using the EST databases should be good enough as long as you can map transcripts. After all, you're not looking for SNPs so you don't need to have an error-free reference transcriptome.

                    Solexa is obviously the better choice. You can always send your samples to a Solexa service provider. Try to find the best deal around.

                    By estimating the size of your transcriptome, you can calculate the coverage: coverage = total amount of data generated (Mb) / transcriptome size (Mb).
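As a quick worked example of that formula, here is a Python sketch. The function name is hypothetical, and the 250 bp mean read length and 50 Mb transcriptome size are made-up placeholders, not real wheat figures; the 400,000 reads is the upper per-plate estimate given earlier in the thread.

```python
def transcriptome_coverage(
    num_reads: int, mean_read_length_bp: float, transcriptome_size_bp: float
) -> float:
    """Average fold coverage = total bases sequenced / transcriptome size."""
    if transcriptome_size_bp <= 0:
        raise ValueError("transcriptome size must be positive")
    return num_reads * mean_read_length_bp / transcriptome_size_bp


# 400,000 reads x 250 bp (assumed) = 100 Mb of sequence,
# over a hypothetical 50 Mb transcriptome.
coverage = transcriptome_coverage(400_000, 250, 50_000_000)
print(f"Average coverage: {coverage:.1f}x")
```

Keep in mind this is an average over the whole transcriptome. Because expression is very uneven, abundant transcripts will be covered far more deeply than this and rare ones far less.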

                    Anyway, hexaploid wheat is difficult to work with. But the advantage of working with a major crop is that there has been extensive work on ESTs because genome sequencing is impossible. I hope you have a very good protocol to isolate RNA from anthers. Best of luck.

                    Originally posted by melano:
                    Thanks to bioinfosm and kmcarr,

                    I work with wheat (hexaploid wheat); I think that information is relevant for suggesting the best method for a differential expression study. Although there is some information about the transcriptome (1034368 ESTs), I don't know if it is enough.

                    On the other hand, I only have access to 454 technology.

                    Is it possible to estimate the coverage of a 454 library?

                    Thanks again



                    • #11
                      DeepSAGE using 454 platform

                      Hi Melano,

                      If 454 is the platform you will be doing this on, you may wish to explore the DeepSAGE approach for linking tags before pyrosequencing. This way you can increase sampling depth.
                      The approach is described in Nielsen et al. 2006. DeepSAGE--digital transcriptomics with high sensitivity, simple experimental protocol and multiplexing of samples.

                      Best of luck on your research.

                      Roald, CLC bio



                      • #12
                        Thanks for the additional information - I've never worked with wheat before, so I didn't know what resources were available. Hexaploidy does sound like a challenge though. I'm looking forward to hearing (reading) how this turns out.
