  • the downstream analysis of RNA-seq

    Hi all, I am not sure whether this is the right place to ask, but I would really appreciate any suggestions or comments.
    As we all know, by mapping RNA-seq reads back to the reference genome and counting the reads that fall within a gene of interest, we can roughly estimate the gene's expression level as RPKM, that is, reads per kilobase per million mapped reads. On the other hand, we know that gene expression makes RNA copies from DNA, which carries the information for a functional product such as a protein. The number of RNA copies of a gene may relate to the amount of the product (e.g., protein): more RNA copies, more of the associated protein. (If not, why would we care about gene expression at the RNA level?) However, I am not sure whether the relationship between the number of RNA copies and the amount of protein is linear. A linear relationship would mean that 10 copies and 11 copies of a gene's RNA make 10 units and 11 units of protein (or proportional amounts), respectively. I think that in real living systems the manufacture of the final product would be more robust if the RNA copies, regarded as an intermediate product, were saturated. I mean that the functional product may not be so sensitive to the exact number of RNA copies (otherwise, cells would need to learn to count everything). I am wondering whether, in most cases, the RNA of a gene is saturated. If so, it would make no sense to count the exact number of RNA copies and then compare the numbers between samples in a precise way. Statistical tests such as Fisher's exact test have more power as the counts get bigger, but on the other hand bigger counts make it more likely that the intermediate product is saturated. In the microarray era, fold change was regarded as the best measure for identifying gene expression differences.
    As RNA-seq becomes widely used, it is commonly thought that RNA-seq measures gene expression digitally, so fold change may no longer be the best measure of expression differences. I am arguing here that we should still use fold change, even on RNA-seq data. Living systems are not that exact.
    OK, I wrote a lot here; thanks for reading. As I don't have a biology background, my view may be incorrect (please help me correct it). Any comments are welcome.
    Xi
    Xi Wang
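
A minimal sketch of two quantities mentioned in the post above: the RPKM definition, and the way Fisher's exact test gains power as read counts grow. The function names `rpkm` and `fisher_one_sided`, the gene length, and all read counts are made up for illustration; the test is a plain one-sided hypergeometric tail, not the procedure of any particular RNA-seq package.

```python
from math import comb


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    kilobases = gene_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_mapped)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are samples; columns are (reads on the gene, all other reads).
    Returns P(X >= a) under the hypergeometric null of equal proportions.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return favourable / comb(n, col1)


# Hypothetical gene: 2,000 bp long, 500 reads, 10 million mapped reads.
example_rpkm = rpkm(500, 2_000, 10_000_000)

# The same 2-fold difference at two made-up sequencing depths.
p_shallow = fisher_one_sided(20, 980, 10, 990)     # 20 vs 10 reads per 1,000
p_deep = fisher_one_sided(200, 9_800, 100, 9_900)  # 200 vs 100 reads per 10,000

print(f"RPKM = {example_rpkm}")
print(f"p (shallow) = {p_shallow:.3g}, p (deep) = {p_deep:.3g}")
```

The fold change is 2 in both comparisons, yet the deeper one gives a much smaller p-value. That is the tension the post describes: count-based tests reward depth, while fold change ignores it.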

  • #2
    Hi Xi,

    It is often the case that RNA levels do not relate to protein levels. Cells control the rate at which mRNA is translated into protein in various ways, including mRNA degradation and translation initiation. There are several reasons we still want this information, however. Firstly, we can tell whether the gene is on at all; secondly, protein levels are difficult to determine (and even protein levels won't necessarily relate directly to function, because of protein modifications); thirdly, we might want to know the mRNA level for various reasons.

    The mRNA level itself is useful for the full understanding of gene expression control i.e. how the mRNA level relates to the protein level and to function even if the relationship is complicated. If considering functional RNAs (e.g. microRNAs) then there is no protein.

    I'm not sure what you mean by saturation here.

    Hope this was of some help.

    Adam



    • #3
      Hi Adam, thanks for your reply!

      I agree with you that the relationship between RNA expression and protein level is quite complicated. Others have also told me that the relationship is still unclear, but that it is nevertheless a way to study diseases/cancers and might give hints toward the answers. It is like association studies, which try to link DNA SNPs (and/or CNVs) to various phenotypes: little is known about how different genotypes cause different phenotypes, but people still put a lot of effort into these studies.

      According to what you said, is it true that whether a gene is on or off at the mRNA level is more important than the relative amount of gene expression? If a gene is off, we can definitely claim that there is no protein product of that gene. Right? Otherwise, if a gene is on, it could be difficult to determine the amount of the protein product, because of feedback mechanisms or alternative pathways. Right?

      Another thing that concerns me is that the differentially expressed genes identified are always statistically significant, but not necessarily biologically significant. If we could classify genes into those that function alone and those that function with other genes (and/or other classes), we could use different statistical significance levels to determine the differentially expressed genes for the distinct classes.

      The saturation of gene expression might be understood in this way: there is a certain threshold of expression at the mRNA level for each gene, and if the mRNA amount of a gene is beyond the threshold, the mRNA expression is regarded as saturated, and the protein product stays at a stable level. For example, if the threshold of a gene is 10 copies of mRNA, 15 and 30 copies of mRNA will result in the same amount of protein product, even though the fold change at the mRNA level is 2. When the mRNA amount is under the threshold, a linear or near-linear relationship may hold.

      Because of the complexity of the relationship between gene expression at the RNA level and the final function of genes, it will not be easy to clear away all the obstacles on the road to understanding everything. However, we always do what we think is logical and correct at the time, and risk exists everywhere.

      Any comments are welcome. Thanks.
      Xi Wang
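
The threshold idea in the post above can be written as a toy model. This is only a sketch under a strong assumption: protein output is taken to be linear in mRNA copies up to a threshold and flat beyond it. The function name `protein_output`, the threshold of 10 copies (taken from the example in the post), and the one-unit-per-copy scaling are illustrative, not a claim about real translation kinetics.

```python
def protein_output(mrna_copies, threshold=10.0, units_per_copy=1.0):
    """Toy saturation model: linear below the threshold, constant above it."""
    return units_per_copy * min(mrna_copies, threshold)


# Two samples above the threshold, as in the post: 15 and 30 mRNA copies.
low, high = 15, 30
mrna_fold_change = high / low
protein_fold_change = protein_output(high) / protein_output(low)

# Two samples below the threshold behave linearly in this model.
sub_threshold_fold_change = protein_output(8) / protein_output(4)

print(f"mRNA fold change: {mrna_fold_change}")
print(f"protein fold change above threshold: {protein_fold_change}")
print(f"protein fold change below threshold: {sub_threshold_fold_change}")
```

Under this model a 2-fold mRNA difference above the threshold produces no protein difference at all, while the same 2-fold difference below the threshold carries through unchanged.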



      • #4
        If I understand correctly, what you call "saturation of gene expression" is the limitation of the capacity to translate the mRNAs beyond a given level. This is one point of view, but one could also consider that for a protein to be active, a post-translational modification is required (e.g. phosphorylation) and that this is the limiting step that "saturates the gene expression".
        Anyway, even if there is such a specific "saturation" level, the problem is to characterize it, because like transcription, translation is highly regulated and thus depends on several factors.
        As you point out, RPKM values from RNA-seq experiments just give an estimate of the quantity of transcripts, which may indeed not correlate with the biological activity of the corresponding protein (if any). Systematically assuming such a correlation is therefore incorrect, as we all should already know.
        Just one more thing: personally I don't believe in a binary transcription scheme, with the idea that a gene can be "on" or "off". It seems that almost the entire genome is pervasively transcribed; only the intensity varies.



        • #5
          Thanks, steven. I posted my points below.

          If I understand correctly, what you call "saturation of gene expression" is the limitation of the capacity to translate the mRNAs beyond a given level. This is one point of view, but one could also consider that for a protein to be active, a post-translational modification is required (e.g. phosphorylation) and that this is the limiting step that "saturates the gene expression".
          Yes, this is another example implying that the mRNA level has little effect on gene function. So the differentially expressed genes identified from the case-control studies would be the contributors. However, consider the problem in reverse: if a disease-related gene loses its function in case samples by not producing any protein product, what does its mRNA expression look like? I think everyone would answer that it's hard to say, because of translational regulation, post-translational modification, and other factors.
          Anyway, even if there is such a specific "saturation" level, the problem is to characterize it, because like transcription, translation is highly regulated and thus depends on several factors.
          As you point out, RPKM values from RNA-seq experiments just give an estimate of the quantity of transcripts, which may indeed not correlate with the biological activity of the corresponding protein (if any). Systematically assuming such a correlation is therefore incorrect, as we all should already know.
          I agree with your points. The question remains why we spend so much effort, money and time studying gene expression differences at the mRNA level. What is the probability that a differentially expressed gene is the real crucial contributor to a given disease or cancer?

          Just one more thing: personally I don't believe in a binary transcription scheme, with the idea that a gene can be "on" or "off". It seems that almost the entire genome is pervasively transcribed; only the intensity varies.
          Yes, the new experimental conclusions imply this. I think a transcript may not have just a single function; rather, all the transcripts work together to carry out various functions. It is a complex system with communication, collaboration and coordination, which makes living things more stable and adaptable.
          Last edited by Xi Wang; 01-12-2010, 09:23 PM.
          Xi Wang



          • #6
            Originally posted by Xi Wang View Post
            The question remains why we spend so much effort, money and time studying gene expression differences at the mRNA level.
            Ha ha ha, because it's fun!



            • #7
              Originally posted by steven View Post
              Ha ha ha, because it's fun!
              Haha, it is quite a good answer.

              Several years ago, at the beginning of the Human Genome Project, it was believed that once the human genome map was completed, everything could be solved by decoding the genetic code. However, it turned out that everything became more complicated. So people go on to research and research...
              Xi Wang



              • #8
                That's why it's called research, not just search!



                • #9
                  Originally posted by Xi Wang View Post
                  Thanks, steven. I posted my points below.

                  I agree with your points. The question remains why we spend so much effort, money and time studying gene expression differences at the mRNA level. What is the probability that a differentially expressed gene is the real crucial contributor to a given disease or cancer?
                  The first answer to this is because we can. Global mRNA profiling is possible; global protein profiling really isn't at this time. Certainly not on the scale one would like. Yes, it is a bit of "drunk looking under the light post for his keys".

                  The second answer is that it is often the case that RNA expression corresponds to protein expression which corresponds to protein functionality. High expression of a gene in cancer, particularly when seen in multiple samples, is often (but not always) a useful clue that the gene is important. There are quite a few good examples of important genes (or pathways) in cancer and other diseases being found through expression profiling.

                  The third answer is that for some applications, such as diagnostics and pharmacodynamic markers, whether or not the expression is biologically relevant is actually of secondary (or less) importance. So long as you can find a reproducible pattern that has predictive power, you've accomplished something important.



                  • #10
                    Thanks, krobison!

                    You are quite right. Researchers always do what they can to reveal some important, though perhaps not the most important, rules, mechanisms, relationships and so on. As you said, classification and other methods are really powerful for predicting outcomes, but most of those methods run into the problem of biological explanation: why can these genes predict the outcome? That brings us back to mechanistic research on the relationship between RNA expression and protein expression.
                    Anyhow, I see the importance of the current studies. Generally, two groups of people are working hard: one works at a high level, ignoring the detailed mechanisms and focusing on the relationship between genes and diseases; the other works from the other side, on mechanisms. One day, when the two groups meet, most of the questions will become clear.
                    Xi Wang



                    • #11
                      Originally posted by Xi Wang View Post
                      However, it turned out that everything became more complicated.
                      As our island of knowledge grows, so does the shore of our ignorance.
                      (John Wheeler)



                      • #12
                        Hi,
                        upon searching for answers I came across your post of 11-27-2009 regarding the saturation of gene expression. I have a question regarding what I believe is the same topic. I have scanned the gene expression of 49 subjects. The results showed a normal distribution, so I took the 6 upper and 6 lower extreme subjects to perform a pharmacokinetic study on. The average difference in gene expression between the two groups was about 600-fold. No differences were seen in the pharmacokinetics. Only the 2 subjects showing the highest gene expression (2-fold higher than the average of the high-expressing group and 1200-fold higher than the average of the low-expressing group) showed significant differences in pharmacokinetic behaviour.
                        Could it be that only this extremely high mRNA expression could lead to protein expression differences?
                        I thought maybe a threshold mRNA expression might have to be overcome in order to result in protein expression differences.

                        I hope I could make my question understandable,

                        thanking you in advance

                        Thomas



                        • #13
                          Originally posted by samoth View Post
                          Hi,
                          upon searching for answers I came across your post of 11-27-2009 regarding the saturation of gene expression. I have a question regarding what I believe is the same topic. I have scanned the gene expression of 49 subjects. The results showed a normal distribution, so I took the 6 upper and 6 lower extreme subjects to perform a pharmacokinetic study on. The average difference in gene expression between the two groups was about 600-fold. No differences were seen in the pharmacokinetics. Only the 2 subjects showing the highest gene expression (2-fold higher than the average of the high-expressing group and 1200-fold higher than the average of the low-expressing group) showed significant differences in pharmacokinetic behaviour.
                          Could it be that only this extremely high mRNA expression could lead to protein expression differences?
                          I thought maybe a threshold mRNA expression might have to be overcome in order to result in protein expression differences.

                          I hope I could make my question understandable,

                          thanking you in advance

                          Thomas
                          Thanks for your question. I am sorry, but I didn't understand what you mean by "subject". Do you mean genes or samples? And what shows a normal distribution?

                          Don't you think the other 4 are your novel findings?
                          Xi Wang



                          • #14
                            Hi Xi,
                            thanks for your reply. By subjects I mean animals. By normal distribution I mean a normal (Gaussian) distribution.
                            thanks



                            • #15
                              Originally posted by samoth View Post
                              Hi Xi,
                              thanks for your reply. By subjects I mean animals. By normal distribution I mean a normal (Gaussian) distribution.
                              thanks
                              So you scanned the expression of only one gene in 49 animals, and that gene's expression level across the 49 animals is normally distributed. Then you picked the extreme animals to check their phenotype. Is my understanding right?

                              My concerns are: 1) is the gene you are interested in the only gene related to the phenotype? 2) how about the ones with extremely low expression levels? 3) how did you quantify the expression levels?
                              Xi Wang
