SEQanswers

11-27-2009, 04:43 AM   #1
Xi Wang
Senior Member
Location: MDC, Berlin, Germany
Join Date: Oct 2009
Posts: 317
The downstream analysis of RNA-seq

Hi all, I am not sure whether this is the right place to ask, but I would really appreciate any suggestions or comments.

As we all know, by mapping RNA-seq reads back to the reference genome and counting the reads that fall within a gene of interest, we can roughly estimate the gene's expression level as RPKM: reads per kilobase of transcript per million mapped reads. On the other hand, gene expression means making RNA copies from DNA, which carries the information for a functional product such as a protein. The number of RNA copies of a gene may relate to the amount of product (e.g., protein): more RNA copies, more of the associated protein. (If not, why would we care about gene expression at the RNA level?)

However, I am not sure whether the relationship between the number of RNA copies and the amount of protein is linear. A linear relationship would mean that 10 copies and 11 copies of a gene's RNA yield 10 units and 11 units of protein, respectively (or proportional amounts). In living systems, though, I think the manufacturing of the final product would be more robust if the RNA copies, regarded as an intermediate product, were saturated. That is, the functional product may not be very sensitive to the exact number of RNA copies (otherwise, cells would need to learn to count everything). I wonder whether, in most cases, the RNA of a gene is saturated. If so, it would make little sense to count the exact number of RNA copies and then compare those numbers between samples in a precise way.

Statistical tests such as Fisher's exact test gain power as the counts get larger; on the other hand, larger counts make it more likely that the intermediate product is saturated. In the microarray era, fold change was regarded as the best measure for identifying differences in gene expression. As RNA-seq becomes widely used, it is commonly thought that RNA-seq measures gene expression digitally, and that fold change may no longer be the best measure of expression differences.
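To make the RPKM definition and the fold-change comparison above concrete, here is a minimal Python sketch. The function names and all read counts are made up for illustration; they do not come from any real dataset or from a particular RNA-seq tool.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = gene_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_mapped)


def fold_change(rpkm_a, rpkm_b):
    """Expression in sample B relative to sample A."""
    return rpkm_b / rpkm_a


# Hypothetical numbers: the same 2 kb gene in two libraries of different depth.
sample_a = rpkm(read_count=500, gene_length_bp=2_000, total_mapped_reads=10_000_000)
sample_b = rpkm(read_count=1_500, gene_length_bp=2_000, total_mapped_reads=20_000_000)

print(sample_a, sample_b, fold_change(sample_a, sample_b))
```

The raw read count triples between the two samples, but after normalizing for sequencing depth the fold change is smaller, which is why the comparison is made on RPKM rather than on raw counts.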
I am arguing here that we should still use fold change on RNA-seq data. Living systems are unlikely to be that exact.
OK, I have written a lot; thanks for reading. As I don't have a biology background, my view may be incorrect (please correct me). Any comments are welcome.
Xi
__________________
Xi Wang

01-08-2010, 04:07 AM   #2
adamreid
Junior Member
Location: Cambridge, UK
Join Date: Nov 2009
Posts: 6

Hi Xi,

RNA levels often do not track protein levels. Cells control how much protein is made from an mRNA in various ways, including mRNA degradation and the regulation of translation initiation. There are still several reasons to want this information, however. First, we can tell whether the gene is on at all. Second, protein levels are difficult to determine (and even protein levels won't necessarily relate directly to function, because of protein modifications). Third, we might want to know the mRNA level for various reasons.

The mRNA level itself is useful for fully understanding the control of gene expression, i.e., how the mRNA level relates to the protein level and to function, even if the relationship is complicated. And for functional RNAs (e.g., microRNAs) there is no protein at all.

I'm not sure what you mean by saturation here.

Hope this was of some help.

Adam

01-09-2010, 09:33 PM   #3
Xi Wang
Senior Member
Location: MDC, Berlin, Germany
Join Date: Oct 2009
Posts: 317

Hi Adam, thanks for your reply!

I agree that the relationship between RNA expression and protein level is quite complicated. Others have also told me that the relationship is still unclear, but that studying it is a way to investigate diseases and cancers and might give hints toward the answers. The same goes for association studies, which look for links between DNA variants (SNPs and/or CNVs) and phenotypes: little is known about how different genotypes cause different phenotypes, yet people still put a lot of effort into these studies.

Following what you said, is it true that whether a gene is on or off at the mRNA level matters more than its relative expression level? If a gene is off, we can definitely say that there is no protein product of that gene, right? And if a gene is on, it can be difficult to determine the amount of protein product because of feedback mechanisms or alternative pathways, right?

Another concern of mine is that genes identified as differentially expressed are always statistically significant but not necessarily biologically significant. If we could classify genes into those that function alone and those that function together with other genes (and/or other classes), we could use a different statistical significance level for each class when calling differentially expressed genes.

The saturation of gene expression might be understood this way: each gene has a certain threshold of mRNA expression; if the amount of mRNA exceeds the threshold, the mRNA expression is regarded as saturated and the protein product stays at a stable level. For example, if the threshold for a gene is 10 mRNA copies, then 15 copies and 30 copies will yield the same amount of protein, even though the fold change at the mRNA level is 2. Below the threshold, the relationship may be linear or nearly linear.
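The threshold idea described above can be written down as a toy model. This is only a sketch of the hypothesis as stated in the post, not an established biological model; the function name, the default threshold of 10 copies, and the one-unit-per-copy rate are assumptions chosen to match the example.

```python
def protein_output(mrna_copies, threshold=10, units_per_copy=1.0):
    """Toy model: protein rises linearly with mRNA up to a threshold, then plateaus."""
    effective_copies = min(mrna_copies, threshold)
    return units_per_copy * effective_copies


# Below the threshold the response is linear; above it the output is flat.
for copies in (5, 10, 15, 30):
    print(copies, protein_output(copies))

# 15 vs. 30 copies: a 2-fold change in mRNA, no change in the modeled protein.
mrna_fold_change = 30 / 15
protein_fold_change = protein_output(30) / protein_output(15)
```

Under this model a fold change measured at the mRNA level says nothing about the protein level once both samples are above the threshold, which is the concern raised in the post.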

Because the relationship between gene expression at the RNA level and the final function of genes is so complex, it will not be easy to clear every obstacle on the road to understanding it. Still, we always do what seems logical and correct at the time, and risk is everywhere.

Any comments are welcome. Thanks.
__________________
Xi Wang

01-11-2010, 02:13 AM   #4
steven
Senior Member
Location: Southern France
Join Date: Aug 2009
Posts: 269

If I understand correctly, what you call "saturation of gene expression" is a limit on the capacity to translate mRNAs beyond a given level. That is one point of view, but one could also consider that a protein needs a post-translational modification (e.g., phosphorylation) to be active, and that this is the limiting step that "saturates the gene expression".
Anyway, even if there is such a specific "saturation" level, the problem is to characterize it, because, like transcription, translation is highly regulated and thus depends on several factors.
As you point out, RPKM values from RNA-seq experiments just give an estimate of the quantity of transcripts, which may indeed not correlate with the biological activity of the corresponding protein (if any). Systematically assuming such a correlation is therefore incorrect, as we should all already know.
Just one more thing: personally I don't believe in a binary transcription scheme, with the idea that a gene can be "on" or "off". It seems that almost the entire genome is pervasively transcribed; only the intensity varies.

01-12-2010, 08:20 PM   #5
Xi Wang
Senior Member
Location: MDC, Berlin, Germany
Join Date: Oct 2009
Posts: 317

Thanks, steven. I have posted my points below.

Quote:
If I understand correctly, what you call "saturation of gene expression" is a limit on the capacity to translate mRNAs beyond a given level. That is one point of view, but one could also consider that a protein needs a post-translational modification (e.g., phosphorylation) to be active, and that this is the limiting step that "saturates the gene expression".
Yes, this is another example implying that the mRNA level has little effect on gene function. So the differentially expressed genes identified from case-control studies would be the contributors. However, consider the problem in reverse: if a disease-related gene loses its function in case samples by not producing any protein product, what happens to its mRNA expression? I think everyone would answer that it's hard to say, because of translational regulation, post-translational modification, and other factors.
Quote:
Anyway, even if there is such a specific "saturation" level, the problem is to characterize it, because, like transcription, translation is highly regulated and thus depends on several factors.
As you point out, RPKM values from RNA-seq experiments just give an estimate of the quantity of transcripts, which may indeed not correlate with the biological activity of the corresponding protein (if any). Systematically assuming such a correlation is therefore incorrect, as we should all already know.
I agree with your points. The question remains why we spend so much effort, money and time studying gene expression differences at the mRNA level. What is the probability that a differentially expressed gene is a truly crucial contributor to a given disease or cancer?

Quote:
Just one more thing: personally I don't believe in a binary transcription scheme, with the idea that a gene can be "on" or "off". It seems that almost the entire genome is pervasively transcribed; only the intensity varies.
Yes, recent experimental findings imply this. I think a transcript may not have just a single function; rather, all the transcripts work together to carry out various functions. It is a complex system with communication, collaboration and coordination, which makes living things more stable and adaptable.
__________________
Xi Wang

Last edited by Xi Wang; 01-12-2010 at 08:23 PM.

01-13-2010, 01:40 AM   #6
steven
Senior Member
Location: Southern France
Join Date: Aug 2009
Posts: 269

Quote:
Originally Posted by Xi Wang View Post
The question remains why we spend so much effort, money and time studying gene expression differences at the mRNA level.
Ha ha ha, because it's fun!

01-13-2010, 06:59 AM   #7
Xi Wang
Senior Member
Location: MDC, Berlin, Germany
Join Date: Oct 2009
Posts: 317

Quote:
Originally Posted by steven View Post
Ha ha ha, because it's fun!
Haha, that is quite a good answer.

Several years ago, at the beginning of the Human Genome Project, it was believed that once the human genome map was complete, everything could be solved by decoding the genetic code. However, it turned out that everything became more complicated. So people go on researching and researching...
__________________
Xi Wang

01-13-2010, 08:20 AM   #8
kmcarr
Senior Member
Location: USA, Midwest
Join Date: May 2008
Posts: 1,178

That's why it's called research, not just search!

01-13-2010, 10:04 AM   #9
krobison
Senior Member
Location: Boston area
Join Date: Nov 2007
Posts: 747

Quote:
Originally Posted by Xi Wang View Post
Thanks, steven. I posted my points below.

I agree with your points. The question remains why we spend so much effort, money and time studying gene expression differences at the mRNA level. What is the probability that a differentially expressed gene is a truly crucial contributor to a given disease or cancer?
The first answer to this is because we can. Global mRNA profiling is possible; global protein profiling really isn't at this time. Certainly not on the scale one would like. Yes, it is a bit of "drunk looking under the light post for his keys".

The second answer is that it is often the case that RNA expression corresponds to protein expression which corresponds to protein functionality. High expression of a gene in cancer, particularly when seen in multiple samples, is often (but not always) a useful clue that the gene is important. There are quite a few good examples of important genes (or pathways) in cancer and other diseases being found through expression profiling.

The third answer is that for some applications, such as diagnostics and pharmacodynamic markers, whether or not the expression is biologically relevant is actually of secondary (or less) importance. So long as you can find a reproducible pattern that has predictive power, you've accomplished something important.

01-13-2010, 06:42 PM   #10
Xi Wang
Senior Member
Location: MDC, Berlin, Germany
Join Date: Oct 2009
Posts: 317

Thanks, krobison!

You are quite right. Researchers always do what they can to reveal some important (though perhaps not the most important) rule, mechanism, relationship and so on. As you said, classification and other methods are really powerful for predicting outcomes, but most of those methods run into the problem of biological explanation: why can these genes predict the outcome? That brings us back to research on the mechanism relating RNA expression to protein expression.
Anyhow, I see the importance of the current studies. Broadly, two groups of people are working hard: one works at a high level, ignoring the detailed mechanisms and focusing on the relationship between genes and diseases; the other works from the other side, on the mechanisms. One day, when the two groups meet, most of the questions will become clear.
__________________
Xi Wang

01-14-2010, 12:51 AM   #11
steven
Senior Member
Location: Southern France
Join Date: Aug 2009
Posts: 269

Quote:
Originally Posted by Xi Wang View Post
However, it turned out that everything became more complicated.
As our island of knowledge grows, so does the shore of our ignorance.
(John Wheeler)

04-04-2011, 12:30 AM   #12
samoth
Junior Member
Location: europe
Join Date: Apr 2011
Posts: 3

Hi,
while searching for answers I came across your post of 11-27-2009 regarding the saturation of gene expression. I have a question on what I believe is the same topic. I have scanned the gene expression of 49 subjects. The results showed a normal distribution, so I took the 6 highest and 6 lowest subjects to perform a pharmacokinetic study on. The average difference in gene expression between the two groups was about 600-fold. No differences were seen in the pharmacokinetics. Only the 2 subjects with the highest gene expression (2-fold higher than the average of the high-expressing group and 1200-fold higher than the average of the low-expressing group) showed significant differences in pharmacokinetic behaviour.
Could it be that only this extremely high mRNA expression leads to differences in protein expression?
I thought that maybe a threshold of mRNA expression has to be exceeded before differences in protein expression result.

I hope my question is understandable.

Thanking you in advance,

Thomas

04-12-2011, 07:43 PM   #13
Xi Wang
Senior Member
Location: MDC, Berlin, Germany
Join Date: Oct 2009
Posts: 317

Quote:
Originally Posted by samoth View Post
Hi,
while searching for answers I came across your post of 11-27-2009 regarding the saturation of gene expression. I have a question on what I believe is the same topic. I have scanned the gene expression of 49 subjects. The results showed a normal distribution, so I took the 6 highest and 6 lowest subjects to perform a pharmacokinetic study on. The average difference in gene expression between the two groups was about 600-fold. No differences were seen in the pharmacokinetics. Only the 2 subjects with the highest gene expression (2-fold higher than the average of the high-expressing group and 1200-fold higher than the average of the low-expressing group) showed significant differences in pharmacokinetic behaviour.
Could it be that only this extremely high mRNA expression leads to differences in protein expression?
I thought that maybe a threshold of mRNA expression has to be exceeded before differences in protein expression result.

I hope my question is understandable.

Thanking you in advance,

Thomas
Thanks for your question. I am sorry, but I didn't understand what you mean by "subject". Do you mean genes or samples? And what shows a normal distribution?

Don't you think the other 4 are your novel findings?
__________________
Xi Wang

04-12-2011, 11:26 PM   #14
samoth
Junior Member
Location: europe
Join Date: Apr 2011
Posts: 3

Hi Xi,
thanks for your reply. By subjects I mean animals. By normal distribution I mean a normal (Gaussian) distribution.
Thanks

04-13-2011, 12:24 AM   #15
Xi Wang
Senior Member
Location: MDC, Berlin, Germany
Join Date: Oct 2009
Posts: 317

Quote:
Originally Posted by samoth View Post
Hi Xi,
thanks for your reply. By subjects I mean animals. By normal distribution I mean a normal (Gaussian) distribution.
Thanks
So you scanned the expression of only one gene in 49 animals, and that gene's expression level across the 49 animals is normally distributed. Then you picked the extreme animals to check their phenotype. Is my understanding right?

My concerns are: 1) is the gene you are interested in the only gene related to the phenotype? 2) how about the ones with extremely low expression levels? 3) how did you quantify the expression levels?
__________________
Xi Wang

04-13-2011, 12:29 AM   #16
samoth
Junior Member
Location: europe
Join Date: Apr 2011
Posts: 3

Yes, that's right.
The gene I investigated is not the sole gene related to the phenotype, but my project is to determine the effect of the expression of this one gene on the phenotype.
Gene expression was measured by real-time qPCR as relative expression (using TaqMan MGB probes).
What do you mean by "2) how about the ones with extremely low expression levels? "
Thanks

04-14-2011, 12:57 AM   #17
Simon Anders
Senior Member
Location: Heidelberg, Germany
Join Date: Feb 2010
Posts: 994

Interesting discussion, and good idea to occasionally go back to the basics here on the forum.

A few more points:

If a cell wants to change its protein inventory in reaction to a stimulus, it can do so by changing the production, the translation or the degradation of mRNA. So, if you want to know how the cell reacts to a stimulus, one would want to measure all three processes genome-wide in control and treatment samples and check where one sees a statistically significant change. Of course, we can only measure mRNA level (i.e., the combined effect of production and degradation) in a convenient way, so we may miss the cell's main response mechanism. However, if we find something, we can be sure that it is in fact a genuine response of the cell to the stimulus (even though we should not jump to the conclusion that it is the functional aspect of the response rather than a byproduct).

Just to be clear: By statistical significance, I mean that we can be confident that the change in expression level when comparing treatment with control is large enough that it is unlikely to be due to the random variation in expression that we also see when comparing replicates. To test for this, we need a test that uses replicates to estimate this within-group noise, and Fisher's exact test (mentioned in the original post) does not fulfill this criterion and hence is not suitable.
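The point about replicates can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the read counts, the library size, and the helper names are invented, and Fisher's exact test is written out by hand with the standard library so the example is self-contained. The t statistic at the end is only a crude stand-in that shows the within-group spread; it is not the replicate-aware test meant in the post.

```python
from math import comb, sqrt
from statistics import mean, variance


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # Add up every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Hypothetical read counts for one gene, three replicates per condition,
# each library assumed to hold 100,000 mapped reads.
control = [30, 50, 120]
treatment = [60, 110, 230]
library_size = 100_000

# Fisher's exact test on pooled counts: gene reads vs. all other reads.
gene_c, gene_t = sum(control), sum(treatment)
other_c = len(control) * library_size - gene_c
other_t = len(treatment) * library_size - gene_t
p_pooled = fisher_exact_two_sided(gene_c, gene_t, other_c, other_t)

# Crude replicate-aware check: Welch-style t statistic on the raw counts.
se = sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
t_stat = (mean(treatment) - mean(control)) / se

print(f"pooled Fisher p-value: {p_pooled:.2e}")
print(f"replicate-based t statistic: {t_stat:.2f}")
```

With these made-up numbers the pooled test reports a very small p-value even though the replicate counts of the two groups overlap, because pooling discards the within-group variation that the replicates are there to estimate.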

By biological significance, we mean that the observed effect is likely to be a functional part of the cell's response to the stimulus, as opposed to a mere byproduct, such as an unrelated downstream effect of the actual response. A reasonable indicator for this is in fact that the fold change is not too small, and a plausible veto might be that the change is unlikely to cause changes in protein abundance, even though I do not think that saturation of the translation machinery is a common effect. (Look at ribosomal proteins: why would the cell produce such masses of mRNA, orders of magnitude more than for average genes, if it did not have the resources to translate them all?)

Finally: The mass-spec folks have made some amazing advances recently, and comparing mRNA and protein levels in at least a semi-genomewide fashion will be a big topic soon, I think. A limitation here will be that we need time courses for this. After all, we do not expect mRNA levels to be proportional to protein levels, but to be proportional to the derivative with respect to time of the protein level.
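The last remark, that mRNA level should be proportional to the time derivative of the protein level rather than to the protein level itself, can be sketched as a simple rate equation, dP/dt = synthesis_rate * m(t) - degradation_rate * P. The degradation term is an added assumption (set to zero by default so the sketch matches the statement above), and the function name, rate constants, and time course are all hypothetical.

```python
def simulate_protein(mrna_levels, dt=1.0, synthesis_rate=1.0,
                     degradation_rate=0.0, p0=0.0):
    """Euler integration of dP/dt = synthesis_rate * m(t) - degradation_rate * P."""
    protein = [p0]
    for m in mrna_levels:
        dp = synthesis_rate * m - degradation_rate * protein[-1]
        protein.append(protein[-1] + dt * dp)
    return protein


# Hypothetical time course: mRNA doubles from 5 to 10 copies at the fourth time point.
mrna = [5, 5, 5, 10, 10, 10]
protein = simulate_protein(mrna)

# Per-step protein increase, i.e. a discrete version of the time derivative.
increments = [later - earlier for earlier, later in zip(protein, protein[1:])]
print(protein)
print(increments)
```

In this sketch the protein level keeps rising while the mRNA level is constant, so a single snapshot of protein abundance need not track mRNA abundance, whereas the per-step increments do. That is why a time course is needed for the comparison.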

Last edited by Simon Anders; 04-14-2011 at 01:00 AM.

04-14-2011, 09:42 PM   #18
Xi Wang
Senior Member
Location: MDC, Berlin, Germany
Join Date: Oct 2009
Posts: 317

Quote:
Originally Posted by Simon Anders View Post
Interesting discussion, and good idea to occasionally go back to the basics here on the forum.

A few more points:

If a cell wants to change its protein inventory in reaction to a stimulus, it can do so by changing the production, the translation or the degradation of mRNA. So, if you want to know how the cell reacts to a stimulus, one would want to measure all three processes genome-wide in control and treatment samples and check where one sees a statistically significant change. Of course, we can only measure mRNA level (i.e., the combined effect of production and degradation) in a convenient way, so we may miss the cell's main response mechanism. However, if we find something, we can be sure that it is in fact a genuine response of the cell to the stimulus (even though we should not jump to the conclusion that it is the functional aspect of the response rather than a byproduct).

Just to be clear: By statistical significance, I mean that we can be confident that the change in expression level when comparing treatment with control is large enough that it is unlikely to be due to the random variation in expression that we also see when comparing replicates. To test for this, we need a test that uses replicates to estimate this within-group noise, and Fisher's exact test (mentioned in the original post) does not fulfill this criterion and hence is not suitable.

By biological significance, we mean that the observed effect is likely to be a functional part of the cell's response to the stimulus, as opposed to a mere byproduct, such as an unrelated downstream effect of the actual response. A reasonable indicator for this is in fact that the fold change is not too small, and a plausible veto might be that the change is unlikely to cause changes in protein abundance, even though I do not think that saturation of the translation machinery is a common effect. (Look at ribosomal proteins: why would the cell produce such masses of mRNA, orders of magnitude more than for average genes, if it did not have the resources to translate them all?)

Finally: The mass-spec folks have made some amazing advances recently, and comparing mRNA and protein levels in at least a semi-genomewide fashion will be a big topic soon, I think. A limitation here will be that we need time courses for this. After all, we do not expect mRNA levels to be proportional to protein levels, but to be proportional to the derivative with respect to time of the protein level.
I totally agree with Simon's points. It is all the more important to go back to the original biological questions when analyzing biological data.

In my mind, once people can image all the molecular processes that occur in cells, it will be clear how DNA gives rise to protein products. For now, however, because of the limitations of current technologies, we can only measure gene expression at the RNA level and seldom at the protein level, which is far from the whole picture. This lack of knowledge makes inferences very noisy, and many discoveries are hard to believe. Combining RNA-seq and mass spectrometry will help to address this issue in the near future, although not completely.
__________________
Xi Wang

04-15-2011, 07:43 AM   #19
Xi Wang
Senior Member
Location: MDC, Berlin, Germany
Join Date: Oct 2009
Posts: 317

Quote:
Originally Posted by samoth View Post
Yes, that's right.
The gene I investigated is not the sole gene related to the phenotype, but my project is to determine the effect of the expression of this one gene on the phenotype.
Gene expression was measured by real-time qPCR as relative expression (using TaqMan MGB probes).
What do you mean by "2) how about the ones with extremely low expression levels? "
Thanks
By "2) how about the ones with extremely low expression levels? " I mean: what about the 6 subjects in the group with lower expression levels? Why did you also analyze this group? Did they show a different phenotype?
__________________
Xi Wang