SEQanswers

Old 07-10-2011, 05:58 PM   #1
zeam
Member
 
Location: USA

Join Date: Oct 2010
Posts: 38
How to define an expressed gene based on RNA-seq data

Recently, I have been working with RNA-seq data, and one problem I have run into is how to define an expressed gene. I use RPKM to normalize the expression level, but choosing the cutoff value that defines an expressed gene is a problem. Can somebody give me any suggestions? I have read some papers, and some of them said an RPKM of 1 would be fine, but in your experience, how should this be done?
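For reference, a minimal sketch of the calculation being asked about: RPKM is reads per kilobase of gene length per million mapped reads, and a fixed cutoff is then applied to it. The gene names, read counts, gene lengths, and library size below are made-up illustration values, and the cutoff of 1 is simply the figure mentioned in the post, not a recommendation.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical toy data: gene -> (mapped reads, gene length in bp)
counts = {
    "geneA": (500, 2000),
    "geneB": (3, 1500),
    "geneC": (0, 3000),
}
total_mapped_reads = 20_000_000  # assumed library size
cutoff = 1.0  # RPKM cutoff mentioned in the post; an arbitrary choice

rpkm_values = {
    gene: rpkm(n, length, total_mapped_reads)
    for gene, (n, length) in counts.items()
}

# Genes that pass the fixed RPKM cutoff
above_cutoff = sorted(g for g, v in rpkm_values.items() if v >= cutoff)

# Genes with at least one mapped read, regardless of RPKM
detected = sorted(g for g, (n, _) in counts.items() if n > 0)

print(rpkm_values)
print("above cutoff:", above_cutoff)
print("any reads:", detected)
```

In this toy data, geneB has reads but falls below the cutoff, so the two definitions disagree for it. That disagreement is what the question is about.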
Old 07-11-2011, 09:31 AM   #2
kwatts59
Member
 
Location: nevada

Join Date: Apr 2011
Posts: 46

I posted this same question and nobody replied.
Old 07-11-2011, 09:47 AM   #3
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

It's a question I have also asked many times and searched extensively for an answer to. First, if you have reads mapping uniquely to a gene, then I don't think you can say that it's not expressed, only that it's expressed at very low levels. Any cutoff, it seems to me, will most likely be an arbitrary one, not based on actual biological meaning.
Old 07-11-2011, 10:09 AM   #4
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

One way to derive a threshold value is to follow the procedure in this paper:

http://www.ploscompbiol.org/article/...l.pcbi.1000598
Old 07-11-2011, 10:38 AM   #5
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

I fully agree with chadn737.

The only sensible definition of "expressed" I can see is that transcripts of the gene have been produced. And this is definitely the case already when you see a single read (unless the read could have come from another locus).

With microarrays, people use the term "expressed" to denote "present above background". This makes sense only if one refrains from calling genes with fluorescence at background level "not expressed", because there is no way to say whether a gene is really fully switched off or simply so weakly expressed that its signal cannot be seen above the background autofluorescence. Somehow, this did not keep people from assuming that any gene they cannot see on their array is "switched off". (I had endless discussions with wet-lab collaborators who insisted that we give a percentage of non-expressed genes "because everybody does so".)

Now, with high-coverage RNA-Seq, it turns out that genes are hardly ever perfectly silent -- although I wonder if this is really that surprising. (If this very low transcription turned out to be functional rather than just leakage, that would be surprising, I suppose.)
Old 07-11-2011, 11:32 AM   #6
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

The idea of leaky transcription is one that really needs to be addressed. Particularly when some studies have shown that the majority of transcriptional events do not even make full-length mRNAs:

In vivo dynamics of RNA polymerase II transcription
http://www.nature.com/nsmb/journal/v.../nsmb1280.html

Then there is the fact that the often poor correlation between transcriptomics and proteomics data confounds the issue. If the reads are mapping to the gene, I think it's real, but at those low levels, I agree that it seems questionable whether or not it is functional.

I like to think that if a reasonable estimate of leaky transcription could be obtained and applied to transcriptomic data, you would see increased correlation with proteomic data.

Since I have mainly used RNA-seq for differential expression, my concern is at what point you can be confident that a gene is differentially expressed. Doesn't the "shot noise" of DESeq address this issue?
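As a rough illustration of what "shot noise" means for low counts: if the number of reads from a gene is treated as Poisson distributed (an assumption made here for simplicity; DESeq models additional variance on top of this counting noise), the relative sampling error is 1/sqrt(mean) and the chance of seeing no reads at all is exp(-mean). The function names and the expected counts below are chosen for illustration only.

```python
import math


def shot_noise_cv(expected_count):
    """Relative standard deviation of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(expected_count)


def prob_zero_reads(expected_count):
    """Poisson probability of observing no reads at all."""
    return math.exp(-expected_count)


# Illustrative expected read counts for a gene
for mean in (1, 10, 100, 1000):
    print(
        f"mean={mean:5d}  "
        f"CV={shot_noise_cv(mean):.3f}  "
        f"P(0 reads)={prob_zero_reads(mean):.3g}"
    )
```

At an expected count of a few reads, the sampling error is comparable to the count itself, so a fold change between two samples is hard to distinguish from counting noise. At hundreds of reads the same relative error is small.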
Old 07-11-2011, 02:27 PM   #7
Joann
Senior Member
 
Location: Woodbridge CT

Join Date: Oct 2008
Posts: 231
What are the assumptions about the sample source?

If the assumption is that every single cell of a multi-cell sample source is frozen in an identical, homogeneous biological state, these results would be biologically relevant. But from my perspective, the possibility of many different levels of cellular heterogeneity should be considered in any sample of cells that is characterized primarily by gross histology, so that the dynamics of a given cell population can be better identified.
Tags
expression, gene, rna-seq, rpkm
