SEQanswers




Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Differentially expressed genes | Parharn | Bioinformatics | 11 | 03-04-2014 02:16 AM
still too many differentially expressed genes. | thejustpark | Bioinformatics | 2 | 11-03-2013 01:54 PM
help! cannot find any deferentially expressed genes | ericfit | RNA Sequencing | 4 | 10-09-2013 03:32 AM
Different expressed genes within population | LyingToForget | Bioinformatics | 0 | 08-20-2013 04:03 AM
What to do after finding differentially expressed genes? | sazz | Bioinformatics | 1 | 07-15-2013 10:39 AM

03-28-2014, 10:09 AM   #1
apredeus
Senior Member
 
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151
Classify genes as expressed or not expressed

Hello all,

This is probably a very obvious question, but I've never dealt with this sort of problem, so I hope you can all point me in the right direction.

Imagine we have a microarray or an annotated and quantified RNA-seq experiment, with ~24k genes, each assigned a normalized expression value (or FPKM).

What is the most statistically sound way to automatically classify genes as "expressed" or "not expressed"? People often use an empirical cutoff for this, e.g. an FPKM of 1, but that's not what I'm interested in.

Thank you for any input.
03-28-2014, 10:49 AM   #2
cariboudoug
Member
 
Location: Toronto

Join Date: Oct 2013
Posts: 17

I was wondering about this as well... I have multi-species/multi-individual RNA-seq data, so it's easy to bin genes into on and off if they are expressed highly in all individuals of one species and very little in another. The difficulty is that many genes might be expressed stochastically and at low levels while still being functional. If both species have low FPKM, then I have trouble distinguishing mapping errors from actual low expression.

I was thinking of just choosing a cutoff, based on the distribution of FPKMs, that makes sense to me.
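A cutoff taken from the FPKM distribution can be made reproducible by applying a standard thresholding rule instead of eyeballing a histogram. The sketch below swaps in Otsu's method (borrowed from image thresholding), which picks the split of log2(FPKM + 1) that maximizes the variance between the two resulting groups. It assumes the values form two reasonably separated groups; the function name `otsu_fpkm_cutoff` and the pseudocount of 1 are illustrative choices, not something proposed in the thread.

```python
import math

def otsu_fpkm_cutoff(fpkms, pseudocount=1.0):
    """Pick an FPKM cutoff with Otsu's method on log2(FPKM + pseudocount).

    Otsu's method tries every split of the sorted values into a "low" and a
    "high" group and keeps the split with the largest between-group variance
    w0 * w1 * (mean0 - mean1)**2. Needs at least two values.
    """
    logs = sorted(math.log2(x + pseudocount) for x in fpkms)
    n = len(logs)
    total = sum(logs)
    best_score, best_split = -1.0, 1
    left_sum = 0.0
    for k in range(1, n):                 # the first k values form the low group
        left_sum += logs[k - 1]
        w0, w1 = k / n, (n - k) / n
        mean0 = left_sum / k
        mean1 = (total - left_sum) / (n - k)
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_split = score, k
    # Threshold halfway between the last low value and the first high value,
    # then map it back from log2 space to the FPKM scale.
    t = (logs[best_split - 1] + logs[best_split]) / 2
    return 2 ** t - pseudocount

# Toy data: many zero/near-zero genes and a clearly expressed group
fpkms = [0.0] * 50 + [0.1] * 20 + [50.0] * 30 + [80.0] * 20
print(f"{otsu_fpkm_cutoff(fpkms):.2f}")  # 6.49
```

The cutoff is still a single number, but it is derived from each sample's own distribution rather than fixed at FPKM 1; if the histogram is not bimodal, the split it finds has little meaning.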
03-28-2014, 10:56 AM   #3
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 388

How can you ever say something is 'not expressed'? Absence of evidence is not evidence of absence, and this is especially true with RNA-Seq data, because it's a sampling technique. Unless you're doing ridiculously deep sequencing on your samples, any cutoff you put in place is pretty arbitrary.
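The sampling argument can be made concrete. Under an idealized Poisson model of read sampling (real RNA-seq counts are usually overdispersed, so this understates the noise), a transcript with expected read count lambda yields zero reads with probability exp(-lambda), and lambda can be recovered by inverting the FPKM definition. The transcript length, FPKM, and sequencing depths below are made-up illustrative numbers.

```python
import math

def expected_reads(fpkm, length_bp, mapped_reads):
    """Invert FPKM = reads / (length in kb * mapped reads in millions)
    to get the expected read count for a transcript."""
    return fpkm * (length_bp / 1e3) * (mapped_reads / 1e6)

def prob_zero_reads(fpkm, length_bp, mapped_reads):
    """P(no reads) if the read count is Poisson with mean expected_reads."""
    return math.exp(-expected_reads(fpkm, length_bp, mapped_reads))

# Illustrative: a 1 kb transcript that is truly present at FPKM 0.1
for depth in (10_000_000, 100_000_000):
    lam = expected_reads(0.1, 1000, depth)
    print(f"{depth:>11,} mapped reads: lambda = {lam:.1f}, "
          f"P(0 reads) = {prob_zero_reads(0.1, 1000, depth):.1e}")
# 10,000,000 mapped reads: lambda = 1.0, P(0 reads) = 3.7e-01
# 100,000,000 mapped reads: lambda = 10.0, P(0 reads) = 4.5e-05
```

At moderate depth a genuinely present, lowly expressed transcript shows zero reads about a third of the time, so a zero count is weak evidence that the gene is off.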
03-28-2014, 11:10 AM   #4
apredeus
Senior Member
 
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151

Yes, I see your point. Furthermore, if I remember the ENCODE papers correctly, they concluded that a good fraction of mRNAs are present in some cells of a given type and absent in others. Thus we would only see a certain average, which could be quite low.

On the other hand, especially for microarrays, there is quite a big numerical difference between something obviously expressed and something that's not. From a practical standpoint (I need it for gene expression clustering), it would be useful to restrict the gene set to significantly expressed genes only. How would one go about it? I do have some ideas, but I'm curious what others think.
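For the clustering use case, a common practical compromise is to keep genes that clear a cutoff in at least a minimum number of samples, instead of deciding "expressed" separately in every sample. A minimal sketch, assuming expression values are held in a plain dict mapping gene id to per-sample values (a hypothetical layout); `filter_expressed`, the cutoff, and `min_samples` are illustrative names and values.

```python
def filter_expressed(matrix, cutoff, min_samples):
    """Keep genes whose value exceeds `cutoff` in at least `min_samples` samples.

    `matrix` maps gene id -> list of per-sample expression values.
    `cutoff` could come from a distribution-based rule rather than FPKM 1.
    """
    return {
        gene: values
        for gene, values in matrix.items()
        if sum(v > cutoff for v in values) >= min_samples
    }

# Hypothetical values for three genes in four samples
expr = {
    "geneA": [12.0, 15.3, 9.8, 11.1],   # clearly expressed everywhere
    "geneB": [0.0, 0.2, 0.0, 0.1],      # near zero everywhere
    "geneC": [0.0, 0.0, 25.0, 30.2],    # expressed in one condition only
}
kept = filter_expressed(expr, cutoff=1.0, min_samples=2)
print(sorted(kept))  # ['geneA', 'geneC']
```

Setting `min_samples` to the size of the smallest experimental group keeps genes that are on in only one condition, which are often the informative ones for clustering.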
08-12-2014, 05:55 AM   #5
bruce01
Senior Member
 
Location: .

Join Date: Mar 2011
Posts: 157

"Absence of evidence is not evidence of absence, and this is especially true with RNA-Seq data, because it's a sampling technique"

I agree wholeheartedly, but there is still a distribution for that sample you take, and I like to think that taking a cut-off from that distribution is 'less arbitrary' than the "FPKM<1" route.

I put up a script on GitHub after being asked about a method I use in a seminar. If you want to have a look, I would appreciate your ideas. A workable solution, I thought, though I am frequently wrong!
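The GitHub script mentioned above is not reproduced in the thread, so the following is not that method. One distribution-based alternative is to fit a two-component Gaussian mixture to log2(FPKM) of the genes with nonzero FPKM and call a gene expressed when the higher component is the more probable one. This is a minimal sketch assuming the log values are roughly bimodal (a low "noise" mode and a higher "expressed" mode; check a histogram first). Genes with FPKM 0 are called not expressed outright, and all function names are hypothetical.

```python
import math
import statistics

def _log_dens(x, w, mu, sd):
    """log(w_k * Normal(x; mu_k, sd_k)) for k = 0, 1, up to a shared constant."""
    return [math.log(w[k]) - math.log(sd[k]) - 0.5 * ((x - mu[k]) / sd[k]) ** 2
            for k in (0, 1)]

def fit_two_gaussians(xs, n_iter=200, min_sd=1e-3):
    """Fit a two-component 1-D Gaussian mixture by EM.
    Returns (weights, means, sds); component 1 is the higher-mean one."""
    xs = sorted(xs)
    n = len(xs)
    w = [0.5, 0.5]
    mu = [xs[n // 4], xs[(3 * n) // 4]]           # start at the quartiles
    sd = [max(statistics.pstdev(xs), min_sd)] * 2
    for _ in range(n_iter):
        # E-step: responsibility of the high component for each point
        resp = []
        for x in xs:
            lp = _log_dens(x, w, mu, sd)
            resp.append(1.0 / (1.0 + math.exp(min(lp[0] - lp[1], 700.0))))
        # M-step: re-estimate weight, mean and sd of each component
        for k, r in ((0, [1.0 - p for p in resp]), (1, resp)):
            nk = sum(r)
            if nk < 1e-9:                          # empty component: keep old values
                continue
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk
            sd[k] = max(math.sqrt(var), min_sd)    # floor avoids a collapsed component
            w[k] = nk / n
    if mu[0] > mu[1]:
        w, mu, sd = w[::-1], mu[::-1], sd[::-1]
    return w, mu, sd

def classify_expressed(fpkms):
    """True where FPKM > 0 and the high component is more probable."""
    w, mu, sd = fit_two_gaussians([math.log2(f) for f in fpkms if f > 0])
    labels = []
    for f in fpkms:
        if f <= 0:
            labels.append(False)                   # no reads: no evidence of expression
        else:
            lp = _log_dens(math.log2(f), w, mu, sd)
            labels.append(lp[1] > lp[0])
    return labels

# Synthetic data: zeros, a low mode near FPKM 0.125, a high mode near FPKM 16
low = [2 ** (-3 + 0.1 * i) for i in range(-5, 6)]
high = [2 ** (4 + 0.1 * i) for i in range(-5, 6)]
print(classify_expressed([0.0] * 5 + low + high).count(True))  # 11
```

Because the call depends on the fitted components rather than a fixed number, the effective cutoff moves with each sample's distribution; the price is that the result is only as good as the two-Gaussian assumption.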
08-01-2017, 08:59 AM   #6
krapulaxdoctor
Member
 
Location: Netherlands

Join Date: May 2015
Posts: 20

Hi, I hope to revive this thread, because I need some advice:
I got into a debate with colleagues about this very question: which genes are expressed and which are not. When I mentioned that I used an FPKM cutoff to select the genes I analyzed further, they demanded statistics proving that my selected genes are significant.
Could someone propose a statistical approach to show whether my genes are expressed, and whether that is "significant"?
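One way to put a p-value on "expressed" is to ask whether a gene has more reads than background noise alone would produce. The sketch below assumes (i) a background rate in reads per kb, which you would have to estimate yourself, e.g. from reads falling in intergenic regions, (ii) Poisson noise around that rate, and (iii) Benjamini-Hochberg correction across genes. All names and the example numbers are hypothetical. Real counts are overdispersed, so a negative-binomial or empirically estimated background would give more conservative calls.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), lam > 0; terms computed in log space."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
              for i in range(k))
    return max(0.0, 1.0 - cdf)

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running = 1.0
    for rank in range(n, 0, -1):          # walk from the largest p to the smallest
        i = order[rank - 1]
        running = min(running, pvals[i] * n / rank)
        adjusted[i] = running
    return adjusted

def expressed_above_background(counts, lengths_bp, bg_reads_per_kb, fdr=0.05):
    """Call genes whose read count exceeds a Poisson background at the given FDR.

    The expected background count for a gene is bg_reads_per_kb scaled by
    its length. Returns (q-values, boolean calls).
    """
    pvals = [poisson_upper_tail(c, bg_reads_per_kb * length / 1e3)
             for c, length in zip(counts, lengths_bp)]
    qvals = bh_adjust(pvals)
    return qvals, [q < fdr for q in qvals]

# Hypothetical numbers: four 2 kb genes, assumed background of 1 read per kb
q, calls = expressed_above_background([0, 3, 50, 500], [2000] * 4, 1.0)
print(calls)  # [False, False, True, True]
```

A significant result only says the signal exceeds this particular noise model; it does not by itself show that a lowly expressed transcript is functional.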
All times are GMT -8.