SEQanswers > Bioinformatics > Bioinformatics

03-28-2014, 11:09 AM   #1
Senior Member
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151
Classify genes as expressed or not expressed

Hello all,

this is probably a very obvious question, but I've never dealt with this sort of problem, so I hope you all can point me in the right direction.

Imagine we have a microarray or an annotated and quantified RNA-seq experiment, with ~24k genes, each assigned a normalized numerical expression value (or FPKM).

What is the most statistically sound way to automatically classify genes as "expressed" and "not expressed"? People often use an empirical cutoff for this, e.g. FPKM of 1, but that's not what I'm interested in.
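For reference, FPKM is fragments per kilobase of transcript per million mapped fragments, so a fixed cutoff such as "FPKM of 1" corresponds to a different number of raw fragments depending on gene length and sequencing depth. A minimal sketch of that arithmetic; the function names and the 25 M library size are illustrative assumptions, not taken from any particular tool:

```python
def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """FPKM = fragments / (length in kb) / (total mapped fragments in millions)."""
    return fragments * 1e9 / (length_bp * total_mapped)


def fragments_at_fpkm(cutoff: float, length_bp: int, total_mapped: int) -> float:
    """Number of fragments a gene needs to reach a given FPKM cutoff."""
    return cutoff * length_bp * total_mapped / 1e9


if __name__ == "__main__":
    depth = 25_000_000  # assumed library size: 25 M mapped fragments
    print(fpkm(500, 2_000, depth))  # 10.0
    for length in (500, 2_000, 10_000):
        # An "FPKM >= 1" rule means 12.5, 50 and 250 fragments respectively.
        print(length, fragments_at_fpkm(1.0, length, depth))
```

The point is only that an empirical FPKM cutoff is itself length- and depth-dependent when expressed in reads.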

Thank you for any input.
apredeus
03-28-2014, 11:49 AM   #2
Location: Toronto

Join Date: Oct 2013
Posts: 17

I was wondering about this as well. I have multi-species, multi-individual RNA-seq data, so it's easy to bin genes into "on" and "off" if they are expressed highly in all individuals of one species and very little in another. The difficulty is that many genes might be expressed stochastically and at low levels while still being functional. If both species have low FPKM, I have trouble distinguishing mapping errors from actual low expression.

I was thinking of just choosing a cutoff, based on the distribution of FPKMs, that makes sense to me.
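One way to make "a cutoff based on the distribution" reproducible is to smooth the log2(FPKM) distribution with a Gaussian kernel density estimate and cut at the lowest point between its two tallest peaks. This is a sketch that assumes the distribution is bimodal (a low/noise hump and an "expressed" hump); the pseudocount, Silverman bandwidth and grid size are arbitrary choices, and `kde_valley_cutoff` is a hypothetical name. Genes with exactly zero FPKM pile up at log2(pseudocount) and can form their own spike, so dropping exact zeros first may be sensible.

```python
import math
import random
import statistics


def kde_valley_cutoff(fpkms, pseudocount=0.01, grid_points=512):
    """Return a cutoff on the log2(FPKM + pseudocount) scale at the density
    minimum between the two tallest peaks of a Gaussian KDE."""
    x = [math.log2(v + pseudocount) for v in fpkms]
    n = len(x)
    # Silverman's rule-of-thumb bandwidth (tends to oversmooth multimodal data).
    h = 1.06 * statistics.stdev(x) * n ** (-1 / 5)
    lo, hi = min(x), max(x)
    grid = [lo + (hi - lo) * i / (grid_points - 1) for i in range(grid_points)]
    scale = 1 / (n * h * math.sqrt(2 * math.pi))
    dens = [scale * sum(math.exp(-0.5 * ((g - xi) / h) ** 2) for xi in x) for g in grid]
    # Interior local maxima of the smoothed density.
    peaks = [i for i in range(1, grid_points - 1) if dens[i - 1] < dens[i] >= dens[i + 1]]
    if len(peaks) < 2:
        raise ValueError("density looks unimodal; no valley to cut at")
    # Two tallest peaks, ordered left to right, then the lowest point between them.
    left, right = sorted(sorted(peaks, key=lambda i: dens[i])[-2:])
    valley = min(range(left, right + 1), key=lambda i: dens[i])
    return grid[valley]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: 400 "background" genes near FPKM 2^-3, 600 "expressed" near 2^4.
    fpkms = [2 ** rng.gauss(-3, 1) for _ in range(400)] + [2 ** rng.gauss(4, 1.5) for _ in range(600)]
    cut = kde_valley_cutoff(fpkms)
    print(f"cut at log2(FPKM + 0.01) = {cut:.2f}")
```

The cutoff is still a choice, but it is derived from each sample's own distribution rather than fixed in advance.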
cariboudoug
03-28-2014, 11:56 AM   #3
Senior Member
Location: UK

Join Date: Jan 2010
Posts: 390

How can you ever say something is 'not expressed'? Absence of evidence is not evidence of absence, and this is especially true with RNA-Seq data, because it's a sampling technique. Unless you're doing ridiculously deep sequencing on your samples, any cutoff you put in place is pretty arbitrary.
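To put a number on the sampling argument: under the simplest model, where a gene's fragment count is Poisson-distributed with mean proportional to its true abundance, length and the sequencing depth, the chance of seeing zero fragments from a genuinely expressed gene is exp(-lambda). Real RNA-seq counts are overdispersed relative to Poisson, so these numbers are optimistic. The function names and example values are illustrative assumptions.

```python
import math


def expected_fragments(true_fpkm: float, length_bp: int, total_mapped: int) -> float:
    """Mean fragment count (lambda) for a gene at a given true FPKM."""
    return true_fpkm * length_bp * total_mapped / 1e9


def prob_zero_fragments(true_fpkm: float, length_bp: int, total_mapped: int) -> float:
    """P(no fragments sampled) = exp(-lambda) under a Poisson model."""
    return math.exp(-expected_fragments(true_fpkm, length_bp, total_mapped))


def depth_to_detect(true_fpkm: float, length_bp: int, prob: float = 0.95) -> float:
    """Mapped fragments needed so that P(at least one fragment) >= prob."""
    lam_needed = -math.log(1.0 - prob)  # solve 1 - exp(-lambda) = prob
    return lam_needed * 1e9 / (true_fpkm * length_bp)


if __name__ == "__main__":
    # A 1 kb transcript truly present at FPKM 0.1, sequenced to 20 M fragments:
    print(prob_zero_fragments(0.1, 1_000, 20_000_000))  # ~0.135
    print(depth_to_detect(0.1, 1_000))  # ~3.0e7 fragments
```

So even in this idealized model, about one in seven such genes would show no reads at all at 20 M fragments.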
Bukowski
03-28-2014, 12:10 PM   #4
Senior Member
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151

Yes, I see your point. Furthermore, if I remember the ENCODE papers correctly, they concluded that a good fraction of mRNAs are present in some cells of a given type and absent in others. Thus we would only see an average, which could be quite low.

On the other hand, especially for microarrays, there is quite a big (numerical) difference between something obviously expressed and something that's not. From a practical standpoint (I need it for gene expression clustering), it would be useful to restrict the gene set to only the significantly expressed ones. How would one go about it? I do have some ideas, but I'm curious what others think.
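For the clustering use case, a common practical compromise is to call a gene "expressed" in a sample if it passes that sample's cutoff, and to keep it only if it passes in at least some minimum number of samples. A minimal sketch; the per-sample cutoffs could come from any distribution-based method, and `filter_expressed` and the toy values are hypothetical.

```python
from typing import Dict, List, Sequence


def filter_expressed(
    expr: Dict[str, Sequence[float]],
    cutoffs: Sequence[float],
    min_samples: int,
) -> Dict[str, List[float]]:
    """Keep genes whose value reaches the per-sample cutoff in >= min_samples samples."""
    kept: Dict[str, List[float]] = {}
    for gene, values in expr.items():
        if len(values) != len(cutoffs):
            raise ValueError(f"{gene}: expected {len(cutoffs)} samples, got {len(values)}")
        # Count samples in which this gene reaches that sample's cutoff.
        n_expressed = sum(v >= c for v, c in zip(values, cutoffs))
        if n_expressed >= min_samples:
            kept[gene] = list(values)
    return kept


if __name__ == "__main__":
    # Toy matrix: gene -> normalized expression in three samples (made-up numbers).
    expr = {"geneA": [5.0, 6.0, 7.0], "geneB": [0.0, 0.2, 3.0], "geneC": [0.0, 0.0, 0.1]}
    print(sorted(filter_expressed(expr, cutoffs=[1.0, 1.0, 1.0], min_samples=2)))  # ['geneA']
```

Requiring expression in several samples, rather than in all of them, keeps genes that are switched on in only one condition, which are often the interesting ones for clustering.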
apredeus
08-12-2014, 06:55 AM   #5
Senior Member
Location: .

Join Date: Mar 2011
Posts: 157

"Absence of evidence is not evidence of absence, and this is especially true with RNA-Seq data, because it's a sampling technique."

I agree wholeheartedly, but there is still a distribution for the sample you take, and I like to think that taking a cutoff from that distribution is 'less arbitrary' than the "FPKM<1" route.

I put up a script on GitHub after being asked about a method I use in a seminar. If you want to have a look, I would appreciate ideas about it. I thought it a workable solution, though I am frequently wrong!
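The script mentioned above is not reproduced here. As an independent illustration of taking the cutoff from the sample's own distribution, one could fit a two-component Gaussian mixture to log2(FPKM) by expectation-maximization and report, for each gene, the posterior probability of belonging to the higher ("expressed") component, which gives a probability instead of a hard line. Assumptions: the log-scale distribution of nonzero values is roughly a mixture of two Gaussians, exact zeros are excluded beforehand, and `fit_two_gaussians` and `prob_expressed` are hypothetical names.

```python
import math
import random
import statistics


def _log_pdf(x, mu, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)


def _resp_high(x, params):
    """Posterior probability that x came from the second ('high') component."""
    (w0, mu0, sd0), (w1, mu1, sd1) = params
    d = (math.log(w0) + _log_pdf(x, mu0, sd0)) - (math.log(w1) + _log_pdf(x, mu1, sd1))
    return 1.0 / (1.0 + math.exp(min(d, 700.0)))  # clamp to avoid overflow


def fit_two_gaussians(x, iters=100):
    """EM fit of a 2-component 1-D Gaussian mixture.
    Returns [(weight, mean, sd) of low component, (weight, mean, sd) of high component]."""
    xs = sorted(x)
    half = len(xs) // 2
    # Initialise by splitting the data at the median.
    params = [
        (0.5, statistics.fmean(xs[:half]), statistics.pstdev(xs[:half]) or 1.0),
        (0.5, statistics.fmean(xs[half:]), statistics.pstdev(xs[half:]) or 1.0),
    ]
    n = len(x)
    for _ in range(iters):
        r = [_resp_high(xi, params) for xi in x]  # E-step
        new = []
        for weights in ([1.0 - ri for ri in r], r):  # M-step: low, then high
            nk = max(sum(weights), 1e-12)
            mu = sum(wi * xi for wi, xi in zip(weights, x)) / nk
            var = sum(wi * (xi - mu) ** 2 for wi, xi in zip(weights, x)) / nk
            new.append((nk / n, mu, math.sqrt(max(var, 1e-6))))
        params = new
    return sorted(params, key=lambda p: p[1])  # low-mean component first


def prob_expressed(x, params):
    """Posterior probability that a log2(FPKM) value belongs to the high component."""
    return _resp_high(x, params)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic log2(FPKM): 400 low/background genes and 600 expressed genes.
    logs = [rng.gauss(-3, 1) for _ in range(400)] + [rng.gauss(4, 1.5) for _ in range(600)]
    low, high = fit_two_gaussians(logs)
    print("low:", low, "high:", high)
    print("P(expressed | log2 FPKM = 0):", prob_expressed(0.0, [low, high]))
```

A gene can then be called expressed when its posterior exceeds, say, 0.5 or 0.95; the threshold is a choice, but it is stated as a probability rather than an FPKM value.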
bruce01
08-01-2017, 09:59 AM   #6
Location: Netherlands

Join Date: May 2015
Posts: 20

Hi, I hope to revive this thread, because I need some advice:
I got into a debate with colleagues about the very same question: which genes are expressed and which are not. When I mentioned that I used an FPKM cutoff to select the genes I analyzed further, they demanded statistics showing whether my selected genes are significant.
Could someone propose a statistical approach to show whether my genes are expressed, and whether that is "significant"?
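One way to attach a p-value to "expressed" is to ask whether a gene has more fragments than background alone would produce. A sketch under stated assumptions: a per-base background rate estimated from reads falling in regions believed to be unexpressed (for example intergenic regions away from annotated genes), a Poisson model for each gene's count under that background, and Benjamini-Hochberg correction because thousands of genes are tested at once. Poisson ignores overdispersion and the background estimate is itself a modelling choice, so this tests "more than background", not "functionally expressed". All function names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summing terms directly so tiny tails stay accurate."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    upper = max(k + 100, int(lam + 40 * math.sqrt(lam) + 100))  # terms beyond are negligible
    terms = (math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k, upper + 1))
    return min(1.0, math.fsum(terms))


def expression_pvalues(counts: Sequence[int], lengths_bp: Sequence[int],
                       bg_rate_per_bp: float) -> List[float]:
    """One-sided p-value per gene: P(count >= observed | background only)."""
    return [poisson_upper_tail(c, bg_rate_per_bp * length) for c, length in zip(counts, lengths_bp)]


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """BH-adjusted p-values (false discovery rate), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up example: background of 0.002 fragments per bp (2 per kb),
    # three 1 kb genes with 1, 6 and 40 fragments.
    p = expression_pvalues([1, 6, 40], [1_000, 1_000, 1_000], 0.002)
    print(benjamini_hochberg(p))
```

Genes with a small adjusted p-value can be reported as "detected above background at a given FDR", which is a statement colleagues can check, whereas an FPKM cutoff on its own carries no error rate.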
krapulaxdoctor

All times are GMT -8.