#1
Senior Member
Location: Bioinformatics Institute, SPb Join Date: Jul 2012
Posts: 151
Hello all,
this is probably a very obvious question, but I've never dealt with this sort of problem, so I hope you can point me in the right direction. Imagine we have a microarray or an annotated and quantified RNA-seq experiment. There are about 24k genes, each with a normalized expression value (or FPKM) assigned to it. What is the most statistically sound way to automatically classify genes as "expressed" and "not expressed"? People often use an empirical cutoff for this, e.g. an FPKM of 1, but that's not what I'm interested in. Thank you for any input.
#2
Member
Location: Toronto Join Date: Oct 2013
Posts: 17
I was wondering this as well... I have multi-species, multi-individual RNA-seq data, so it's easy to bin genes into on and off if they are expressed highly in all individuals of one species and very little in another. The difficulty is that many genes might be expressed stochastically and at low levels while still being functional. If both species have low FPKM, I have trouble distinguishing mapping errors from actual low expression.
I was thinking of just choosing a cutoff based on the distribution of FPKMs that makes sense to me.
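A minimal sketch of one way to make "a cutoff based on the distribution" concrete. It assumes (and this is an assumption, not something established in this thread) that log-transformed FPKMs look roughly like a mixture of a low "background" component and a higher "expressed" component. The function names and the simulated values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (means, sds, weights), each a two-element list.
    """
    xs = sorted(values)
    n = len(xs)
    mean_all = sum(xs) / n
    sd_all = max(math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n), 1e-3)
    # Crude start: lower and upper quartiles as means, pooled spread for both.
    means = [xs[n // 4], xs[(3 * n) // 4]]
    sds = [sd_all, sd_all]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each value came from component 1.
        resp = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            resp.append(p1 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        # M-step: re-estimate weight, mean and spread of each component.
        for k in (0, 1):
            r = resp if k == 1 else [1.0 - ri for ri in resp]
            total = sum(r)
            if total < 1e-9:
                continue  # component has collapsed; leave it unchanged
            means[k] = sum(ri * x for ri, x in zip(r, xs)) / total
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, xs)) / total
            sds[k] = max(math.sqrt(var), 1e-3)
            weights[k] = total / n
    return means, sds, weights


def prob_expressed(x, means, sds, weights):
    """Posterior probability that x belongs to the higher-mean component."""
    hi = 0 if means[0] > means[1] else 1
    lo = 1 - hi
    p_hi = weights[hi] * normal_pdf(x, means[hi], sds[hi])
    p_lo = weights[lo] * normal_pdf(x, means[lo], sds[lo])
    return p_hi / (p_hi + p_lo) if p_hi + p_lo > 0 else 0.5


# Simulated log2(FPKM + pseudocount) values, NOT real data.
random.seed(0)
log_fpkm = [random.gauss(-2.0, 1.0) for _ in range(1000)]
log_fpkm += [random.gauss(4.0, 1.5) for _ in range(2000)]

means, sds, weights = fit_two_gaussians(log_fpkm)
for x in (-3.0, 0.0, 1.0, 6.0):
    p = prob_expressed(x, means, sds, weights)
    print(f"log2 value {x:+.1f}: P(expressed) = {p:.3f}")
```

A gene could then be called "expressed" when its posterior probability exceeds 0.5, or some stricter value. Whether two Gaussians describe real data well should be checked against a histogram first, and with unequal variances the posterior is not guaranteed to be monotonic in the extreme tails.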
#3
Senior Member
Location: UK Join Date: Jan 2010
Posts: 390
How can you ever say something is 'not expressed'? Absence of evidence is not evidence of absence, and this is especially true of RNA-Seq data, because it's a sampling technique. Unless you're doing ridiculously deep sequencing on your samples, any cutoff you put in place is pretty arbitrary.
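To put rough numbers on the sampling argument, here is a small sketch. It assumes reads are drawn independently, so that the count for a transcript making up a fraction `transcript_fraction` of all sequenced fragments is approximately Poisson with mean `total_reads * transcript_fraction`. The function names and the example fraction of 1e-7 are illustrative assumptions, not values from any experiment.

```python
import math


def prob_zero_reads(total_reads, transcript_fraction):
    """P(no reads at all) under a Poisson model with mean total_reads * fraction."""
    return math.exp(-total_reads * transcript_fraction)


def depth_for_detection(transcript_fraction, miss_prob=0.05):
    """Reads needed so that the chance of seeing zero reads is at most miss_prob."""
    return math.ceil(-math.log(miss_prob) / transcript_fraction)


# A hypothetical transcript contributing 1 in 10 million fragments.
fraction = 1e-7
for depth in (1_000_000, 10_000_000, 100_000_000):
    p0 = prob_zero_reads(depth, fraction)
    print(f"{depth:>11,} reads: P(zero reads) = {p0:.3g}")

print("reads for a 95% chance of at least one read:", depth_for_detection(fraction))
```

Under these assumptions a transcript that is genuinely present can easily yield zero reads at ordinary depths, so a zero (or near-zero) count says more about sequencing depth than about absence.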
#4
Senior Member
Location: Bioinformatics Institute, SPb Join Date: Jul 2012
Posts: 151
Yes, I see your point. Furthermore, if I remember the ENCODE papers correctly, they concluded that a good fraction of mRNAs are present in some cells of a given type and absent in others. Thus we would only see an average, which could be quite low.
On the other hand, especially for microarrays, there is quite a big (numerical) difference between something obviously expressed and something that's not. From a practical standpoint (I need it for gene expression clustering), it would be useful to restrict the gene list to only the significantly expressed ones. How would one go about it? I do have some ideas, but I'm curious what others think.
#5
Senior Member
Location: . Join Date: Mar 2011
Posts: 157
"Absence of evidence is not evidence of absence, and this is especially true with RNA-Seq data, because it's a sampling technique"
I agree wholeheartedly, but there is still a distribution for the sample you take, and I like to think that taking a cutoff from that distribution is 'less arbitrary' than the "FPKM<1" route. I put up a script on GitHub after being asked about a method I use in a seminar. If you want to have a look, I would appreciate ideas about it. A workable solution, I thought, though I am frequently wrong!
#6
Member
Location: Netherlands Join Date: May 2015
Posts: 20
Hi, I hope to revive this thread, because I need some advice:
I got into a debate with colleagues about the very same question: which genes are expressed and which are not. When I mentioned that I used an FPKM cutoff to select the genes I analyzed further, they demanded statistics to prove that my selected genes are significant. Could someone propose a statistical approach to show whether my genes are expressed and whether that is "significant"?
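One possible way to attach a p-value to "expressed", sketched under strong assumptions: suppose a background read density can be estimated (for example from intergenic regions; that choice is an assumption, not something proposed in this thread), and that background reads falling in a gene follow a Poisson distribution with mean `background_rate * gene_length`. A one-sided test then asks how surprising the observed count is under that background, with Benjamini-Hochberg correction across genes. All names and numbers below are hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail.

    Intended for modest means (lam below roughly 700, so exp(-lam) does
    not underflow); for larger means a library routine is the safer choice.
    """
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= lam / i
        # Stop once past the mode and further terms no longer change the sum.
        if i > lam and term <= total * 1e-17:
            break
    return min(total, 1.0)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical inputs: background density in reads per base pair, and
# (read count, gene length in bp) per gene. None of this is real data.
background_rate = 0.002
genes = {"geneA": (150, 2000), "geneB": (5, 2000), "geneC": (0, 1500)}

names = list(genes)
pvals = [poisson_upper_tail(genes[g][0], background_rate * genes[g][1]) for g in names]
qvals = benjamini_hochberg(pvals)
for g, p, q in zip(names, pvals, qvals):
    print(f"{g}: p = {p:.3g}, BH-adjusted = {q:.3g}")
```

A small adjusted p-value here only says the read count exceeds the assumed background; it does not say the gene is biologically meaningful, and a large one does not show the gene is absent. The result also depends heavily on how the background rate is estimated.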