AJenkins 11-12-2013 11:37 AM

Cufflinks novel gene discovery
I have two questions regarding novel gene discovery using Cufflinks:

1.) Does anyone have a good answer for what the appropriate FPKM cutoff should be during gene discovery with Cufflinks? I have approximately 13,000 "novel genes" identified by Cufflinks, but many of them have very low FPKMs. What FPKM should a gene have in order to be accepted as a true novel gene? Is there a consensus? I was thinking of using an FPKM of 5 but was unsure whether this is too high.

2.) I have run RNA-seq data from 4 insect stages (each with approximately 140-200 million reads) through Cufflinks and Cuffcompare. According to the file [name].tracking, no novel gene is present in more than one of the time points. Am I misinterpreting the results?

Thanks everyone!
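
One way to check question 2 directly is to count, for each novel transfrag, how many samples it appears in. The sketch below is only an illustration under stated assumptions: it assumes the .tracking file is tab-delimited with the transfrag ID, locus ID, reference ID, and class code in the first four columns, followed by one column per sample in which "-" means "not found in that sample", and it treats class code "u" as the novel intergenic class. The function name and the example rows are made up for illustration and are not taken from the poster's data.

```python
def samples_per_transfrag(tracking_lines, class_codes=("u",)):
    """Map transfrag ID -> number of samples in which it was observed.

    Assumed layout (tab-delimited): transfrag ID, locus ID, reference ID,
    class code, then one column per sample ("-" when absent).
    """
    counts = {}
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        transfrag_id, class_code = fields[0], fields[3]
        if class_code not in class_codes:
            continue  # keep only the classes treated as "novel"
        # Count the per-sample columns that are not the "-" placeholder.
        counts[transfrag_id] = sum(1 for col in fields[4:] if col.strip() != "-")
    return counts


# Made-up rows for four samples (q1..q4); real files carry more detail per cell.
example_rows = [
    "TCONS_00000001\tXLOC_000001\t-\tu\tq1:CUFF.1|CUFF.1.1\t-\t-\t-",
    "TCONS_00000002\tXLOC_000002\t-\tu\tq1:CUFF.2|CUFF.2.1\tq2:CUFF.7|CUFF.7.1\t-\t-",
    "TCONS_00000003\tXLOC_000003\tgeneA|txA\t=\tq1:CUFF.3|CUFF.3.1\tq2:CUFF.8|CUFF.8.1\t-\t-",
]

counts = samples_per_transfrag(example_rows)
shared = [tid for tid, n in counts.items() if n > 1]
print(counts)
print(shared)
```

If a list like `shared` comes back empty on real data, it is worth confirming that all four samples were passed to Cuffcompare in a single run, since the per-sample columns only exist for samples compared together.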

l0o0 12-26-2013 06:03 PM

Hi, AJenkins. How did you find the 13,000 novel genes? Can you describe the procedure?

AJenkins 01-09-2014 12:12 PM

Using upwards of 600 million reads, I aligned the reads to the reference genome with the Tuxedo suite of programs, using a RABT (reference annotation based transcript) assembly. This gives me around 10,000 completely unknown genes that aren't associated with any isoforms or known genes. I want to be able to say, with confidence, that the FPKM associated with an unknown gene indicates the gene has full coverage and is fully represented.

The problem I'm having is that I cannot determine a good way to create this cutoff point, any ideas?

rboettcher 01-10-2014 06:19 AM

What counts as "novel" depends on which reference annotation you are using. If there is more than one source, start by filtering out everything that overlaps any known gene. Next, you can filter for genes with more than one exon, as Cufflinks reports a lot of false-positive (FP) exons, and spliced transcripts are easier to validate in the lab. Beyond that, there is no general rule for an FPKM cutoff. In fact, there is still ongoing discussion about whether FPKM is a good representation of expression at all, so I would suggest simply looking at the FPKM distribution and then choosing a cutoff that seems reasonable.
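
The last two steps suggested above (keep multi-exon candidates, then look at the FPKM distribution before choosing a cutoff) could be sketched roughly as follows. This is a minimal illustration, not a recommended pipeline: the record fields `gene_id`, `n_exons`, and `fpkm` are hypothetical names for values you would extract from your own Cufflinks output, the toy numbers are invented, and the cutoff of 5 is just the tentative value mentioned in the original question.

```python
import math
import statistics


def multi_exon_candidates(candidates, min_exons=2):
    """Keep candidates with at least `min_exons` exons and a positive FPKM."""
    return [c for c in candidates if c["n_exons"] >= min_exons and c["fpkm"] > 0]


def log_fpkm_quartiles(candidates):
    """Quartiles of log10(FPKM), a quick summary of the distribution."""
    log_values = [math.log10(c["fpkm"]) for c in candidates]
    return statistics.quantiles(log_values, n=4)


def apply_cutoff(candidates, fpkm_cutoff):
    """Return the gene IDs whose FPKM is at or above the chosen cutoff."""
    return [c["gene_id"] for c in candidates if c["fpkm"] >= fpkm_cutoff]


# Invented toy records; field names are assumptions, not Cufflinks output names.
toy = [
    {"gene_id": "A", "n_exons": 1, "fpkm": 20.0},
    {"gene_id": "B", "n_exons": 3, "fpkm": 0.4},
    {"gene_id": "C", "n_exons": 2, "fpkm": 6.0},
    {"gene_id": "D", "n_exons": 4, "fpkm": 50.0},
    {"gene_id": "E", "n_exons": 2, "fpkm": 1.2},
]

multi = multi_exon_candidates(toy)
quartiles = log_fpkm_quartiles(multi)
kept = apply_cutoff(multi, fpkm_cutoff=5.0)
print(quartiles)
print(kept)
```

Inspecting the quartiles (or a histogram of the log values) first makes it easier to see whether a candidate cutoff removes a low-expression tail or cuts through the bulk of the candidates.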

