SEQanswers

Old 11-12-2013, 11:37 AM   #1
AJenkins
Member
 
Location: Boston

Join Date: Nov 2013
Posts: 13
Cufflinks novel gene discovery

I have two questions regarding novel gene discovery using Cufflinks:

1.) Does anyone have a good answer for what the appropriate FPKM cutoff should be during gene discovery with Cufflinks? I have approximately 13,000 "novel genes" identified by Cufflinks, but many of them have very low FPKMs. What FPKM should a gene reach in order to be accepted as a true novel gene? Is there a consensus? I was thinking of using an FPKM of 5 but am unsure whether this is too high.

2.) I have run RNA-seq data from 4 insect stages (all with coverage of approximately 140-200 million reads) through Cufflinks and Cuffcompare. According to the file [name].tracking, no novel gene is present in more than one of the time points. Am I misinterpreting the results?

Thanks everyone!
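
For the second question, one way to check the reading of [name].tracking is to count, per locus, how many input samples contain a novel transfrag. The sketch below is only an illustration: it assumes the tab-separated layout described in the Cuffcompare manual (transfrag ID, locus ID, reference ID, class code, then one column per input GTF, with "-" where a sample lacks the transfrag), and the function name and example IDs are hypothetical. Each row of the file is a transcript structure, so counting by locus rather than by row may be less strict.

```python
from collections import defaultdict


def samples_per_novel_locus(tracking_lines, class_codes=("u",)):
    """Count how many samples support each novel locus.

    Assumed column layout (from the Cuffcompare manual):
    0 = transfrag ID, 1 = locus ID, 2 = reference ID, 3 = class code,
    4.. = one column per input GTF, "-" when the transfrag is absent.
    """
    locus_samples = defaultdict(set)
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed rows and rows whose class code is not of interest.
        if len(fields) < 5 or fields[3] not in class_codes:
            continue
        locus_id = fields[1]
        for sample_index, entry in enumerate(fields[4:]):
            if entry != "-":
                locus_samples[locus_id].add(sample_index)
    return {locus: len(samples) for locus, samples in locus_samples.items()}


# Hypothetical usage; "merged.tracking" stands in for [name].tracking:
# with open("merged.tracking") as handle:
#     counts = samples_per_novel_locus(handle)
#     shared = [locus for locus, n in counts.items() if n > 1]
```

If `shared` comes back empty at the locus level as well, the result is probably not just an artifact of exact structure matching between samples.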
Old 12-26-2013, 06:03 PM   #2
l0o0
Junior Member
 
Location: China

Join Date: Jul 2013
Posts: 3
Default

Hi, AJenkins. How did you find the 13,000 novel genes? Can you specify the procedure?
Thx
Old 01-09-2014, 12:12 PM   #4
AJenkins
Member
 
Location: Boston

Join Date: Nov 2013
Posts: 13
Default

I aligned upwards of 600 million reads to the reference genome with the Tuxedo suite of programs, using a RABT (reference annotation based transcript) assembly. This gives me around 10,000 completely unknown genes that aren't associated with any isoforms or known genes. I want to be able to say, with confidence, that the FPKM associated with an unknown gene reflects a gene that has full coverage and is fully represented.

The problem I'm having is that I cannot determine a good way to set this cutoff point. Any ideas?
Old 01-10-2014, 06:19 AM   #5
rboettcher
Member
 
Location: Berlin

Join Date: Oct 2010
Posts: 71
Default

What counts as novel depends on which reference annotation you are using. If there is more than one source, start by filtering out everything that shows any overlap with any known gene. Next, you can filter for genes with more than one exon, since Cufflinks reports a lot of false-positive exons and spliced transcripts are easier to validate in the lab. Beyond that, there is no general rule for an FPKM cut-off. In fact, there is still ongoing discussion about whether FPKM is a good representation of expression in general, so I would suggest just looking at the FPKM distribution and then choosing a cut-off that seems reasonable.

Best,
René
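
The filtering steps suggested above could be sketched roughly as follows. This is only an illustration, not a recommended pipeline: it assumes the candidates have already been reduced to loci with no overlap to any known gene and are available as (gene ID, exon count, FPKM) tuples; the function names, the example IDs, and the cut-off of 1.0 are all hypothetical. It keeps multi-exon candidates, reports quartiles of log10(FPKM) so the distribution can be inspected, and applies a cut-off only after that inspection.

```python
import math
import statistics


def summarize_fpkm(candidates, min_exons=2):
    """Keep multi-exon candidates and summarize their log10(FPKM).

    candidates: iterable of (gene_id, exon_count, fpkm) tuples.
    Returns (kept, quartiles); quartiles is empty with fewer than 2 values.
    """
    # Drop single-exon candidates and zero-FPKM entries (log10 undefined).
    kept = [(g, n, f) for g, n, f in candidates if n >= min_exons and f > 0]
    log_fpkm = [math.log10(f) for _, _, f in kept]
    quartiles = statistics.quantiles(log_fpkm, n=4) if len(log_fpkm) >= 2 else []
    return kept, quartiles


def apply_cutoff(kept, fpkm_cutoff):
    """Return IDs of candidates at or above a cut-off chosen by inspection."""
    return [g for g, _, f in kept if f >= fpkm_cutoff]


# Hypothetical candidates: (gene_id, exon_count, fpkm)
candidates = [
    ("XLOC_000001", 1, 50.0),
    ("XLOC_000002", 3, 0.1),
    ("XLOC_000003", 2, 10.0),
    ("XLOC_000004", 4, 1.0),
]
kept, quartiles = summarize_fpkm(candidates)
print("log10(FPKM) quartiles:", quartiles)
print("passing cut-off:", apply_cutoff(kept, fpkm_cutoff=1.0))
```

The cut-off is deliberately an argument rather than a constant, since the thread gives no consensus value; plotting a histogram of the log10(FPKM) values would be a natural next step before settling on one.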