SEQanswers

Old 05-26-2014, 11:45 AM   #1
reventropy
Junior Member
 
Location: Colorado

Join Date: Apr 2014
Posts: 7
Cuffcompare with a single experiment

I am running RNA-seq analysis on a paired-end deep sequencing data set with no replicates. We are interested in finding novel genes and transcript isoforms in addition to variant info. Grooming and TopHat alignment went well, and I've processed the .bam output through Cufflinks in RABT mode with --GTF-guide. I then take the .gtf output from this and run cuffcompare with the reference .gtf and .fasta.

I am experiencing confusion related to the last step and was hoping that somebody with more experience than I could help to clarify a few things.

Firstly, most of the references I have read regarding cuffcompare indicate that it is used for multiple replicates or experiments: “Used to Track Cufflinks transcripts across multiple experiments (e.g. across a time course)”. Is it common to use cuffcompare on a single experiment in order to find novel isoforms?

Secondly, there are some entries in the output from cuffcompare that aren’t making sense to me. What does it mean when I see an "=" class code with a zero FMI? How about a "j" class code with an FMI of 100? Based on the definition of FMI (fraction of major isoform), these scenarios don't seem possible.

Thirdly, if I want an fpkm score for a known gene, is it common to sum all transcript fpkms belonging to that gene with an "=" class code?







Thanks so much for any help, and let me know if I can/should provide more information!



-Jeremy
Old 05-26-2014, 08:06 PM   #2
mikep
Member
 
Location: Singapore

Join Date: Feb 2011
Posts: 45

Quote:
Originally Posted by reventropy View Post
Firstly, most of the references I have read regarding cuffcompare indicate that it is used for multiple replicates or experiments: “Used to Track Cufflinks transcripts across multiple experiments (e.g. across a time course)”. Is it common to use cuffcompare on a single experiment in order to find novel isoforms?
Depends on your definition of "common". There's no technical reason you can't (I certainly have). Usually people use the cuffcompare output as the guide file for cuffdiff: the former gives you the union set of transcripts, and the latter then looks for differential expression in those transcripts.

Quote:
Secondly, there are some entries in the output from cuffcompare that aren’t making sense to me. What does it mean when I see an "=" class code with a zero FMI? How about a "j" class code with an FMI of 100? Based on the definition of FMI (fraction of major isoform), these scenarios don't seem possible.
cuffcompare outputs all the transcripts it finds, or is told are real (exist in the guide file). "=" transcripts exist in the guide file, so are output even if there's no support for their existence. It's not clear why you think a j class transcript cannot have an FMI of 100.
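
A quick way to pull out the two cases being asked about is to scan the cuffcompare .tmap file. This is only a minimal sketch: it assumes a tab-separated .tmap with the usual header names (ref_gene_id, class_code, cuff_id, FMI, FPKM), and the sample rows and IDs below are made up for illustration, so check the header of your own file first.

```python
import csv
import io

# Hypothetical excerpt of a cuffcompare .tmap file (tab-separated).
# The column names are an assumption; compare them with your own header.
SAMPLE_TMAP = (
    "ref_gene_id\tref_id\tclass_code\tcuff_gene_id\tcuff_id\tFMI\tFPKM\n"
    "GENE_A\tTX_A1\t=\tCUFF.1\tCUFF.1.1\t0\t0.000000\n"
    "GENE_A\tTX_A1\tj\tCUFF.1\tCUFF.1.2\t100\t12.500000\n"
    "GENE_B\tTX_B1\t=\tCUFF.2\tCUFF.2.1\t100\t3.200000\n"
)


def flag_rows(handle):
    """Return two lists of cuff_id values:
    known transcripts ('=') with FMI 0, and novel isoforms ('j') with FMI 100."""
    unsupported_known, novel_major = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        fmi = float(row["FMI"])
        if row["class_code"] == "=" and fmi == 0:
            unsupported_known.append(row["cuff_id"])
        elif row["class_code"] == "j" and fmi == 100:
            novel_major.append(row["cuff_id"])
    return unsupported_known, novel_major


unsupported_known, novel_major = flag_rows(io.StringIO(SAMPLE_TMAP))
print("'=' with FMI 0:", unsupported_known)
print("'j' with FMI 100:", novel_major)
```

To run it on real output, pass an open file handle for the .tmap file instead of the `io.StringIO` sample.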


Quote:
Thirdly, if I want an fpkm score for a known gene, is it common to sum all transcript fpkms belonging to that gene with an "=" class code?
Summing fpkms is fine, but you should include novel transcripts, or rerun cufflinks without novel transcript finding.
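
As a rough illustration of the difference, the sketch below sums transcript FPKMs per reference gene from .tmap-style rows, once with "=" transcripts only and once with "j" (novel isoform) transcripts included. The column names, the sample values, and the choice of {"=", "j"} as the class codes that count towards a gene are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical .tmap-style rows (tab-separated); values are invented.
SAMPLE_TMAP = (
    "ref_gene_id\tclass_code\tcuff_id\tFPKM\n"
    "GENE_A\t=\tCUFF.1.1\t4.0\n"
    "GENE_A\tj\tCUFF.1.2\t6.0\n"
    "GENE_B\t=\tCUFF.2.1\t3.5\n"
    "-\tu\tCUFF.3.1\t1.0\n"
)


def gene_fpkm(handle, class_codes):
    """Sum transcript FPKMs per reference gene, keeping only the given class codes.
    Rows with no reference gene ('-') are skipped."""
    totals = defaultdict(float)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["class_code"] in class_codes and row["ref_gene_id"] != "-":
            totals[row["ref_gene_id"]] += float(row["FPKM"])
    return dict(totals)


known_only = gene_fpkm(io.StringIO(SAMPLE_TMAP), {"="})
with_novel = gene_fpkm(io.StringIO(SAMPLE_TMAP), {"=", "j"})
print("'=' only:      ", known_only)
print("'=' plus 'j':  ", with_novel)
```

In the made-up data, GENE_A goes from 4.0 to 10.0 once the novel isoform is counted, which is the undercounting you get if you sum only the "=" transcripts.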
Old 05-27-2014, 08:45 AM   #3
reventropy
Junior Member
 
Location: Colorado

Join Date: Apr 2014
Posts: 7

Thanks a lot mikep!

Quote:
It's not clear why you think a j class transcript cannot have an FMI of 100.
This is probably owing to my flawed reasoning.

I was operating under the assumption that major isoforms come from the annotation file and cannot be novel. If I see an FMI of 100 and a "j" class code, then should I assume that Cufflinks identified the main isoform as being novel, i.e., a novel gene?

Thanks again for addressing my questions so that I can proceed with more confidence.

-Jeremy
Old 05-27-2014, 05:22 PM   #4
mikep
Member
 
Location: Singapore

Join Date: Feb 2011
Posts: 45

Quote:
Originally Posted by reventropy View Post
If I see an FMI of 100 and a "j" class code, then should I assume that Cufflinks identified the main isoform as being novel, i.e., a novel gene?

-Jeremy
Your interpretation is correct.

Quote:
Thanks again for addressing my questions so that I can proceed with more confidence.
I would be very careful about being confident in novel isoforms from cufflinks; it has a pretty high error rate. You haven't mentioned which organism you are working with, but if it is human or one of the model organisms you might be better off using just the existing annotation. If it is only a few genes you care about, I'd load the cufflinks output and BAM file into a genome viewer and have a look at the actual reads.
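
If you only have a handful of transcripts to check, something like the sketch below can turn their GTF records into locus strings ("chr:start-end") to paste into the genome viewer's search box. It assumes a Cufflinks-style GTF that contains "transcript" records with a transcript_id attribute; the sample lines, the IDs, and the 100 bp padding are invented for illustration.

```python
import re

# Hypothetical Cufflinks-style GTF lines; coordinates and IDs are made up.
SAMPLE_GTF = (
    'chr1\tCufflinks\ttranscript\t1000\t5000\t1000\t+\t.\t'
    'gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "12.5";\n'
    'chr1\tCufflinks\texon\t1000\t1500\t1000\t+\t.\t'
    'gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1";\n'
    'chr2\tCufflinks\ttranscript\t200\t900\t1000\t-\t.\t'
    'gene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "3.2";\n'
)


def viewer_loci(gtf_text, wanted_ids, pad=100):
    """Map each wanted transcript_id to a padded 'chr:start-end' locus string."""
    loci = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only whole-transcript records carry the full span
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match and match.group(1) in wanted_ids:
            start = max(1, int(fields[3]) - pad)  # GTF coordinates are 1-based
            end = int(fields[4]) + pad
            loci[match.group(1)] = f"{fields[0]}:{start}-{end}"
    return loci


loci = viewer_loci(SAMPLE_GTF, {"CUFF.1.1"})
for transcript_id, locus in loci.items():
    print(transcript_id, locus)
```

The list of IDs to look at could come from whatever filter you trust least, for example the "j" class transcripts with a high FMI.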
Old 05-27-2014, 07:55 PM   #5
reventropy
Junior Member
 
Location: Colorado

Join Date: Apr 2014
Posts: 7

Quote:
I would be very careful about being confident in novel isoforms from cufflinks; it has a pretty high error rate. You haven't mentioned which organism you are working with, but if it is human or one of the model organisms you might be better off using just the existing annotation. If it is only a few genes you care about, I'd load the cufflinks output and BAM file into a genome viewer and have a look at the actual reads.
I'll definitely keep that in the front of my mind. The sequencing is human. I have been using IGV, but am still training my eye. We're only interested in coding genes so I will be filtering the cuffdiff output, but we would like to catch any novel transcripts or gene isoforms in this subset.

-Jeremy
Tags
cuffcompare, cufflinks, rna-sq
