Hi everyone,
I am trying to use Gencode (ver 7) to do my RNA-seq analysis, but I am having various issues.
1) If I use Gencode as downloaded from the project website, cufflinks runs forever. By comparison, if I use the Ensembl release (the latest version linked from the cufflinks website), the run time is at least 20 times shorter.
Why is there such a huge difference? Is it just a matter of annotation size?
2) The original fastq files were mapped (by someone else) with Tophat, using a filtered version of Gencode (it is much smaller than the original).
Is there a problem if I now run cufflinks on the full version of Gencode?
Intuitively, I would expect some loss of accuracy for whatever was filtered out of the full Gencode (given that Tophat ran on the filtered version).
3) Gencode has various duplicated entries. While Tophat and cufflinks do not seem to mind, cuffmerge and cuffcompare exit with an error.
How can I solve this issue?
I guess one could filter out the duplicated entries with gffread, from either the annotation itself or the transcripts.gtf files?
4) I also ran cufflinks with Ensembl (linked from the cufflinks website), but the resulting transcripts.gtf, genes.fpkm_tracking, and isoforms.fpkm_tracking do not contain any gene (or transcript) symbols; instead, all IDs have the prefix CUFF. What is the problem? How do I get the gene symbols back? Could I use cuffmerge?
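Regarding point 3, in case it helps to show what I mean by filtering duplicates, here is a minimal Python sketch (not gffread, and not tested on the real Gencode file). It assumes that two GTF records count as duplicates when they share seqname, feature type, start, end, strand and transcript_id; that definition and the function name dedupe_gtf_lines are my own assumptions and may not match what cuffmerge/cuffcompare actually complain about.

```python
import re

# Matches the transcript_id attribute in the 9th GTF column.
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def dedupe_gtf_lines(lines):
    """Yield GTF lines, skipping records that were already seen.

    Assumption (hypothetical definition of "duplicate"): same seqname,
    feature, start, end, strand and transcript_id.
    """
    seen = set()
    for line in lines:
        # Pass comments and blank lines through unchanged.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Leave malformed lines alone rather than guessing.
        if len(fields) < 9:
            yield line
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        transcript_id = match.group(1) if match else None
        key = (fields[0], fields[2], fields[3], fields[4], fields[6], transcript_id)
        if key in seen:
            continue  # drop the repeated record
        seen.add(key)
        yield line
```

One would run it by opening the annotation, passing the file object to dedupe_gtf_lines, and writing the yielded lines to a new file. It keeps the first copy of each record and preserves the original line order.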
Thanks
Gianfilippo