#1
Member
Location: Kentucky Join Date: May 2012
Posts: 37
I have been running tophat/cufflinks/cuffdiff on fungal and human RNA-seq data. Some of the FPKM values seemed high, so I decided to look at the alignment files (accepted_hits.bam) and count the reads mapping to selected genes. I am unable to come up with anything near the values reported in the cuffdiff output. For example, one cufflinks locus (XLOC) had a reported FPKM of 421, yet there were zero reads mapping to the corresponding genomic region.
A related issue is that some of the reported loci span genomic regions well beyond the borders of the transcripts defined in the supplied .gtf file. However, inspection of the alignment file reveals no reads that support extending the transcript. Am I missing something here? Specifically, does cufflinks use information other than that contained in the accepted_hits.bam file to calculate FPKM and define its (XLOC) loci?
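As a point of reference for this kind of sanity check, here is a minimal sketch of the plain FPKM arithmetic applied to a raw fragment count. The function name and the example numbers are hypothetical and chosen only for illustration. Cufflinks does not simply count reads (it fits a model and, as far as I know, uses an effective transcript length), so its values should not be expected to match this naive estimate exactly, but a region with zero fragments comes out at zero here.

```python
def naive_fpkm(fragments, feature_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of feature per Million mapped fragments.

    This is the textbook definition applied to raw counts, not a
    reimplementation of what cufflinks does internally.
    """
    if feature_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("feature length and library size must be positive")
    kilobases = feature_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


# Hypothetical numbers: a 2 kb gene in a library of 20 million mapped fragments.
fpkm_no_reads = naive_fpkm(0, 2_000, 20_000_000)
fpkm_500_reads = naive_fpkm(500, 2_000, 20_000_000)
print(fpkm_no_reads)   # 0.0
print(fpkm_500_reads)  # 12.5
```

The fragment count itself would come from counting alignments in accepted_hits.bam over the region of interest; that step is left out here because it depends on the tool used for counting.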

#2
Member
Location: Kentucky Join Date: May 2012
Posts: 37
I'm currently following up on this by generating a control dataset containing known transcript abundances. Stay tuned...

#3
Member
Location: Kentucky Join Date: May 2012
Posts: 37
So, using artificially generated control datasets, I find that cufflinks is flawed in two ways:
First, its FPKM values are inflated. The problem is that the magnitude of the inflation varies from gene to gene, so there is no consistency in the error. Second, the "locus" interval defined in the cuffdiff output is often just plain wrong. In many instances, the reported "locus" spans multiple transcripts and intergenic regions, even though the dataset contains reads from only one transcript. In other words, neither the .gtf file nor the input sequence data support expanding the "locus" to cover multiple genes.

#4
Member
Location: Cincinnati Join Date: Jan 2012
Posts: 11
Hi, I've run into almost the same problem. In addition, the GTF file for mm9 from the UCSC annotation contains:
chr10 unknown exon 80640798 80640979 . + . gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
chr10 unknown CDS 80641426 80641637 . + 2 gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
chr10 unknown exon 80641426 80641637 . + . gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
chr10 unknown exon 80641706 80641758 . + . gene_id "Snord37"; gene_name "Snord37"; transcript_id "NR_028549"; tss_id "TSS16143";
chr10 unknown CDS 80641826 80642004 . + 0 gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
chr10 unknown exon 80641826 80642004 . + . gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
chr10 unknown CDS 80642091 80642196 . + 1 gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
chr10 unknown exon 80642091 80642196 . + . gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
chr10 unknown CDS 80642289 80642541 . + 0 gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
The Snord37 gene lies inside Eef2 and is only about 52 bp long, yet in the cuffdiff output I get:
Snord37 Snord37 Snord37 chr10:80639375-80645254 Control IL33 OK 0 8173.79 1.79769e+308 1.79769e+308 0.0786496 0.428305 no
The reported locus size is 5879 bp. Cufflinks also reports an FPKM of 8173.79 for Snord37, but the BAM file contains just 2 reads there. I have a couple of other examples. I've tested this with cufflinks versions 1.2.1, 1.3.0 and 2.0.0, and the result is the same.
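The comparison above can be sketched in a few lines: parse the Snord37 record, measure the annotated exon, and compare it with the locus string that cuffdiff reported. The helper names and the 10-million-fragment library size are assumptions made for illustration only; the thread does not state the actual library size. GTF coordinates are 1-based and inclusive, so the exon measures 53 bp rather than 52.

```python
import re

# The Snord37 record and the cuffdiff locus, copied from the post above.
SNORD37_GTF = ('chr10 unknown exon 80641706 80641758 . + . gene_id "Snord37"; '
               'gene_name "Snord37"; transcript_id "NR_028549"; tss_id "TSS16143";')
CUFFDIFF_LOCUS = "chr10:80639375-80645254"


def parse_gtf_line(line):
    """Parse one GTF record; splits on any whitespace because the pasted
    text uses spaces where a real GTF file uses tabs."""
    fields = line.split(None, 8)
    attributes = dict(re.findall(r'(\S+) "([^"]*)";', fields[8]))
    return {
        "chrom": fields[0],
        "feature": fields[2],
        "start": int(fields[3]),
        "end": int(fields[4]),
        "strand": fields[6],
        "attributes": attributes,
    }


def parse_locus(locus):
    """Split a 'chrom:start-end' string into its three parts."""
    chrom, coords = locus.split(":")
    start, end = coords.split("-")
    return chrom, int(start), int(end)


record = parse_gtf_line(SNORD37_GTF)
exon_length = record["end"] - record["start"] + 1   # inclusive: 53 bp
_, locus_start, locus_end = parse_locus(CUFFDIFF_LOCUS)
locus_span = locus_end - locus_start                # 5879, as quoted in the post

# Naive FPKM for the 2 observed reads, ASSUMING 10 million mapped fragments.
assumed_library_size = 10_000_000
naive_fpkm = 2 / ((exon_length / 1_000) * (assumed_library_size / 1_000_000))

print(record["attributes"]["gene_id"], exon_length, locus_span)
print(round(naive_fpkm, 2))
```

Under that assumed library size the naive value is in the single digits, far from the reported 8173.79, so the feature being short does not by itself account for the number under the plain formula.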

#5
Member
Location: Kentucky Join Date: May 2012
Posts: 37
I'm glad to hear that someone else can confirm my suspicions. I have contacted the tophat/cufflinks support site about this, but I do not expect a reply because they ignored a previous question I submitted about a month ago.

#6
not just another member
Location: Belgium Join Date: Aug 2010
Posts: 246
Is there another tool like cufflinks that could be used to compare the results?

#7
Senior Member
Location: Germany Join Date: Oct 2008
Posts: 286
I haven't tried cufflinks but have heard others complaining at conferences.
I have been impressed with edgeR and use that in production here.

#8
not just another member
Location: Belgium Join Date: Aug 2010
Posts: 246
Quote:
Yes, edgeR and DESeq work pretty well. But is there a tool that performs reference-based transcriptome assembly (like cufflinks)?

#9
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 946
Have you looked at MapSplice? http://www.netlab.uky.edu/p/bioinfo/MapSplice
Last edited by GenoMax; 06-11-2012 at 04:21 AM.

#10
Senior Member
Location: Boston Join Date: Nov 2009
Posts: 219
Quote:
http://www.broadinstitute.org/software/scripture/

#11
Member
Location: Berlin Join Date: Oct 2010
Posts: 56
Hi all,
I just finished my first cufflinks run on RNA-seq data, and I also encountered results that make me doubt the validity of the cufflinks and cuffdiff output. I'm therefore also considering switching my analysis pipeline and rerunning the analysis. However, during an Agilent seminar last week it was mentioned that Scripture would be an alternative, but that it is heavyweight and requires serious computational resources to perform the assembly. So my question is: does anybody already have experience with Scripture, and if so, could you recommend the machine specifications needed?
Best regards

#12
Member
Location: Kentucky Join Date: May 2012
Posts: 37
MapSplice is good for gene structure analysis but doesn't do differential expression analysis.

#13
not just another member
Location: Belgium Join Date: Aug 2010
Posts: 246
Quote:
Yes, but I have had several problems running it. I'll open a new thread about them now.
Edit: here's the thread for Scripture: http://seqanswers.com/forums/showthr...5998#post75998
Last edited by NicoBxl; 06-12-2012 at 12:47 AM.

#14
Member
Location: Kentucky Join Date: May 2012
Posts: 37
Upon quick inspection, it appears to me that Scripture simply assembles transcripts but does not quantify and compare expression levels. Is that the case?
Last edited by drdna; 06-12-2012 at 12:49 PM.

#15
Senior Member
Location: MO Join Date: Jan 2009
Posts: 309
Quote:
This approach would avoid any potential problems with Cufflinks quantification/differential expression while giving the advantage of a reference-based transcriptome assembly.

#16
Member
Location: Cincinnati Join Date: Jan 2012
Posts: 11
After a lot of digging through the forum and the documentation about wrong FPKMs and cufflinks, I checked cds_exp.diff and was surprised to find that the FPKMs and the gene list there (after filtering out the infinity values, ±1.79E+308) are close to the expected values. Maybe we are misinterpreting how cufflinks splits reads between overlapping features, of which there are many in the GTF file (CDS, exons, stop codons...)?
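The infinity filtering mentioned above might look something like the following sketch. It assumes the column order seen in the cuffdiff row quoted earlier in the thread (fold change in the tenth whitespace-separated column); that layout may differ between cufflinks versions, so the column index is a parameter. The second sample row is made up for illustration.

```python
import math

# First row is copied from the thread; the second row is HYPOTHETICAL.
SAMPLE_ROWS = [
    "Snord37 Snord37 Snord37 chr10:80639375-80645254 Control IL33 OK "
    "0 8173.79 1.79769e+308 1.79769e+308 0.0786496 0.428305 no",
    "GeneA GeneA GeneA chr1:100-2000 Control IL33 OK "
    "10.5 21.0 1.0 2.5 0.01 0.05 yes",
]


def is_unusable(text, threshold=1e300):
    """True for inf, nan, non-numeric text (e.g. a header cell), or values
    at the +-1.79769e+308 sentinel that stands in for infinity."""
    try:
        value = float(text)
    except ValueError:
        return True
    return math.isinf(value) or math.isnan(value) or abs(value) >= threshold


def filter_diff_rows(lines, fold_change_col=9):
    """Keep rows whose fold-change column holds an ordinary finite number.

    fold_change_col is an ASSUMED position based on the row quoted in
    the thread; check it against the header of your own .diff file.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) <= fold_change_col:
            continue  # blank or truncated line
        if is_unusable(fields[fold_change_col]):
            continue  # header line or infinite fold change
        kept.append(fields)
    return kept


kept_rows = filter_diff_rows(SAMPLE_ROWS)
for row in kept_rows:
    print(row[0], row[7], row[8], row[9])
```

With the two sample rows, only the hypothetical GeneA row survives; the Snord37 row is dropped because its fold change is the sentinel value.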

#17
Member
Location: Kentucky Join Date: May 2012
Posts: 37
If you follow this thread, you will see that there is a problem with this approach, because cufflinks/cuffmerge produces erroneous .gtf files that contain instances where multiple transcripts are merged into one (despite the lack of any evidence to support such merges).

#18
Member
Location: Kentucky Join Date: May 2012
Posts: 37
To Portah: good find. I'm going to check the cds_exp.diff file for my runs and see if it makes more sense. Regardless, there is still an issue with transcript/reference annotation merging.

#19
Senior Member
Location: MO Join Date: Jan 2009
Posts: 309
Quote:
I'm curious though, how are you guys running cufflinks? I'm assuming you are using the -g/--GTF-guide argument? Or does this problem persist even if you give it the -G/--GTF argument and tell it not to look for novel transcripts and stick to the supplied GTF file?

#20
Member
Location: Kentucky Join Date: May 2012
Posts: 37
chadn737, Yes and yes. I have been running cuffmerge using a reference gtf and the --no-novel-juncs flag.
So if you are using a count-based method of DE analysis, do you align your reads with gene sequences, as opposed to a genome assembly? I'd be interested in hearing a little bit more about your approach.

Tags: cuffdiff, cufflinks