#1
Member
Location: North Carolina Join Date: Jan 2010
Posts: 82
Hi everyone,
There seem to be a couple of threads that touch on this, but I'm not sure any of them answered this question, so I hope this is not a repeat. I have mapped reads with TopHat and then used Cufflinks to predict transcripts, keeping a reference annotation in mind. I updated Cufflinks today to 0.8.3. The GTF output from cuffcompare seems to be fine, for example:
Code:
Chr1 Cufflinks exon 24604 24768 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00011789"; exon_number "1"; oId "fam3.11.1"; nearest_ref "ENSTGUT00000004895"; class_code "e"; tss_id "TSS1";
Chr1 Cufflinks exon 25776 25957 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00011789"; exon_number "2"; oId "fam3.11.1"; nearest_ref "ENSTGUT00000004895"; class_code "e"; tss_id "TSS1";
But the annotation information from the GTF is not output into the tracking file. I'm particularly interested in the nearest_ref_id.
Code:
tracking_id class_code nearest_ref_id gene_short_name tss_id locus q0_FPKM q0_conf_lo q0_conf_hi q1_FPKM q1_conf_lo q1_conf_hi
XLOC_000001 - - - ,TSS1 Chr1:24603-28303 1.09128 0 2.93506 4.9699 0.545109 9.39469
XLOC_000002 - - - TSS2,TSS3 Chr1:639605-651348 32.5089 26.6549 38.3628 30.8267 25.9758 35.6777
Thanks,
Chris
Last edited by chrisbala; 08-04-2010 at 06:18 PM.
#2
Member
Location: North Carolina Join Date: Jan 2010
Posts: 82
I should specify that above I am referring to the genes tracking file. Another problem, I suppose, is that the cds files are empty.
#3
Member
Location: North Carolina Join Date: Jan 2010
Posts: 82
Ok, I guess I've got it working ... I had to use a GTF with the proper features in order to get the gene names into the exp.diff file. Would be nice if I could get the Ensembl IDs to show up there or in the tracking file as well.
#4
Junior Member
Location: Yale Stem Cell Center Join Date: Aug 2010
Posts: 3
Hi everyone,
I have one question: what is the meaning of "Notest" in the cuffdiff results?
#5
Senior Member
Location: Germany Join Date: May 2010
Posts: 101
Quote:
#6
Junior Member
Location: Yale Stem Cell Center Join Date: Aug 2010
Posts: 3
OK!
Thank you very much
#7
Member
Location: Boston, MA Join Date: Feb 2010
Posts: 10
Quote:
What were those features of the GTF file that made it work? I'm currently struggling with Cuffcompare output that seems correct except that the "nearest_ref_id" field of genes.fpkm_tracking, etc. isn't filled in. If the attachment comes through you can see this.
Howie
#8
Member
Location: RI Join Date: May 2010
Posts: 10
I had to put gene_name attributes in my file to get it to track all the way through to the end (I just copied them from gene_id). Also, I think I had to use the -s switch in cuffcompare, or else my final files were missing the gene names.
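For anyone wanting to try the gene_name workaround described above, here is a minimal Python sketch. It assumes a standard tab-delimited, 9-column GTF whose attribute column contains `gene_id "..."`; the function name `add_gene_name` is made up for illustration and is not part of Cufflinks. It only copies gene_id into a new gene_name attribute when gene_name is absent and leaves everything else alone.

```python
import re

# Matches the gene_id attribute in the 9th GTF column, e.g. gene_id "XLOC_000001";
GENE_ID_RE = re.compile(r'gene_id "([^"]*)"')


def add_gene_name(gtf_line: str) -> str:
    """Return the GTF line with gene_name copied from gene_id if it is missing."""
    line = gtf_line.rstrip("\n")
    if not line or line.startswith("#"):
        return line  # blank line or comment: nothing to do
    fields = line.split("\t")
    if len(fields) != 9:
        return line  # not a standard 9-column record; leave it untouched
    attrs = fields[8].rstrip()
    if 'gene_name "' in attrs:
        return line  # already has a gene_name attribute
    match = GENE_ID_RE.search(attrs)
    if match is None:
        return line  # no gene_id to copy from
    if not attrs.endswith(";"):
        attrs += ";"
    fields[8] = f'{attrs} gene_name "{match.group(1)}";'
    return "\t".join(fields)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python add_gene_name.py < in.gtf > out.gtf
    for raw in sys.stdin:
        print(add_gene_name(raw))
```

Running it twice on the same file should change nothing the second time, since lines that already carry gene_name are passed through as they are.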
#9
Member
Location: Boston, MA Join Date: Feb 2010
Posts: 10
Hello again --
I got this response from Geo Pertea, a co-creator and maintainer of Cufflinks at UMD. Short answer: the missing nearest_ref_id field is a feature, not a bug. So this is closed as far as I'm concerned. The problems I'm having with the CuffDiff output are something else, like data.
(BTW, I mistakenly said CuffCompare above, but ref fields in Cuffcompare output files like transcripts.gtf.refmap are populated as I said: it's CuffDiff output like genes.fpkm_tracking that omits them.)
Howie
=============================
There are a couple of known inconsistencies in the way reference *gene* names are parsed and used by cuffdiff from the input GTF file, these will be corrected in the next release.
But in this particular case I think what you're seeing is just an artifact caused by the choice of a fixed format for all the *.fpkm_tracking files, and in the genes.fpkm_tracking file both class_code and nearest_ref_id columns are not supposed to be populated -- this is a design choice. The reason is that in that particular file the unit of analysis is the gene *locus*, not the transcript -- and there could be many reference transcripts for that locus, so essentially those 2 columns are just not populated at all in the genes.fpkm_tracking file.
There is a note in the manual for the "class_code" column for that already, unfortunately it is not made clear that the same applies to the 3rd column (nearest_ref_id). We could've probably printed a comma delimited list of reference transcript IDs in that column but that would be ugly -- and you already have the isoforms.fpkm_tracking for that, if you really want to track reference transcripts.
Now about the second part of your issues with this file -- that statement that "cuffdiff does not aggregate the genes correctly", can you please clarify why do you think that is the case?
Again, in this file only the loci (genes) are being tracked (the XLOC_* IDs you see in the 1st column) and while the coordinates in the 6th column may seem weird sometimes (because they actually show the largest enclosing region that overlaps that locus (XLOC_*) region instead of the original locus location, but that's just a display artifact), I can assure you that not showing the nearest_ref_ids in the 3rd column (for the reasons explained above) has nothing to do with the aggregation of transcripts into genes. Transcripts are simply aggregated into genes by their gene ID - which is the XLOC_* id in this case, but it's really whatever the gene_id attribute has in the input GTF file.
--geo
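If you still want reference IDs per locus, one workaround is to build the list yourself from the cuffcompare GTF shown in post #1, which carries both gene_id (the XLOC_* ID) and nearest_ref on each record. The following is only a sketch under those assumptions: it expects a tab-delimited, 9-column GTF, and `nearest_refs_by_gene` is a hypothetical helper name, not something shipped with Cufflinks.

```python
import re
from collections import defaultdict

# Matches attribute pairs in the 9th GTF column, e.g. nearest_ref "ENSTGUT00000004895"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def nearest_refs_by_gene(gtf_lines):
    """Map each gene_id (XLOC_*) to a sorted list of its nearest_ref IDs."""
    refs = defaultdict(set)
    for line in gtf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # skip anything that is not a 9-column GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        ref = attrs.get("nearest_ref")
        if gene_id and ref:
            refs[gene_id].add(ref)
    return {gene: sorted(ids) for gene, ids in refs.items()}


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python nearest_refs.py < cuffcompare_output.gtf
    for gene, ids in sorted(nearest_refs_by_gene(sys.stdin).items()):
        # One line per locus: XLOC ID, then a comma-delimited list of reference IDs
        print(f"{gene}\t{','.join(ids)}")
```

The output can then be joined to genes.fpkm_tracking on the first column. A locus with several reference transcripts gets several IDs, which is exactly the case that made a single nearest_ref_id column ambiguous for genes.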