![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
What is the meaning density in CummeRbund plots? | Carlos Borroto | Bioinformatics | 16 | 12-29-2016 10:05 PM |
cummeRbund get isoforms | paolo.kunder | Bioinformatics | 6 | 04-23-2014 08:59 AM |
Biological replicates with cuffdiff, cummeRbund | turnersd | Bioinformatics | 15 | 11-19-2012 06:59 AM |
CummeRbund error without CDS | dvanic | Bioinformatics | 5 | 03-12-2012 10:24 AM |
gene name in cufdiif output? | semna | Bioinformatics | 3 | 12-07-2010 04:41 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Senior Member
Location: Charlottesville, VA Join Date: May 2011
Posts: 112
|
![]()
I'm going through the tophat/cufflinks/cuffmerge/cuffdiff/cummeRbund protocol just published at Nature Protocols. I'm wondering how to get a different identifier (e.g. the gene symbol instead of XLOC) in the diffData(genes(cuff)) output for printing tables, annotating plots, etc.
Code:
> cuff <- readCufflinks() > gene_diff_data <- diffData(genes(cuff)) > sig_gene_data <- subset(gene_diff_data, significant=="yes") > head(sig_gene_data) gene_id sample_1 sample_2 status value_1 value_2 ln_fold_change test_stat p_value q_value significant 3 XLOC_000003 C1 C2 OK 48.7791 77.9603 0.676476 -11.81010 0.00000e+00 0.00000e+00 yes 60 XLOC_000060 C1 C2 OK 65.0337 107.4470 0.724360 -39.42460 0.00000e+00 0.00000e+00 yes 81 XLOC_000081 C1 C2 OK 26.8576 23.4374 -0.196521 4.68763 2.76389e-06 1.26818e-04 yes 175 XLOC_000175 C1 C2 OK 2724.4700 2514.0400 -0.115971 7.49516 6.61693e-14 4.70596e-12 yes 237 XLOC_000237 C1 C2 OK 23.3386 19.1596 -0.284646 3.73237 1.89689e-04 6.18840e-03 yes 250 XLOC_000250 C1 C2 OK 24.0661 42.1466 0.808413 -9.23741 0.00000e+00 0.00000e+00 yes > |
![]() |
![]() |
![]() |
#2 |
Member
Location: Cambridge, MA Join Date: Feb 2008
Posts: 82
|
![]()
Hi turnersd,
The workflow for cummeRbund has been simplified a bit since the paper was submitted. The recommended approach to this (for cummeRbund 1.1.3 or greater) is as follows Code:
> cuff <- readCufflinks() #Retrive significant gene IDs (XLOC) with a pre-specified alpha > diffGeneIDs <- getSig(cuff,level="genes",alpha=0.05) #Use returned identifiers to create a CuffGeneSet object with all relevant info for given genes > diffGenes<-getGenes(cuff,diffGeneIDs) #gene_short_name values (and corresponding XLOC_* values) can be retrieved from the CuffGeneSet by using: > featureNames(diffGenes) Cheers, Loyal |
![]() |
![]() |
![]() |
#3 |
Junior Member
Location: San Diego Join Date: Feb 2012
Posts: 9
|
![]()
cummeRbund tells me this:
> diffGeneIDs <-getSig(cuff_data_Input,level="genes",alpha=0.05) Error: could not find function "getSig" Did I do something wrong? |
![]() |
![]() |
![]() |
#4 |
Junior Member
Location: San Diego Join Date: Feb 2012
Posts: 9
|
![]()
OK, got it, I got an older version (0.1.3)....
But where can I find the latest version then? K. Last edited by kareldegendt; 03-25-2012 at 11:45 PM. Reason: new problem |
![]() |
![]() |
![]() |
#5 |
Member
Location: Cambridge, MA Join Date: Feb 2008
Posts: 82
|
![]()
Hi,
you can find the freshest cummeRbund at: http://compbio.mit.edu/cummeRbund/ Also, be sure to sign up for the bowtie-bio-announce mailing list if you would like to be updated to new releases/features. Cheers, Loyal |
![]() |
![]() |
![]() |
#6 |
Senior Member
Location: Rockville, MD Join Date: Jan 2009
Posts: 126
|
![]()
Hi Loyal,
I'm facing the same situation as kareldegendt. Can you provide a set of instructions for upgrading to a newer version when cummerRbund is already on a system (64-bit Linux)? This would be very helpful for R novices like myself. I've tried downloading and unzipping the tarball and using make but that doesn't seem to work. Thanks, Shurjo |
![]() |
![]() |
![]() |
#7 |
Junior Member
Location: San Diego Join Date: Feb 2012
Posts: 9
|
![]()
OK, here's what I did:
I first downloaded the cummeRbund Mac OS X binary (for version 1.1.5) Then I did this in R: > install.packages('/Users/kareldegendt/Downloads/cummeRbund_1.1.5.tgz',repos = NULL) then I added cummeRbund to the current session: >library(cummeRbund) It loaded abunch of things and told me that the package cummeRbund was built under R version 2.15.00 No clue if that's gonna hurt anything. I'll test and if so, I'll probably upgrade R... best, Karel |
![]() |
![]() |
![]() |
#8 |
Junior Member
Location: San Diego Join Date: Feb 2012
Posts: 9
|
![]()
OK, this works BUT:
I still did not get gene symbols (like f.e. Akt or Bact) but the XLOC_000.... and uc00.... names... Not really a solution :-/ K. Last edited by kareldegendt; 03-27-2012 at 02:04 PM. Reason: typo |
![]() |
![]() |
![]() |
#9 |
Senior Member
Location: Rockville, MD Join Date: Jan 2009
Posts: 126
|
![]()
Hi Karel,
Thanks for the tips. They worked for me as well. Regards, Shurjo |
![]() |
![]() |
![]() |
#10 | |
Senior Member
Location: University of Southern Denmark (SDU), Denmark Join Date: Apr 2009
Posts: 105
|
![]()
You could do it with merge:
Quote:
|
|
![]() |
![]() |
![]() |
#11 |
Member
Location: Cambridge, MA Join Date: Feb 2008
Posts: 82
|
![]()
Hi All,
Sorry I missed the earlier posts in this thread. Sorry for the troubles in getting updates to cummeRbund installed, but it appears that you were both successful. The reason for this setup is that a 'stable' version of cummeRbund (v1.0.0) has to be maintained with the current 'release' version of Bioconductor. Active development of new features (including getSig, etc) is done on the 'development' version of Bioconductor which is attached to the 'development' release of R (currently v2.15). When the new version of Bioconductor is released (in the next few weeks), all of the development features of cummeRbund will be available using the standard BioC install methods. The benefit of using this is obviously earlier access to newer features but the drawback is of course a moderate amount of instability and growing pains. You can also install the development version of R and the most recent version of cummeRbund will be installed by BiocLite() by default. The way I am trying to write cummeRbund, the 'development' versions should also be compatible with earlier versions of R (at least 2.13 or greater). To answer the question of gene names directly, they can always be accessed as part of the 'features' data.frame returned from a call to features() on a CuffData or CuffGeneSet object. The 'featureNames()' function is just a shorthand that in most cases just returns the gene_id and gene_short_names (when present) only. Another common way to represent the FPKM data is in a 'matrix' format of featuresXconditions. You can use fpkmMatrix(myGeneSet,fullnames=T) to generate this matrix. The general problem with using gene names is that they are inherently non-unique (despite efforts to enforce this). This causes significant problems for a lot of the behind-the-scenes data wrangling in both cufflinks/cuffdiff and cummeRbund. This is why the XLOC_* and TCONS_* ids are essential to track individual features. Our suggestion, as mentioned above, is to use the 'features()' method to get all annotation associated with features in a CuffData, CuffGeneSet, or CuffGene object. The output of this method should be a standard R data.frame on which you can do any manipulations/merges that you would like. Please let me know if you have specific workflows in which you are having difficulty mapping these ids to gene names and I can help with the syntax. Cheers! Loyal |
![]() |
![]() |
![]() |
#12 |
Member
Location: Cambridge, MA Join Date: Feb 2008
Posts: 82
|
![]() |
![]() |
![]() |
![]() |
#13 |
Junior Member
Location: sydney Join Date: Mar 2012
Posts: 4
|
![]()
Hi everyone,
very helpful responses so far! Retrieving the gene names works great for me, but unfortunately just for differentially expressed genes. If I want to look at splicing or isoforms I cannot get it to work. I just get a list of identifiers or, if I try with the above path and use "Isoforms" or "Splicing" instead of "Genes" the list is empty. I am very new to all these things so not sure what I am doing wrong or what I have to do to determine the significantly differentially expressed isoforms, splicing, promoters,...and get a list including the gene names. Will be very greatful for any help! Cheers, K |
![]() |
![]() |
![]() |
#14 |
Junior Member
Location: San Diego Join Date: Feb 2012
Posts: 9
|
![]()
Hi all,
It finally worked for me too. I re-ran tophat and cufflinks with the refFlat file as an annotation file and that did the trick :-) Thanks for all your help!! Karel |
![]() |
![]() |
![]() |
#15 | |
Member
Location: Cambridge, MA Join Date: Feb 2008
Posts: 82
|
![]() Quote:
This is something that I'm actively working on including in a future release of cummeRbund. Right now, it's very easy to retrieve all information for a particular gene, however, several people have asked for a 'getFeatures()' method (similar to getGenes()) that would retrieve just the information you are looking for. I will try to post to this thread when I have it working, but also, please keep checking the website for an update. As an alternative, you can get the significantly different isoforms list by using getSig (level='isoforms'). And you can retrieve ALL isoform annotation by using 'features(isoforms(cuff))' and/or 'fpkm(isoforms(cuff))'. You should be able to filter those data.frames using the list generated from the call to getSig(). Cheers, Loyal |
|
![]() |
![]() |
![]() |
#16 |
Member
Location: Cambridge, MA Join Date: Feb 2008
Posts: 82
|
![]()
Kittykat22,
e.g. Code:
library(cummeRbund) cuff<-readCufflinks() allIsoformFPKMs<-fpkm(isoforms(cuff)) allIsoformFeatures<-features(isoforms(cuff)) sigIsoforms<-getSig(cuff,level='isoforms') sigIsoformFeatures<-allIsoformFeatures[allIsoformFeatures$isoform_id %in% sigIsoforms,] sigIsoformFPKMs<-allIsoformFPKMs[allIsoformFPKMs$isoform_id %in% sigIsoforms,] Loyal |
![]() |
![]() |
![]() |
#17 |
Junior Member
Location: sydney Join Date: Mar 2012
Posts: 4
|
![]()
Hi Loyal,
thanks so much for your help and your quick response! I can get the gene identifier now and even though I havent figured out how to include the gene_short_name, Im sure Ill get there! K |
![]() |
![]() |
![]() |
#18 | |
Junior Member
Location: Glasgow, Scotland Join Date: Mar 2012
Posts: 6
|
![]()
Hi All,
I know that Cuffdiff does pairwise differential expression testing, however I have more than two groups A, B ... F. I am not interested in all possible group pairings, just: A vs. B, C vs. D and E vs. F. Follow the protocol published in nature my code would be: >gene.diff.data <-diffData(genes(cuff_data), 'A','B') >sig.gene.data <-subset(gene.diff.data, (significant=='yes')) >nrow(sig.gene.data) >write.table(sig.gene.data, 'diff_genes_AvsB.txt', sep= '\t', row.names = F, col.names =T, quote = F) However, I dont get the "gene_short_name" in my final table. Then follow the discussion in this post, I'd start with: >diffGeneIDs <- getSig(cuff,level='genes','A','B',alpha=0.05) I followed the code posted here but after: >row.names(diffGenesData)=diffGenesData$gene_id I get: Quote:
Thanks |
|
![]() |
![]() |
![]() |
#19 | |
Senior Member
Location: Baltimore Join Date: Mar 2012
Posts: 120
|
![]() Quote:
Assuming, I can get this fixed and merge the data using Thomas' code, would my next step be: write.table(diffGenesOutput, 'name.txt', sep= '\t', row.names = F, col.names =T, quote = F) With diffGenesOutput being the merged file that was created with Thomas' code. I am also very clearly a newbie |
|
![]() |
![]() |
![]() |
#20 |
Senior Member
Location: Baltimore Join Date: Mar 2012
Posts: 120
|
![]()
OK, so the error was because of the duplicate IDs, so I guess I may need a unique command or something.
But I just went around this whole thing the long way, and from Loyal's suggestion, I used featureNames. So now I have two tables, one that has all the information like the Xloc ID, fold change, Pvalue, etc., and other that just has the Xloc ID and the tracking ID. So I can merge these together in access, but what are these tracking IDs? How do I go from these to the names of genes? |
![]() |
![]() |
![]() |
Tags |
cuffdiff, cufflinks, cuffmerge, cummerbund, rna-seq |
Thread Tools | |
|
|