![]() |
|
|||||||
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Calculate transcript abundance for prokaryotes | faozhi | RNA Sequencing | 0 | 12-05-2011 10:04 PM |
| RNA-Seq: ISOFORM ABUNDANCE INFERENCE PROVIDES A MORE ACCURATE ESTIMATION OF GENE EXPR | Newsbot! | Literature Watch | 0 | 12-15-2010 02:00 AM |
| visualization of transcript abundance | mrfox | Bioinformatics | 0 | 11-29-2010 07:58 AM |
| Tophat/Cufflinks newbie - question about transcript assembly | internet_nobody | Bioinformatics | 4 | 08-19-2010 09:03 AM |
| PubMed: Transcript assembly and quantification by RNA-Seq reveals unannotated transcr | Newsbot! | Literature Watch | 2 | 06-11-2010 06:56 AM |
![]() |
|
|
Thread Tools |
|
|
#1 |
|
Senior Member
Location: Boston, MA Join Date: Nov 2008
Posts: 212
|
Hi all,
I'm pleased to let you know that I've recently released a new tool for analyzing RNA-Seq data that you may find useful. The program, Cufflinks, is designed to assemble the transcripts in an RNA-Seq sample and calculate their abundances (in transcript-level RPKM). Cufflinks is built to work without a reference annotation. You need only a reference genome and an input file of RNA-Seq read alignments (in SAM format), such as the one produced by TopHat. Cufflinks currently supports only Mac OS X and Linux, and you'll need a medium-sized machine to run it. The program is fast and fully threaded, but for large (>100 million aligned reads) runs, the memory footprint can be 10GB or more. This is only the first release, and there are numerous features planned for the next weeks and months. The project is a collaboration between scientists at the University of Maryland, UC Berkeley, Caltech, and WashU. We would be grateful for any feedback, suggestions, bug reports, or criticism you have. We hope you find Cufflinks useful. |
|
|
|
|
|
#2 |
|
Member
Location: Bay Area Join Date: Jan 2009
Posts: 58
|
Hi Cole
Thanks for the update. I took a quick glance and just wondering if there are some methods available to visualize the transcript abundances reported in the GTF format. Also any way to simultaneously compare control and sample. -Abhi |
|
|
|
|
|
#3 |
|
Senior Member
Location: Boston, MA Join Date: Nov 2008
Posts: 212
|
Hi Abhi,
We recommend using cuffcompare to compare control and sample, though you may need to tailor significance tests, etc. to your analysis. You might want to check out this paper for more details on control and sample testing with RPKM. As for visualization, we don't have anything out for that yet, as supporting a UI on various platforms, etc. is more engineering resources than we can really afford right now. However, we've tried to make the various output files as friendly as possible to environments like R, so that you can use your plotting environment of choice. If anyone were to contribute scripts or packages that depended on standard plotting tools, we would happily include them in the Cufflinks distribution (properly credited, of course). |
|
|
|
|
|
#4 |
|
Senior Member
Location: Boston area Join Date: Nov 2007
Posts: 680
|
A tip of the cap to continuing the classy nomenclature theme!
|
|
|
|
|
|
#5 |
|
Junior Member
Location: Rockville, MD Join Date: Oct 2009
Posts: 1
|
Hi apratap,
I've been working on visualizing the transcript abundance data in GBrowse and have had some luck with it. Here is how I went about it, 1. Loading the Cufflinks identified pseudo-transcripts is very easy (GBrowse seems to accept the GTF as input and is able to display the transcripts) 2. To display the abundance data, you need to perform the following transformation on the GTF file and create a GFF file for use with GBrowse. As an example, here are a few lines from the GTF file Code:
124 Cufflinks transcript 1551 1663 1000 . . gene_id "CUFF.1"; transcript_id "CUFF.1.0"; RPKM "268.2781599177"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "0.637168"; 124 Cufflinks exon 1551 1663 1000 . . gene_id "CUFF.1"; transcript_id "CUFF.1.0"; exon_number "1"; RPKM "268.2781599177"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.637168"; 124 Cufflinks transcript 8604 9455 1000 . . gene_id "CUFF.4"; transcript_id "CUFF.4.0"; RPKM "1405.4689751085"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "3.338028"; 124 Cufflinks exon 8604 9455 1000 . . gene_id "CUFF.4"; transcript_id "CUFF.4.0"; exon_number "1"; RPKM "1405.4689751085"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "3.338028"; 124 Cufflinks transcript 9540 9757 1000 . . gene_id "CUFF.7"; transcript_id "CUFF.7.0"; RPKM "903.9004975207"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "2.146789"; 124 Cufflinks exon 9540 9757 1000 . . gene_id "CUFF.7"; transcript_id "CUFF.7.0"; exon_number "1"; RPKM "903.9004975207"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "2.146789"; 124 Cufflinks transcript 9825 12624 1000 . . gene_id "CUFF.10"; transcript_id "CUFF.10.0"; RPKM "1082.6940025248"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "2.571429"; 124 Cufflinks exon 9825 12624 1000 . . gene_id "CUFF.10"; transcript_id "CUFF.10.0"; exon_number "1"; RPKM "1082.6940025248"; frac "1.000000"; conf_lo "0. 000000"; conf_hi "0.000000"; cov "2.571429"; 124 Cufflinks transcript 12678 13027 1000 . . gene_id "CUFF.13"; transcript_id "CUFF.13.0"; RPKM "1775.6181641407"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "4.217143"; 124 Cufflinks exon 12678 13027 1000 . . gene_id "CUFF.13"; transcript_id "CUFF.13.0"; exon_number "1"; RPKM "1775.6181641407"; frac "1.000000"; conf_lo "0. 000000"; conf_hi "0.000000"; cov "4.217143"; 124 Cufflinks transcript 14562 14603 1000 . . gene_id "CUFF.16"; transcript_id "CUFF.16.0"; RPKM "721.7960016832"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "1.714286"; 124 Cufflinks exon 14562 14603 1000 . . gene_id "CUFF.16"; transcript_id "CUFF.16.0"; exon_number "1"; RPKM "721.7960016832"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "1.714286"; 124 Cufflinks transcript 17793 17835 1000 . . gene_id "CUFF.19"; transcript_id "CUFF.19.0"; RPKM "705.0100481557"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "1.674419"; 124 Cufflinks exon 17793 17835 1000 . . gene_id "CUFF.19"; transcript_id "CUFF.19.0"; exon_number "1"; RPKM "705.0100481557"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "1.674419"; 124 Cufflinks transcript 27148 27726 1000 . . gene_id "CUFF.22"; transcript_id "CUFF.22.0"; RPKM "2853.5251258254"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "6.777202"; 124 Cufflinks exon 27148 27726 1000 . . gene_id "CUFF.22"; transcript_id "CUFF.22.0"; exon_number "1"; RPKM "2853.5251258254"; frac "1.000000"; conf_lo "0. 000000"; conf_hi "0.000000"; cov "6.777202"; 124 Cufflinks transcript 27829 29109 1000 . . gene_id "CUFF.25"; transcript_id "CUFF.25.0"; RPKM "3585.3145657380"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "8.515222"; 124 Cufflinks exon 27829 29109 1000 . . gene_id "CUFF.25"; transcript_id "CUFF.25.0"; exon_number "1"; RPKM "3585.3145657380"; frac "1.000000"; conf_lo "0. 000000"; conf_hi "0.000000"; cov "8.515222"; 124 Cufflinks transcript 29171 34842 1000 . . gene_id "CUFF.28"; transcript_id "CUFF.28.0"; RPKM "2442.5515614083"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "5.801128"; 124 Cufflinks exon 29171 34842 1000 . . gene_id "CUFF.28"; transcript_id "CUFF.28.0"; exon_number "1"; RPKM "2442.5515614083"; frac "1.000000"; conf_lo "0. 000000"; conf_hi "0.000000"; cov "5.801128"; 124 Cufflinks transcript 34945 36257 1000 . . gene_id "CUFF.31"; transcript_id "CUFF.31.0"; RPKM "3751.9099097396"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "8.910891"; 124 Cufflinks exon 34945 36257 1000 . . gene_id "CUFF.31"; transcript_id "CUFF.31.0"; exon_number "1"; RPKM "3751.9099097396"; frac "1.000000"; conf_lo "0. 000000"; conf_hi "0.000000"; cov "8.910891"; 124 Cufflinks transcript 37803 38013 1000 . . gene_id "CUFF.34"; transcript_id "CUFF.34.0"; RPKM "1867.7754356353"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "4.436019"; 124 Cufflinks exon 37803 38013 1000 . . gene_id "CUFF.34"; transcript_id "CUFF.34.0"; exon_number "1"; RPKM "1867.7754356353"; frac "1.000000"; conf_lo "0. 000000"; conf_hi "0.000000"; cov "4.436019"; Code:
124 Cufflinks abundance 1551 1663 268.2781599177 . . ID=Cufflinks_transcript_abundance 124 Cufflinks abundance 8604 9455 1405.4689751085 . . ID=Cufflinks_transcript_abundance 124 Cufflinks abundance 9540 9757 903.9004975207 . . ID=Cufflinks_transcript_abundance 124 Cufflinks abundance 9825 12624 1082.6940025248 . . ID=Cufflinks_transcript_abundance 124 Cufflinks abundance 12678 13027 1775.6181641407 . . ID=Cufflinks_transcript_abundance 124 Cufflinks abundance 14562 14603 721.7960016832 . . ID=Cufflinks_transcript_abundance 124 Cufflinks abundance 17793 17835 705.0100481557 . . ID=Cufflinks_transcript_abundance 124 Cufflinks abundance 27148 27726 2853.5251258254 . . ID=Cufflinks_transcript_abundance 124 Cufflinks abundance 27829 29109 3585.3145657380 . . ID=Cufflinks_transcript_abundance 124 Cufflinks abundance 29171 34842 2442.5515614083 . . ID=Cufflinks_transcript_abundance 124 Cufflinks abundance 34945 36257 3751.9099097396 . . ID=Cufflinks_transcript_abundance 124 Cufflinks abundance 37803 38013 1867.7754356353 . . ID=Cufflinks_transcript_abundance 124 Cufflinks abundance 38163 38302 6604.4334154015 . . ID=Cufflinks_transcript_abundance Code:
[abundance] feature = abundance category = Cufflinks_abundance glyph = xyplot graph_type = boxes fgcolor = brown bgcolor = orange scale = left min_score = 0 height = 50 group_on = display_name ![]() Hope this helps! Best, ~Vivek Last edited by vkrishna; 10-25-2009 at 04:10 PM. |
|
|
|
|
|
#6 |
|
Member
Location: Bay Area Join Date: Jan 2009
Posts: 58
|
Hey vkrishna
Thanks a lot for sharing your method. I will try to work on it sometime later this week. Best, -Abhi |
|
|
|
|
|
#7 |
|
Junior Member
Location: Berlin, Germany Join Date: Oct 2009
Posts: 5
|
In our group we have just noticed that proximal genes get merged by cufflinks.
While this is probably unavoidable for a run without any annotation it is clearly wrong in case the correct annotation is provided and ignored/overridden. Here's one example from the mitochondrial chromosome (kindly provided by arthur.yxt): cufflinks genes.expr file says: Code:
ENSG00000198899 713201 chrM 8527 9988 5618.54 Code:
chrM . exon 8528 9208 . + . gene_id "ENSG00000198899";transcript_id "ENST00000361899" chrM . exon 9208 9988 . + . gene_id "ENSG00000198938";transcript_id "ENST00000362079" If needed, a small dataset to reproduce this bug can be provided. Best! -Marvin |
|
|
|
|
|
#8 | |
|
Senior Member
Location: Boston, MA Join Date: Nov 2008
Posts: 212
|
Quote:
1) The reference-based quantitation is only reporting one gene in this example. This is probably a bug, and I would greatly appreciate a small test set via email 2) The assembly algorithm merges overlapping genes. In strandless prototocols, merging overlapping, unspliced genes or overlapping genes on the same strand is tough to avoid as you suggest (although tracking polyA reads may help here). However, in my experience these are extremely rare in most organisms that have splicing. Yeast is a notable exception. The mitochondrial genome is one place I *would* expect it, given the gene density, lack of splicing, and generally very high read count (with associated polymerase run-on, etc.). The example you gave, if I understand correctly, demonstrates 1). Are you also seeing 2)? And If so, can you give us a sense of how often? Fighting this problem via polyA tracking and local assembly is high on the list of planned features, but if users are running into this a lot, I could try to get to it sooner rather than later. |
|
|
|
|
|
|
#9 |
|
Junior Member
Location: Oslo Join Date: Oct 2009
Posts: 1
|
Removed, found answer somewhere else in the forum.
Last edited by gard; 10-29-2009 at 07:31 AM. |
|
|
|
|
|
#10 |
|
Senior Member
Location: Salk Institute, San Diego, CA, USA Join Date: Sep 2009
Posts: 193
|
This is probably a simple question but I just don't understand the computation of RPKM well enough to answer it for myself.
I've run cufflinks on tophat output (both run without a reference annotation) and then finally cuffcompare with a reference annotation to map the cufflinks transcripts to the UCSC transcripts. My question is this: in the stdout.tracking file that's produced from cuffcompare I'll find several cufflinks determined transcripts that have been mapped to a single UCSC transcript. Each of those cufflinks transcripts has an RPKM - how do I translate those RPKMs to the single UCSC transcript that cuffcompare has mapped them to? example lines from my tracking file: Code:
uc008bzg.1|uc008bzg.1 e q1:CUFF.209682|CUFF.209682.0|100|10.337529|0.000000|0.000000|uniq uc008bzg.1|uc008bzg.1 j q1:CUFF.209785|CUFF.209785.0|100|259.417274|0.000000|0.000000|uniq uc008bzg.1|uc008bzg.1 c q1:CUFF.209691|CUFF.209691.0|100|34.891979|0.000000|0.000000|uniq uc008bzg.1|uc008bzg.1 c q1:CUFF.209695|CUFF.209695.0|100|135.598390|0.000000|0.000000|uniq uc008bzg.1|uc008bzg.1 c q1:CUFF.209701|CUFF.209701.0|100|133.368072|0.000000|0.000000|uniq |
|
|
|
|
|
#11 |
|
Senior Member
Location: Salk Institute, San Diego, CA, USA Join Date: Sep 2009
Posts: 193
|
OK, I've thought about this and it's likely that the way to pull off what I want is to consider it a problem of merging multiple things made up of the same materials (in this case bp of exon) all with different densities (RPKMs). That breaks it down to calculating a weighted mean. Of course I'll need total bp worth of exon from each of those transcripts but with that info it should be an easy calculation.
|
|
|
|
|
|
#12 |
|
Senior Member
Location: Boston, MA Join Date: Nov 2008
Posts: 212
|
I've had decent success doing that, by performing a transfrag length-weighted average of the RPKM.
|
|
|
|
|
|
#13 |
|
Member
Location: Bay Area Join Date: Jan 2009
Posts: 58
|
Hi Cole and Driscoll
Good that you have asked this before I actually realized. So I am a bit confused now. Why does cufflinks for the same sample, maps diff cufflinks predicted transcripts to same UCSC transcript. I think I am missing some biology here but may be this question is not trivial. It will me understand better if you guys could explain a bit more. -Abhi |
|
|
|
|
|
#14 |
|
Senior Member
Location: Boston, MA Join Date: Nov 2008
Posts: 212
|
After normalizing for transcript length, the fraction of the total reads in your experiment that come from any one transcript is roughly proportional to the abundance of that transcript.
What we're seeing here is a transcript that is rare enough in the sample that it doesn't get many reads, and thus has gaps in sequencing coverage. When this happens, Cufflinks can't assemble the full thing. This is one of the central differences between Cufflinks and say, an ab initio gene finder like Augustus. So what happens is that Cufflinks assembles the reads that came from this rare transcript as best it can, and the result is a handful of "transfrags", each of which get their own RPKM estimate. What sdriscoll is trying to do here is estimate the RPKM of the whole transcript from the individual fragment RPKMs. |
|
|
|
|
|
#15 |
|
Member
Location: Helsinki Join Date: Feb 2009
Posts: 11
|
Hi Cole,
does Cufflinks take as input the 'raw' SAM file produced by Bowtie (>version 0.11.1)? Daniel |
|
|
|
|
|
#16 | |
|
Senior Member
Location: Newcastle, Australia Join Date: Oct 2009
Posts: 307
|
Quote:
The SAM file supplied to Cufflinks must be sorted by reference position. If you aligned your reads with TopHat, your alignments will be properly sorted already. If you used another tool, you may want to make sure they are properly sorted as follows: sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted And you may also have to filter the reads not mapped to the reference genome, marked by "*" in the field of RNAME.
__________________
Xi Wang |
|
|
|
|
|
|
#17 |
|
Member
Location: Vienna Join Date: Aug 2008
Posts: 11
|
Hi guys,
I have now tried it several times but I cant seem to get cufflinks running with my new 3x paired end lanes, I always get a 'bus error' since I always get the error at slightly different positions Processing bundle [ 3:9131617-9139931 ] with 8255 non-redundant alignments zsh: bus error cufflinks -m 85 -I 50000 -p 4 -L VPG Processing bundle [ 3:9411509-9430092 ] with 11575 non-redundant alignments Bus error I suspect a multithreading error. Has anyone encountered a similar problem or even better knows how to deal with this problem? Furthermore I frequently receive strange warnings: Warning: restimation failed, importance samples have zero weight Warning: ITERMAX reached in abundance estimation, estimation hasn't fully converged Warning: Fisher matrix is not positive definite (bad element: -2.99551e-08) ![]() Does anyone know if they are critical or what do they mean? To provide some additional information for the bus error. I have already successfully run cufflinks on single lane paired end data. but when I merge 3 lanes of paired end data the 'bus error' occurs. I mapped the reads with TopHat and merging was done with the following command cat s1/sample1.sam s2/sample2.sam s3/sample3.sam| sort -k 3,3 -k 4,4n >all/all.sam I would greatly appreciate any comments and or help on this issues! all the best ro |
|
|
|
|
|
#18 |
|
Junior Member
Location: Columbia, MO, USA Join Date: Sep 2008
Posts: 1
|
When cufflinks version 0.8.1 was released on Feb. 13, 2010, had it been tested using the boost Version 1.42.0, which was released on Feb. 2, 2010? I am having trouble getting cufflinks to compile because of some problem in finding all that it needs in boost, and the file structure of boost had changed a bit with the new version. So far using CFLAGS and the appropriate path to the boost/property_map includes is not helping and I may just try an earlier version of boost. Has anybody had this problem?
Thanks Bill |
|
|
|
|
|
#19 |
|
Junior Member
Location: Shenzhen Join Date: Mar 2010
Posts: 2
|
Hi Cole,
Is there any memory controlling methods in the newly released version 0.8.2? I've got a big memory problem when processing an RNA-Seq mapping result, which contains a really huge bundle with more than 2,000,000 reads. Even a server with 32GB mem could not finish this. Can you help me ? Regards, Bob |
|
|
|
|
|
#20 |
|
Member
Location: DC Metro Join Date: Aug 2009
Posts: 27
|
I have completed the cufflinks, cuffcompare and cuffdiff runs and have the results. However, how do I go from the cufflinks IDs to the reference gene symbol? I did not give the gff file as input when I ran. Is there any way other than re-running everything with gff file?
Thanks, |
|
|
|
![]() |
| Thread Tools | |
|
|