Fairly new to RNA-Seq analysis here. We have assembled a novel transcriptome from 454 and Illumina reads using Newbler 2.5. Now we wish to use the Cufflinks package (in particular Cuffdiff) to measure expression changes. It seems the Cufflinks package is geared towards aligning reads to a genomic reference: reads are aligned to a genome with TopHat, and Cufflinks then outputs the transcripts.gtf file. If our dataset is purely mRNA, do I still need to run Cufflinks to get the transcripts.gtf file, which is required as input to Cuffdiff?
-
Cuffdiff is only meant for RNA-Seq analysis where a reference genome is available. You can probably make it work for you by altering your alignment file to give your transcripts "dummy" genomic coordinates. Alternatively, I recommend you look at RSEM (http://deweylab.biostat.wisc.edu/rsem/) which is meant to estimate abundances purely from transcript sequences.
-
I just re-read your question, and it's not clear whether or not a reference genome is available. If it is, you can map your reads to the reference using your assembled transcriptome (GTF) as an input to TopHat. You can then use Cuffdiff with your GTF without first running Cufflinks, if you do not wish to use its assembly function. Alternatively, you can use the Cufflinks assembler and Cuffcompare to combine the Cufflinks assembly with your own.
Let me know if you have any further questions!
-
Yes, but it will probably be somewhat involved. You will need to build your GTF so that isoforms correctly overlap in the genome coordinates. You will then need to adjust your SAM alignments so that they are converted from transcriptome coordinates to your new "genomic" coordinates. If you decide to go this route, please let me know how it goes! I can also help if you get stuck along the way.
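The coordinate conversion described above can be sketched roughly as follows. This is only an illustration, assuming each transcript's exon blocks on the made-up "genome" are already known and lie on the forward strand; the function name `transcript_to_genome` and the example coordinates are hypothetical. A full conversion would also have to rewrite each read's CIGAR string (inserting `N` operations where a read spans an exon junction), which is not shown here.

```python
from typing import List, Tuple

# 1-based, inclusive (start, end) of an exon on the pseudo-genome
Exon = Tuple[int, int]


def transcript_to_genome(pos: int, exons: List[Exon]) -> int:
    """Map a 1-based transcript position to a 1-based pseudo-genome position.

    Assumes a forward-strand transcript whose exons do not overlap each other.
    """
    if pos < 1:
        raise ValueError("positions are 1-based")
    remaining = pos
    for start, end in sorted(exons):
        length = end - start + 1
        if remaining <= length:
            # The position falls inside this exon.
            return start + remaining - 1
        # Skip this exon and keep walking along the transcript.
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")


# Hypothetical isoform with two 100 bp exons separated by a 100 bp "intron".
exons_a = [(101, 200), (301, 400)]
print(transcript_to_genome(1, exons_a))    # first base of exon 1
print(transcript_to_genome(100, exons_a))  # last base of exon 1
print(transcript_to_genome(101, exons_a))  # first base of exon 2
```

The SAM `POS` field of each alignment would be passed through a mapping like this, with the reference name replaced by the pseudo-chromosome name.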
-
Could I just use the "-G" option in Cufflinks with a reference annotation based on my assembled transcriptome? Also, can I directly use the transcripts.gtf produced by Cufflinks as Cuffdiff input? I read that Cuffdiff requires tss_id and p_id attributes, but it looks like Cufflinks does not produce these IDs in its output GTF file.
-
p_id issue
I think Cole or Adam may want to add a few sentences to the manual about getting the p_id. I have used a GTF file with CDS records, but there is no p_id in the output of the most recent version of Cufflinks. It is quite a problem and not straightforward; otherwise it is a great tool.
-
I have been testing an approach where I use TopHat to align reads back to contigs (in essence treating them like many, many small chromosomes) and then use Cufflinks, Cuffcompare, and Cuffdiff to look at differential expression. This is in a novel transcriptome, i.e. no reference. It seems to work OK, although there may be some issue with the statistical detection of differences that I have yet to explore.
-
That's definitely of interest to us. I'd really like feedback on how well this works and any suggestions you may have on optimizing such a strategy.
Lior
-
I realize there are other programs such as RSEM that easily handle novel transcriptomes; however, the Cuffdiff package produces more detailed output. We used Newbler to assemble the transcriptome, and its output is in the form of isogroups, isotigs, and contigs, where an isogroup is considered a gene, an isotig a transcript, and contigs exons. Ideally this would be true for all the output, but that is not the case. Many isotigs within an isogroup are simply assembly goofs, retrotransposon events, or indel genes from sister chromosomes.
An idea above suggested treating each gene as if it were on its own separate chromosome. It should make no difference to the program whether it thinks there are 10 or 1,000,000 chromosomes. This might make it easier to create a fake transcripts.gtf file, where the gene_ids would be the isogroup numbers and the transcript_ids the isotig names. Then possibly treat the whole isotig as a CDS region? With published results showing that Newbler 2.5 is such a great assembler, there has to be someone who has assembled a novel transcriptome with Newbler and mapped reads through TopHat/Cufflinks. Anyone? lol
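To make the fake-GTF idea concrete, here is a minimal sketch of what such a file could look like, assuming each isotig is treated as its own reference sequence covered by a single exon. The function name `isotig_gtf_line`, the source label "newbler", and the example isotig/isogroup names and lengths are made up for illustration; whether Cuffdiff gives sensible results on such a file is exactly the open question in this thread.

```python
def isotig_gtf_line(isotig: str, isogroup: str, length: int,
                    source: str = "newbler") -> str:
    """Build one GTF exon record spanning a whole isotig.

    The isotig doubles as the reference sequence name, the isogroup is
    used as gene_id, and the isotig name as transcript_id.
    """
    attributes = f'gene_id "{isogroup}"; transcript_id "{isotig}";'
    fields = [
        isotig,       # seqname: the isotig is its own "chromosome"
        source,       # source label (arbitrary)
        "exon",       # feature type
        "1",          # start (1-based)
        str(length),  # end (inclusive)
        ".",          # score
        "+",          # strand (assumed forward)
        ".",          # frame
        attributes,
    ]
    return "\t".join(fields)


# Hypothetical isotigs: name -> (isogroup, length in bp)
isotigs = {
    "isotig00001": ("isogroup00001", 1523),
    "isotig00002": ("isogroup00001", 1410),
    "isotig00003": ("isogroup00002", 872),
}

for name, (group, length) in isotigs.items():
    print(isotig_gtf_line(name, group, length))
```

Note that two isotigs from the same isogroup end up on different "chromosomes" here, so the file carries no information about which parts of the isoforms overlap.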
-
I apologize for giving you inaccurate information earlier, but it turns out that my idea for making a pseudo-genome will not work unless your transcriptome is aligned so that you know which transcripts overlap and where. Otherwise, you will need multi-read support, which Cufflinks currently lacks (although it is coming).
Short of knowing these overlaps, RSEM is your best option (to my knowledge).
-
I forgot to mention that our transcriptome is from a plant. That means polyploid and littered with retrotransposons/transposons. The RSEM paper said 52% of reads in maize were multi-reads. I'm not sure if even RSEM can handle the number of multi-read alignments a novel plant transcriptome will produce. We are still brainstorming ways to shortcut and streamline this pipeline. Will removing any transcripts and reads that map to retrotransposons affect the statistics of RSEM or Cufflinks? I have been favoring Cufflinks because of its better documentation and output. The RSEM Google group doesn't have any posts yet. Has anyone used RSEM for a novel transcriptome?
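One way to gauge how bad the multi-read problem is for a given assembly, before choosing a tool, is to count what fraction of aligned reads hit more than one transcript. The sketch below assumes the alignments have already been reduced to (read name, transcript name) pairs, for example from the first and third columns of a SAM file; the function name `multiread_fraction` and the toy data are hypothetical. It is a simplification: a read that aligns to several places within the same transcript is not counted as a multi-read here.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def multiread_fraction(alignments: Iterable[Tuple[str, str]]) -> float:
    """Fraction of aligned reads that map to more than one distinct transcript."""
    targets_by_read = defaultdict(set)
    for read_id, transcript in alignments:
        targets_by_read[read_id].add(transcript)
    if not targets_by_read:
        return 0.0
    multi = sum(1 for targets in targets_by_read.values() if len(targets) > 1)
    return multi / len(targets_by_read)


# Toy example: read1 hits two isotigs, read2 and read3 hit one each.
toy = [
    ("read1", "isotig00001"),
    ("read1", "isotig00002"),
    ("read2", "isotig00001"),
    ("read3", "isotig00003"),
    ("read3", "isotig00003"),
]
print(f"{multiread_fraction(toy):.2f}")
```

Running the same count before and after dropping retrotransposon-derived transcripts would at least show how much of the multi-mapping they account for.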