
**tboothby** · 01-23-2012, 09:20 AM

Hi Mike,
I am currently trying to do the same thing (use a de novo transcriptome as a reference for Tophat and cufflinks). I am not sure about your 'N' problem.

With your header issue, have you tried converting your bam files to sam files? I believe that cufflinks does not have a size limitation for sam headers. I tried that approach and it seemed to work the same as when I changed cufflinks' max header size for bam files (which you said you can't do).

I have found some issues with my cufflinks/cuffdiff output, though. Some sequences (present in the reference and with good tophat coverage) do not get assembled by cufflinks, so confirmed sequences are missing from downstream analysis (see: http://seqanswers.com/forums/showthread.php?t=17005).

If you find the same is true in your analysis, please let us know. I asked about this via the cufflinks technical help email, and the response was basically that cufflinks wasn't designed to do this, so they are not sure what's going wrong or how to fix it.

I don't really understand why it wouldn't work, but something appears to be amiss.

**westerman** · 01-23-2012, 09:47 AM

An interesting question, and one that I have not tackled myself, at least with Trinity/Tophat/Cufflinks. I do wonder why you have so many transcripts. 450K seems like a lot for a true transcriptome. Granted, Trinity tends to produce a lot of contigs, but this still seems 3-4 times more than I would expect.

The Trinity FAQ has some suggested downstream analyses which do not include Tophat/Cufflinks. I suspect that Tophat is not the tool you should use since, as far as I know, Tophat is meant for genome analysis. However, if you do insist on using Tophat, then I would separate the transcripts in your scaffolds by at least a read length (in your case, over 100 Ns). I am not positive that Tophat needs this separation, but other tools that I have used for this type of analysis get confused if they can map a read across a short interval of Ns.
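A minimal sketch of the spacer idea, assuming the transcripts are already held in memory as (name, sequence) pairs. `build_scaffold` is a hypothetical helper, not part of Tophat or Trinity. It joins the transcripts with runs of N and records where each transcript lands, so that alignments can later be traced back to the original transcript. If the spacer is at least one read length, an ungapped read cannot overlap two neighbouring transcripts at once.

```python
def build_scaffold(transcripts, spacer_len):
    """Concatenate (name, seq) pairs into one scaffold separated by N runs.

    Returns the scaffold sequence and a list of (name, start, end) tuples
    giving each transcript's 0-based, half-open span on the scaffold.
    """
    spacer = "N" * spacer_len
    parts = []
    spans = []
    pos = 0
    for i, (name, seq) in enumerate(transcripts):
        if i > 0:
            # Insert the N spacer between consecutive transcripts only.
            parts.append(spacer)
            pos += spacer_len
        parts.append(seq)
        spans.append((name, pos, pos + len(seq)))
        pos += len(seq)
    return "".join(parts), spans
```

For 100 bp reads, something like `build_scaffold(pairs, spacer_len=120)` leaves a margin beyond one read length. Whether Tophat's spliced alignment could still jump across such a gap is not addressed here.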

http://trinityrnaseq.sourceforge.net/#Downstream_analyses

In any case, good luck. It is always interesting working with unknown species.

**westerman** · 01-23-2012, 09:52 AM

I see that tboothby responded while I was writing my response. I wish to quote what I consider to be tboothby's pertinent point ...

Originally posted by tboothby

... the response was basically that cufflinks wasn't designed to do this...

I am not sure whether the response from the cufflinks group meant that cufflinks is not designed to handle de novo transcriptome mapping or that it is not designed for the specific problem that tboothby found. In any case, I do suggest that you explore other tools aside from tophat/cufflinks.

**tboothby** · 01-23-2012, 01:02 PM

@Mike
As Westerman points out, the Trinity website now has some basic procedures for aligning read fragments to a Trinity transcriptome.

They suggest using bowtie to align read fragments and then using the bam output for quantification with RSEM. I have tried this approach and it seems to work pretty well for quantifying expression.

Question:
I like the ability to align read fragments to a Trinity transcriptome, but can anyone suggest software for getting actual transcripts from those mapped reads?

**mikecz** · 01-24-2012, 11:57 AM

Update:

I redid the scaffolding so that my patches of N's were 120bp long (20bp longer than my reads) instead of 10, and it seems to have mapped great. I'm still sifting through the results to make sure it didn't introduce any unanticipated problems, but so far it looks the same as if I had mapped to a genome (minus introns).
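When sifting through results on a concatenated scaffold, each alignment position has to be translated back to the transcript it came from. A minimal sketch, assuming you kept a sorted list of (name, start, end) spans (0-based, half-open) recording where each transcript sits on the scaffold; `locate` is a hypothetical helper. Note that SAM POS values are 1-based, so subtract 1 before calling it.

```python
import bisect


def locate(spans, scaffold_pos):
    """Map a 0-based scaffold position to (transcript_name, offset).

    spans: (name, start, end) tuples sorted by start, half-open [start, end).
    Returns None if the position falls inside an N spacer.
    """
    # Rebuilt per call for simplicity; precompute it for many lookups.
    starts = [start for _, start, _ in spans]
    i = bisect.bisect_right(starts, scaffold_pos) - 1
    if i < 0:
        return None  # before the first transcript
    name, start, end = spans[i]
    if scaffold_pos >= end:
        return None  # inside the spacer that follows transcript i
    return name, scaffold_pos - start
```

Reads whose positions resolve to None, or whose start and end resolve to different transcripts, would be worth flagging as possible artifacts of the scaffolding.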

Thanks for pointing me toward the downstream tools for working with Trinity output, they look great and I'm working on implementing them as a comparison to the tuxedo results.

**tboothby** · 01-24-2012, 01:44 PM

Mike,
Have you tried using cufflinks to assemble transcripts with your mapped reads yet?

If so, do you see any instances where transcripts from your de novo transcriptome have good mapping coverage but are not assembled by cufflinks?
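One way to check for this is to compare per-transcript read counts against the set of reference transcripts that show up in the cufflinks output. A minimal sketch, assuming you have already derived both inputs (how you tally reads and match cufflinks transfrags back to reference transcripts is up to your pipeline); `missing_from_assembly` and the `min_reads` threshold are hypothetical.

```python
def missing_from_assembly(read_counts, assembled_ids, min_reads):
    """List well-covered transcripts absent from the assembly.

    read_counts: dict of transcript ID -> number of mapped reads.
    assembled_ids: set of transcript IDs recovered by the assembler.
    Returns IDs with at least min_reads reads that are not in
    assembled_ids, most-covered first (ties broken by ID).
    """
    missing = [
        tid for tid, n in read_counts.items()
        if n >= min_reads and tid not in assembled_ids
    ]
    return sorted(missing, key=lambda tid: (-read_counts[tid], tid))
```

Sorting by coverage puts the most surprising omissions at the top of the list.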

**mikecz** · 02-01-2012, 07:48 AM

tbooth,

I have just looked into this and you're right. There are things with very good coverage that are completely missing from the cufflinks output. It almost seems like the transcripts with the best coverage may even be excluded.

Any ideas on why this is happening?

**tboothby** · 02-01-2012, 09:15 AM

My initial thought is that abundant transcripts are generating a lot of sequence reads (obviously) and that the de novo assembler is making many (potentially erroneous) isoforms for those transcripts.

The reads are being mapped across multiple isoforms (or maybe other transcripts with similar conserved domains), and this is leading to good coverage but bad cufflinks assembly. Cufflinks splits 'counts' for mapped reads between multi-mapped transcripts.
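To illustrate how splitting multi-mapped reads can dilute coverage, here is a toy expectation-maximization (EM) sketch. It is not Cufflinks' actual model, which is considerably more involved. Assumptions: every transcript has the same length, and each read is represented only by the set of transcripts it aligns to. Each iteration splits every read across its compatible transcripts in proportion to their current abundance estimates.

```python
def em_abundance(read_sets, n_iter=100):
    """Toy EM allocation of multi-mapping reads to transcripts.

    read_sets: non-empty list of sets; each set holds the transcripts one
    read aligns to. Assumes all transcripts have equal length.
    Returns expected read counts per transcript after n_iter iterations.
    """
    transcripts = sorted(set().union(*read_sets))
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = {}
    for _ in range(n_iter):
        # E-step: share each read among its transcripts by current abundance.
        counts = {t: 0.0 for t in transcripts}
        for s in read_sets:
            total = sum(theta[t] for t in s)
            for t in s:
                counts[t] += theta[t] / total
        # M-step: abundance becomes each transcript's share of all reads.
        theta = {t: counts[t] / len(read_sets) for t in transcripts}
    return counts
```

If two near-identical isoforms share all of their reads, `em_abundance([{"A", "B"}] * 10)` leaves each with 5 reads, so a highly covered locus can look like two half-covered ones. If one isoform has unique reads and the other has none, the EM drifts the shared reads toward the supported isoform.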

We are looking into ways of compressing these isoforms into unigenes. We will test to see if this helps reduce the number of multi-mapped reads and helps with cufflinks assembly.
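A minimal sketch of one simple way to collapse isoforms into unigenes: keep the longest isoform per gene. This heuristic is an assumption for illustration, not necessarily the approach described above. It also assumes Trinity-style IDs where the isoform is a trailing suffix, such as `comp123_c0_seq1` in older releases or `TRINITY_DN123_c0_g1_i1` in newer ones. Check the headers your Trinity version actually produces before relying on the regular expression.

```python
import re

# Assumed Trinity-style IDs: strip a trailing "_seqN" or "_iN" isoform
# suffix to obtain the gene/component key.
ISOFORM_SUFFIX = re.compile(r"_(?:seq|i)\d+$")


def longest_isoform_per_gene(transcripts):
    """Pick the longest isoform for each gene.

    transcripts: dict of transcript ID -> sequence.
    Returns dict of gene ID -> (transcript ID, sequence).
    """
    best = {}
    for tid, seq in transcripts.items():
        gene = ISOFORM_SUFFIX.sub("", tid)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (tid, seq)
    return best
```

Mapping reads against only the representatives should reduce multi-mapping between sibling isoforms, at the cost of losing real isoform-level differences.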

If you have other ideas about how/why this is happening or how to fix/work around it feel free to share.

**bharat_iyengar** · 12-22-2012, 05:49 AM

Can I skip Tophat?

I want to quantify RefSeq RNAs from RNA-seq data, and I am using a bowtie-tophat-cufflinks pipeline for this. I have a question about whether tophat is necessary.

If I have an index of the transcriptome (human RefSeq), can I skip tophat? (I don't intend to discover new transcripts.)

Exon junctions are not a problem because I am mapping to the transcriptome. I save time by skipping two steps: tophat and obtaining annotations (which I would need if I aligned against the genome). Also, the genome index is a bigger file.

I generate a sam alignment file from bowtie and pass it to cufflinks.
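For transcript-level quantification against a transcriptome index, the simplest possible measure is a raw tally of alignments per reference sequence in that SAM file. A minimal sketch using only standard SAM fields (FLAG in column 2, RNAME in column 3); `count_reads_per_reference` is a hypothetical helper. It is not what cufflinks computes, since it does no normalization and no resolution of multi-mapped reads. Paired mates are counted separately. If the aligner reports several alignments per read without marking them secondary, such reads are counted once per reported alignment.

```python
def count_reads_per_reference(sam_lines):
    """Count primary, mapped alignments per reference (RNAME) in SAM text."""
    counts = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        # 0x4 unmapped, 0x100 secondary, 0x800 supplementary
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        rname = fields[2]
        counts[rname] = counts.get(rname, 0) + 1
    return counts
```

Usage: `with open("aligned.sam") as fh: counts = count_reads_per_reference(fh)`, where the file name is a placeholder.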

I am curious whether this can be done. Most people I see use tophat nonetheless. Is that just habit or a necessity?

**kmcarr** · 12-22-2012, 06:54 AM

Originally posted by bharat_iyengar

I want to quantify RefSeq RNAs from RNA-seq data, and I am using a bowtie-tophat-cufflinks pipeline for this. I have a question about whether tophat is necessary.

...

I am curious whether this can be done. Most people I see use tophat nonetheless. Is that just habit or a necessity?

Most people use TopHat because that is the right tool for the job. When provided with a genome reference and annotation file, TopHat will first align full reads to transcripts and then align split reads to the genome. You can tell TopHat not to search for new exons/junctions if you are not interested in that.

Many very smart people have thought long and hard about the best ways to properly analyze RNA-Seq data, and the overwhelming consensus is that if you have a reference genome, especially in a model organism, you should align to the full genome with the annotation provided.

**dietmar13** · 12-22-2012, 03:31 PM

RUM + htseq-count + SAMseq (if you have many biological replicates), otherwise limma

In my hands, RUM + htseq-count + SAMseq (samr) gave the best results (most spliced reads mapped and most genes called significant)...

I compared it to tophat or STAR + htseq-count + DESeq, edgeR, baySeq, NOISeq, limma, and to the tophat-cuffdiff pipeline.

Even the newer pipeline (map only against the transcriptome with bowtie, allowing unlimited multi-mappings, then run eXpress followed by all the statistical methods mentioned above) did not perform as well...

dietmar

**bharat_iyengar** · 12-22-2012, 09:39 PM

Originally posted by kmcarr

Most people use TopHat because that is the right tool for the job. When provided with a genome reference and annotation file, TopHat will first align full reads to transcripts and then align split reads to the genome. You can tell TopHat not to search for new exons/junctions if you are not interested in that.

Many very smart people have thought long and hard about the best ways to properly analyze RNA-Seq data, and the overwhelming consensus is that if you have a reference genome, especially in a model organism, you should align to the full genome with the annotation provided.

Understood. But I would like to know the reason for this consensus.

Why is it logically better to map the reads to the genome and provide annotations, rather than to map them to an already annotated transcriptome index?

**dietmar13** · 12-22-2012, 10:21 PM

One answer:

because the actual transcriptome is much more complex (alternative splicing, exon skipping, exclusive exons, intron retention, alternative 5' and 3' splice sites, alternative TSS and poly-A sites, ...) than the annotated transcriptome, and it is tissue/disease specific... not to mention new lncRNAs, small regulatory RNAs and other transcripts

**bharat_iyengar** · 12-22-2012, 10:46 PM

Originally posted by dietmar13

because the actual transcriptome is much more complex (alternative splicing, exon skipping, exclusive exons, intron retention, alternative 5' and 3' splice sites, alternative TSS and poly-A sites, ...) than the annotated transcriptome, and it is tissue/disease specific... not to mention new lncRNAs, small regulatory RNAs and other transcripts

But it's all annotated; I know what the variants are.

Most of the time the input RNA for sequencing is poly-A selected, so most regulatory/intermediate RNAs are already lost.

The transcriptome index also occupies less space than the genome.

Mapping to a transcriptome