SEQanswers

04-01-2011, 10:24 PM   #1
gfmgfm
Member
 
Location: il

Join Date: Jun 2010
Posts: 64
de novo transcriptome and differential expression

Hello,

We have Illumina de novo transcriptome data from 3 different samples. We pooled the 3 samples, assembled contigs from them using several different methods, and merged the results with CAP3.
Now we want to test for differential expression among the 3 samples using these contigs. The problem is that the contigs are redundant (due either to incomplete assembly or to genuinely different transcripts from the same locus).
This makes it difficult to map the reads uniquely to our contigs.
Any suggestions on how to test for differential expression?
04-02-2011, 12:56 AM   #2
petang
Member
 
Location: Taiwan

Join Date: Nov 2008
Posts: 13

Quote:
Originally Posted by gfmgfm View Post
Hello,

We have Illumina de novo transcriptome data from 3 different samples. We pooled the 3 samples, assembled contigs from them using several different methods, and merged the results with CAP3.
Now we want to test for differential expression among the 3 samples using these contigs. The problem is that the contigs are redundant (due either to incomplete assembly or to genuinely different transcripts from the same locus).
This makes it difficult to map the reads uniquely to our contigs.
Any suggestions on how to test for differential expression?
You can merge all 3 datasets and assemble them together. Then use the assembled contigs as a reference and re-map the reads from each dataset to it.
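Once each sample's reads are re-mapped to the shared contigs, the per-sample counts are the raw material for the differential-expression test. The following is a minimal Python sketch of that counting step, under stated assumptions: the alignments are plain SAM text, and the aligner was run in a mode that reports several alignments per read (for example Bowtie with `-k` or `-a`), so that multi-mapping reads are visible at all. The function name `count_alignments` is hypothetical. It separates reads that hit exactly one contig from reads that hit several, which also shows how large the multi-mapping fraction is.

```python
from collections import Counter, defaultdict


def count_alignments(sam_lines):
    """Split reads into uniquely mapped and multi-mapped from SAM text.

    Returns (unique, multi): unique is a Counter of contig -> number of reads
    that hit exactly one contig; multi maps read name -> set of contigs hit.
    Assumes the aligner reported all (or several) alignments per read;
    otherwise multi-mapping reads will look unique.
    """
    hits = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            continue  # FLAG bit 0x4 means the read is unmapped
        hits[qname].add(rname)  # mates of a pair share QNAME, so they pool here
    unique, multi = Counter(), {}
    for read, contigs in hits.items():
        if len(contigs) == 1:
            unique[next(iter(contigs))] += 1
        else:
            multi[read] = contigs
    return unique, multi
```

Run once per sample, `unique` gives one column of a contig-by-sample count table, and `sum(unique.values()) / (sum(unique.values()) + len(multi))` is the fraction of mapped reads that are unique. A read with two alignments on the same contig still counts as unique here, since only the contig matters for per-contig counts.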
04-02-2011, 01:12 AM   #3
gfmgfm
Member
 
Location: il

Join Date: Jun 2010
Posts: 64

Thanks a lot for the reply!
This is what we did. But now we are not sure how to map to the contigs as a reference. If we keep only uniquely mapped tags, we get a very low percentage of uniquely aligned reads (probably because of redundancy in the contigs, and maybe because of genuinely different transcripts from the same locus).
Any suggestions?
04-03-2011, 11:01 PM   #4
schmima
Member
 
Location: Zürich

Join Date: Apr 2010
Posts: 56

Hm, it depends a bit on what you want to do. You could either try to distribute multireads in proportion to the unique reads (which is a problem if the majority are multireads), or create a "non-redundant" reference (where you may end up sacrificing truly different transcripts from a gene). For the latter you would have to group your transcripts by similarity and assemble each group; the TGI clustering tool may help you do this: http://compbio.dfci.harvard.edu/tgi/software/ .
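The first option, distributing multireads in proportion to the unique reads, can be sketched in a few lines of Python. This is a one-pass illustration under stated assumptions, not any particular tool's method: unique counts and multi-read hit lists are assumed to be already available (for example from a per-sample SAM count), contig length is ignored, and `allocate_multireads` is a hypothetical name. As noted above, it degrades when most reads are multireads, because the unique counts it relies on become sparse.

```python
def allocate_multireads(unique, multi):
    """One-pass proportional assignment of multi-mapped reads.

    unique: mapping contig -> count of uniquely mapped reads.
    multi:  mapping read name -> collection of contigs the read hit.
    Each multi-read is split among its contigs in proportion to their
    unique counts; if none of them has unique support, it is split evenly.
    Returns contig -> fractional read count.
    """
    totals = {contig: float(n) for contig, n in unique.items()}
    for contigs in multi.values():
        contigs = sorted(contigs)  # fixed order so names and weights line up
        weights = [unique.get(c, 0) for c in contigs]
        denom = sum(weights)
        for contig, weight in zip(contigs, weights):
            share = weight / denom if denom else 1.0 / len(contigs)
            totals[contig] = totals.get(contig, 0.0) + share
    return totals
```

Every read contributes exactly 1 to the totals, so library size is preserved. Repeating the split with the updated totals as weights, instead of the unique counts only, turns this into an EM-style estimate.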
04-04-2011, 04:41 AM   #5
gfmgfm
Member
 
Location: il

Join Date: Jun 2010
Posts: 64

Thanks a lot! The TGI clustering tool looks very interesting. I am trying to run it.
02-21-2012, 06:16 AM   #6
LizBent
Member
 
Location: Guelph, Ontario, Canada

Join Date: Jan 2012
Posts: 31

I am also going to be mapping short reads to contigs assembled from multiple samples. My strategy is to assemble all the samples together in Trinity, then map the reads back to the contigs. I would assume that a clustering step would improve the quality of the data.

One question: some of my samples contain tissue from two different organisms, so I have two transcriptomes. Would clustering merge transcripts of the same gene from the two different organisms?
02-21-2012, 08:26 PM   #7
oxydeepu
Member
 
Location: bangalore,india

Join Date: Jul 2011
Posts: 41
De novo Transcriptome Assembly.

Hi all,

I have run TopHat on paired-end RNA-Seq data, so now I have to run Cufflinks on the results. I don't have a reference GTF file, but I do have the genome and transcriptome files for the same organism. Can anyone please tell me how to create a reference transcript annotation file from the genome and transcriptome files?

Thanking you in advance
Regards
Deepak.
02-22-2012, 03:25 AM   #8
LizBent
Member
 
Location: Guelph, Ontario, Canada

Join Date: Jan 2012
Posts: 31

Deepak, I suggest you post your question in a relevant thread. If you have a reference genome, you are not doing de novo transcriptome assembly, and you are also not looking at differential gene expression unless you have multiple samples.
02-22-2012, 10:19 PM   #9
gfmgfm
Member
 
Location: il

Join Date: Jun 2010
Posts: 64

Hi LizBent,

I guess this depends on the overlap between the 2 genomes you are analyzing. If there are very similar genes, I guess they might cluster together.
02-22-2012, 11:54 PM   #10
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

Have you thought about just using the average k-mer coverage from your original, pre-CAP3 assemblies? Even with the CAP3 assemblies, you could use the log files to determine which sequences got merged, their lengths, and their average k-mer coverage, and then compute a weighted average of the k-mer coverage for each CAP3-merged transcript.

Then you could go back through these averages and flag the ones whose merged sequences show relatively large variance in k-mer coverage. That could be a clue that either isoforms were merged or the merge was spurious.
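The weighted-average-and-flag idea above can be sketched in Python as follows. Assumptions, all labeled: for each CAP3-merged transcript, the lengths and k-mer coverages of its pre-CAP3 member contigs have already been extracted (how depends on the assembler headers and CAP3 output you have); the weights are member lengths, which is one reasonable choice the post does not specify; and the coefficient-of-variation cutoff of 0.5 is a hypothetical placeholder, not a recommended value.

```python
import math


def merged_coverage(members, cv_threshold=0.5):
    """Summarise k-mer coverage for one CAP3-merged transcript.

    members: list of (length, kmer_coverage) for the pre-CAP3 contigs that
    were merged into it. Returns (weighted_mean, cv, flagged): mean and
    variance are weighted by member length, cv is the coefficient of
    variation (sd / mean), and flagged is True when cv exceeds the
    (hypothetical) threshold.
    """
    total_len = sum(length for length, _ in members)
    mean = sum(length * cov for length, cov in members) / total_len
    var = sum(length * (cov - mean) ** 2 for length, cov in members) / total_len
    cv = math.sqrt(var) / mean if mean > 0 else 0.0
    return mean, cv, cv > cv_threshold
```

Using the coefficient of variation rather than the raw variance keeps the flag comparable between highly and lowly covered transcripts, since raw variance grows with coverage.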

I thought about using CAP3 with our transcriptome assemblies for organisms without a reference, but I just didn't trust it. What program are you using for the assembly, by the way? I've noticed that while Trinity is very selective and maybe "under-assembles" some things, it's not very redundant, especially compared to the strategy taken by ABySS/Trans-ABySS.

You'll still hit similar downstream problems with estimating abundance, but it might be a little easier if you get rid of the redundancy earlier in the assembly process.
02-23-2012, 02:09 AM   #11
LizBent
Member
 
Location: Guelph, Ontario, Canada

Join Date: Jan 2012
Posts: 31

Quote:
Originally Posted by Wallysb01 View Post
I thought about using CAP3 with our transcriptome assemblies for organisms without a reference, but I just didn't trust it. What program are you using for the assembly, by the way? I've noticed that while Trinity is very selective and maybe "under-assembles" some things, it's not very redundant, especially compared to the strategy taken by ABySS/Trans-ABySS.

You'll still hit similar downstream problems with estimating abundance, but it might be a little easier if you get rid of the redundancy earlier in the assembly process.
Hi, so far I've been testing Trinity for my assemblies, though I was also thinking of using the Rnnotator pipeline (JGI Galaxy server), which uses Velvet. I'm not sure I understand what you mean by "redundant". I'm new to all this, so would you mind explaining?
02-23-2012, 09:18 AM   #12
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

Quote:
Originally Posted by LizBent View Post
Hi, so far I've been testing Trinity for my assemblies, though I was also thinking of using the Rnnotator pipeline (JGI Galaxy server), which uses Velvet. I'm not sure I understand what you mean by "redundant". I'm new to all this, so would you mind explaining?
Liz,

Differential coverage along your transcript and alternative splicing (plus the usual SNPs/indels) can lead assemblers to produce several contigs from the same gene. Sometimes these are alternative splice forms and sometimes they are just assembly artifacts. Usually assemblers have some sort of merging step to try to reduce this, but again, because of alternative splicing, you don't want to do this as aggressively as you can with genomic DNA.

In my experience, Trinity does a pretty good job of giving you transcripts that are as complete as possible with minimal redundancy. However, that comes at the cost of overall completeness, since it may miss some transcripts. ABySS/Trans-ABySS does a very good job of just giving you everything, but it's kind of messy. I haven't used Velvet-based programs, so I can't speak to them.

If you don't have a reference genome, you're not done after assembly. I think you have to accept some attrition by doing things like extracting ORFs and keeping only the long ones (or even only "complete" ones). You can also filter the contigs so that no two kept contigs are more than XX% similar, keeping only the longest contig of each group, using a tool like CD-HIT. Plus, you can run BLAST and keep the contigs that match up well with a closely related species. You could even filter your results to take only the best hit for each "reference" transcript, whatever you decide your reference to be.
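Two of the filters mentioned above, keeping contigs with a long ORF and keeping only the best BLAST hit per reference transcript, can be sketched in Python. This is an illustration under assumptions, not a replacement for dedicated ORF finders: only "complete" ORFs (ATG through a standard stop codon) are counted, ORFs running off a contig end are ignored, and the BLAST input is assumed to be tabular output with the default 12 columns of `-outfmt 6`, where the bit score is the last column. Function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def longest_complete_orf(seq):
    """Length (nt, stop codon included) of the longest ATG...stop ORF in any
    of the six reading frames; 0 if there is none. ORFs that run off the end
    of the contig are not counted."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None
    return best


def best_hit_per_reference(blast_lines):
    """From BLAST tabular output (-outfmt 6, default 12 columns), keep the
    contig (query) with the highest bit score for each reference (subject)."""
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if subject not in best or bitscore > best[subject][1]:
            best[subject] = (query, bitscore)
    return {subject: query for subject, (query, _) in best.items()}
```

A length filter would then be something like `{name: s for name, s in contigs.items() if longest_complete_orf(s) >= min_len}`, where `min_len` is a cutoff you choose; no value is implied here.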

It all depends on what you want the output to look like. Would you rather have fewer, more complete, non-redundant contigs at the cost of losing alternative splicing and incomplete transcripts? Or do you want as much as possible, knowing you'll have to deal with redundancy?
04-03-2012, 11:54 AM   #13
RNAddict
Member
 
Location: East Coast

Join Date: Mar 2012
Posts: 17

Quote:
Originally Posted by gfmgfm View Post
Hello,

We have Illumina de novo transcriptome data from 3 different samples. We pooled the 3 samples, assembled contigs from them using several different methods, and merged the results with CAP3.
Now we want to test for differential expression among the 3 samples using these contigs. The problem is that the contigs are redundant (due either to incomplete assembly or to genuinely different transcripts from the same locus).
This makes it difficult to map the reads uniquely to our contigs.
Any suggestions on how to test for differential expression?
We are having a similar experience. We assembled a transcriptome de novo and are using it as a "reference", but when we map reads to it we get so many multi-mapped reads that many transcripts we know are present (from RT-PCR, Northerns, in situs) do not even show up as present in our in silico analysis.

We have tried various methods of reducing redundancy in our reference, such as taking only the longest sequence from each cluster and using various contig assembly programs (CAP3, etc.). These help, but they do not seem to solve the problem completely.
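Taking only the longest sequence from each cluster can be sketched in Python as below. It assumes the cluster membership has already been produced by some clustering tool (CD-HIT, the TGI tools, or similar) and loaded into a dictionary; the function name and input layout are hypothetical. One possible reason this only partly helps: a single representative per cluster discards the other members, so reads from sequence present only in a discarded member (for example an alternative exon) no longer have anywhere to map.

```python
def longest_per_cluster(clusters, seqs):
    """Pick one representative contig per cluster.

    clusters: mapping cluster id -> list of contig names (from whatever
              clustering tool was used).
    seqs:     mapping contig name -> sequence.
    Returns cluster id -> name of the longest member; ties go to the
    alphabetically first name so the result is deterministic.
    """
    return {
        cid: min(members, key=lambda name: (-len(seqs[name]), name))
        for cid, members in clusters.items()
    }
```

The reduced reference is then `{name: seqs[name] for name in longest_per_cluster(clusters, seqs).values()}`, which can be written back to FASTA for re-mapping.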

Since it has been some time since your original post, I was wondering what your experience has been with this issue.

How far did you take your elimination of redundant transcripts?
All times are GMT -8.