#1 |
Junior Member
Location: Australia Join Date: Mar 2009
Posts: 1
Hi all,
We are planning to perform an mRNA-seq run on the Illumina GAII platform, and we are worried about assembling the transcriptome when we get our data back. Most of the RNA-seq papers I have read assemble against a reference genome or transcriptome, and we have neither!
Has anyone out there assembled cDNA short reads de novo? If so, are paired reads as important as they are for genome assembly? Is there an example dataset of paired-end mRNA-seq short reads that I can download to simulate an assembly? Also, what software would you recommend for this?
Hope someone can help.
Best regards,
Neil
#2 |
Member
Location: India Join Date: Oct 2008
Posts: 36
Checking for existing ESTs may help you with the assembly.
With de novo assembly of a transcriptome, though, what about misassemblies?
#3 |
Member
Location: Davis, CA Join Date: Aug 2008
Posts: 88
Though I haven't finished the project (the reads aren't all in yet), I'm doing something similar right now: no reference transcriptome, but looking for SNPs in cDNA reads from two subspecies. The first was sequenced with single-end reads, which resulted in pretty short contigs, and only roughly 1/10 of the total transcriptome was assembled. I'm recommending paired ends for the second sample, so I may have a quantitative answer for you in a couple of weeks.
The transcriptome may have more unique, assemblable sequence than the genome, but homologous domains will be a problem, and paired ends would definitely help there. That's why I'd guess that a small-insert library should help quite a bit. I'd recommend Velvet, which still seems to be the best option out there for Illumina reads. Not sure about simulation...
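On the simulation question, one low-tech option is to sample read pairs from transcripts you already trust (for example, cDNAs from a well-annotated species) and see how much of them an assembler recovers. Here is a minimal Python sketch of that idea. It assumes error-free reads, a fixed fragment ("insert") length, and uniform sampling along the transcript, none of which holds for real libraries; the function names and default values are illustrative and not taken from any tool mentioned in this thread.

```python
import random


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def simulate_pairs(transcript, n_pairs, read_len=36, insert_size=200, seed=0):
    """Sample error-free read pairs from one transcript.

    Each pair comes from a fragment of length insert_size: read 1 is the
    fragment's first read_len bases, and read 2 is the reverse complement
    of its last read_len bases (inward-facing orientation).
    """
    if read_len > insert_size:
        raise ValueError("read length exceeds the insert size")
    if len(transcript) < insert_size:
        raise ValueError("transcript is shorter than the insert size")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Uniform start position; randint includes both endpoints.
        start = rng.randint(0, len(transcript) - insert_size)
        fragment = transcript[start:start + insert_size]
        read1 = fragment[:read_len]
        read2 = reverse_complement(fragment[-read_len:])
        pairs.append((read1, read2))
    return pairs


# Toy input: a random 1 kb sequence stands in for a real cDNA.
toy_rng = random.Random(42)
toy_transcript = "".join(toy_rng.choice("ACGT") for _ in range(1000))

pairs = simulate_pairs(toy_transcript, n_pairs=5)
for read1, read2 in pairs:
    print(read1, read2)
```

Dropping read 2 from each pair gives a matched single-end dataset, so the same transcripts can be assembled both ways to see how much the pairing helps.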
#4 |
Senior Member
Location: Switzerland Join Date: Aug 2008
Posts: 124
A year ago, de novo transcriptome sequencing based solely on the Illumina GAII was a bad idea. With 72 bp PE reads and higher coverage, nothing is impossible now.
As Rao suggested, EST data will be helpful for the assembly. The fact is, though, that most organisms of interest don't have comprehensive EST information, and often no reference genome or transcriptome is available (not even from a related species). You don't know the exact size of the transcriptome, and you have repeats, paralogous genes, and isoforms to deal with. It's tricky even to tell whether your assembly went wrong. It also depends on the purpose of the sequencing: things are a lot easier if the goal is to discover SNPs. If the results are not satisfying, try alternatives such as sequencing with longer reads.
#5 |
Member
Location: València, Spain Join Date: Apr 2009
Posts: 48
Hi all!
I'm annotating the transcriptome of a non-reference organism, so I'm in a similar situation. My assembly was made with GS de novo assembler, but I got short contigs... I'm now trying the assembly with Mosaik, but first I have another problem: what about transposable elements? Have you tried WindowMasker or RepeatMasker? For an organism without a database of these repetitive elements, which program do you think is better? Thanks!
#6 | |
Senior Member
Location: Switzerland Join Date: Aug 2008
Posts: 124
|
![]() Quote:
|
|
#7 |
Member
Location: València, Spain Join Date: Apr 2009
Posts: 48
Because if you don't have high coverage and the same repetitive elements can appear in different genes, how do I know which protein has been translated? So I would mask these elements.
Low coverage has been my problem with the standard GS de novo assembler: contig lengths of approx. 200 bp and coverage from 4X to 6X. Thanks!
#8 |
Member
Location: València, Spain Join Date: Apr 2009
Posts: 48
Oh, sorry. I found repetitive elements that are reverse transcriptases, located in the 3' UTR of different genes. How can I differentiate the origin of my BLAST results?
#9 | |
Senior Member
Location: Switzerland Join Date: Aug 2008
Posts: 124
|
![]() Quote:
If you are using BLAST to annotate your contigs, using the 3' UTR is not a good idea because that region can vary even within the same species. I have used CENSOR to find repeats in my ESTs, but there were no significant hits. Most hits are around 100 bp with 80% similarity (the original genomic repeat is several kb long), and each occurs only once in the ESTs. Maybe plant repeat databases are not well characterized. In the end, I just ignored them.
I found a related thread on repeats at http://seqanswers.com/forums/showthread.php?t=1504
#10 |
Senior Member
Location: SEA Join Date: Nov 2009
Posts: 203
So, in short, no one has done de novo transcriptome assembly for a new organism before?
Can we use a closely related species, like fish, to help with the de novo assembly? How about taking it further by doing expression profiling on the new organism?
#11 |
Member
Location: Davis, CA Join Date: Oct 2009
Posts: 17
We assembled the lettuce transcriptome using 85 nt IGA single reads. We used CLC and Velvet, followed by CAP3.
#12 |
Senior Member
Location: Boston area Join Date: Nov 2007
Posts: 747
While this is not de novo assembly of a novel transcriptome, in some ways it is better because it can be compared against a known transcriptome (which was not used in the assembly, as far as I know).
http://bioinformatics.oxfordjournals...&pmid=19528083
Bioinformatics. 2009 Nov 1;25(21):2872-7. Epub 2009 Jun 15.
De novo transcriptome assembly with ABySS.
Birol I, Jackman SD, Nielsen CB, Qian JQ, Varhol R, Stazyk G, Morin RD, Zhao Y, Hirst M, Schein JE, Horsman DE, Connors JM, Gascoyne RD, Marra MA, Jones SJ.
Genome Sciences Centre, Vancouver, BC V5Z 4S6, Canada. ibirol@bcgsc.ca
MOTIVATION: Whole transcriptome shotgun sequencing data from non-normalized samples offer unique opportunities to study the metabolic states of organisms. One can deduce gene expression levels using sequence coverage as a surrogate, identify coding changes or discover novel isoforms or transcripts. Especially for discovery of novel events, de novo assembly of transcriptomes is desirable.
RESULTS: Transcriptome from tumor tissue of a patient with follicular lymphoma was sequenced with 36 base pair (bp) single- and paired-end reads on the Illumina Genome Analyzer II platform. We assembled approximately 194 million reads using ABySS into 66 921 contigs 100 bp or longer, with a maximum contig length of 10 951 bp, representing over 30 million base pairs of unique transcriptome sequence, or roughly 1% of the genome.
AVAILABILITY AND IMPLEMENTATION: Source code and binaries of ABySS are freely available for download at http://www.bcgsc.ca/platform/bioinfo/software/abyss. Assembler tool is implemented in C++. The parallel version uses Open MPI. ABySS-Explorer tool is implemented in Java using the Java universal network/graph framework.
CONTACT: ibirol@bcgsc.ca.
PMID: 19528083
#13 |
Senior Member
Location: SEA Join Date: Nov 2009
Posts: 203
#14 |
Junior Member
Location: Denmark Join Date: Jan 2009
Posts: 8
We have done several de novo transcriptome projects, mainly using Illumina technology and the ABySS assembler. In general it works, but the problem is getting full-length sequences (from start to stop codon). We have recently learned that some labs use coligation of the transcripts prior to nebulization, which should increase the number of full-length genes. The reason is that fragmentation is non-random at the ends, making the ends underrepresented in the library.
#15 |
Member
Location: Davis, CA Join Date: Oct 2009
Posts: 17
KevinLam,
The data is unpublished. We are re-assembling the reads using the latest version of the CLC assembler and Velvet with adjusted parameters. The number of transcriptome contigs in our latest assemblies went down from ~70K to ~57K. I have a presentation online with results from last summer's assemblies here: https://docs.google.com/fileview?id=...MWYzNjkz&hl=en
Since we correctly assembled the longest genes in plants, including BIG (>15 kb), we believe the approach works. More technical notes on filtering the reads and the Velvet parameters used are here: http://atgc-illumina.googlecode.com/...k_090910_D.pdf
#16 |
(Jeremy Leipzig)
Location: Philadelphia, PA Join Date: May 2009
Posts: 116
From sections 4.2 and 4.3 of the new CLC white paper, it appears that the old CLC assembler made slightly longer contigs (unpaired max: CLC 69 kbp vs. Velvet 60 kbp; N50: CLC 23 kbp vs. Velvet 16 kbp) at the expense of more incorrect ones (CLC: 36 wrong; Velvet: 1 wrong). The newer one leans too far the other way. Who knows what Velvet parameters were used; probably the ones that most closely matched the total CLC assembly size.
http://www.clcbio.com/files/whitepap...C_NGS_Cell.pdf
I'm not so sure there is a free lunch here. Marta, what cvCut and expCov parameters did you use in your Velvet assemblies? The cvCut parameter has a huge effect on N50, assembly size, and read usage.
Last edited by Zigster; 12-16-2009 at 08:16 PM.
#17 |
Member
Location: Davis, CA Join Date: Oct 2009
Posts: 17
The experiment CLC did for this white paper does not reflect the actual performance of the CLC assembler. I think the assembler is much better than the paper claims.
I use CLC Genomics Workbench on Windows with 32 GB RAM. A few days ago I started testing the latest (beta) version of the assembler for the Workbench, and it performs much better than the older one. My input is 92.5 million single transcriptome reads of up to 85 nt (IGA, filtered FASTA).
About Velvet: my understanding is that there is not much sense in changing expCov for transcriptome reads. We work with normalized mRNA libraries, but the coverage still varies a lot between transcripts. About cvCut, you need to contact alex_kozik (he is a member here). He is the one who ran all the Velvet assemblies on the same set.
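To put rough numbers on the coverage point, here is a small Python sketch of the usual depth estimate (total sequenced bases divided by target length). The read count and maximum read length are the ones quoted above; the 50 Mbp transcriptome size and the per-transcript read counts are purely hypothetical placeholders, since neither is given in this thread.

```python
def mean_coverage(n_reads, read_len, target_bp):
    """Average depth: total sequenced bases divided by target length."""
    if target_bp <= 0:
        raise ValueError("target_bp must be positive")
    return n_reads * read_len / target_bp


n_reads = 92_500_000                   # from the post above
max_read_len = 85                      # reads are "up to" 85 nt
assumed_transcriptome_bp = 50_000_000  # hypothetical, not a measured value

# Using the maximum read length gives an upper bound on the mean depth.
upper_bound = mean_coverage(n_reads, max_read_len, assumed_transcriptome_bp)
print(f"mean depth is at most about {upper_bound:.0f}x")

# The mean hides the spread between transcripts. Two hypothetical 2 kb
# transcripts receiving very different numbers of reads:
abundant = mean_coverage(10_000, max_read_len, 2_000)
rare = mean_coverage(50, max_read_len, 2_000)
print(f"abundant transcript: {abundant:.1f}x, rare transcript: {rare:.1f}x")
```

A single expected-coverage value cannot describe both transcripts in the toy example, which is the reason given above for not tuning expCov on transcriptome data.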
#18 | |
Junior Member
Location: Berlin, Germany Join Date: Feb 2010
Posts: 5
|
![]() Quote:
I would recommend our new software, Oases; see the thread "Oases: De novo transcriptome assembly of very short reads" or http://www.ebi.ac.uk/~zerbino/oases/. The software is designed to cope with alternative splicing and repetitive regions that normally break up contigs (for example, when genome assemblers are used). Oases can produce full-length transcripts if the coverage allows it, and it also supports and exploits paired-end information. And yes, paired-end information does improve the results. Oases already supports the longer reads (e.g. 75 bp) produced by current technologies.
Best,
Marcel
#19 |
Member
Location: UK Join Date: Sep 2009
Posts: 20
How are people evaluating their transcriptome assemblies? The standard N50 assessment can't be that useful, as the goal here isn't exactly to generate a tiny set of huge contigs...?
#20 | |
Senior Member
Location: Switzerland Join Date: Aug 2008
Posts: 124
Interesting question, Blackgore! Without a reference, gene model, or ESTs, how do you evaluate a de novo transcriptome assembly?
Quote:
Anyway, neither method (N50 included) says much about scaffold quality. There can be scaffolds with lots of Ns due to poorly sequenced insert gaps. Compare two datasets with the same N50 and longest contig, one of them with lots of Ns: how can you tell the difference?
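One simple way to tell such datasets apart is to report the fraction of N bases next to N50. A minimal Python sketch, using made-up toy sequences rather than real assembly output (the function names are illustrative):

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half
    of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def n_fraction(sequences):
    """Fraction of all bases that are N (gap placeholders in scaffolds)."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    n_count = sum(seq.upper().count("N") for seq in sequences)
    return n_count / total


# Two toy "assemblies" with identical sequence lengths (1000, 500, 200 bp).
clean = ["ACGT" * 250, "ACGT" * 125, "ACGT" * 50]
gappy = ["ACGT" * 100 + "N" * 200 + "ACGT" * 100, "ACGT" * 125, "ACGT" * 50]

for name, assembly in (("clean", clean), ("gappy", gappy)):
    lengths = [len(seq) for seq in assembly]
    print(name, "N50:", n50(lengths), "longest:", max(lengths),
          "N fraction:", round(n_fraction(assembly), 3))
```

Both toy sets have the same N50 and the same longest sequence, but only the second has a non-zero N fraction, so the extra column exposes the gap-filled scaffold. It still says nothing about whether the non-N sequence is correctly assembled.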
Tags |
de novo assembly, illumina, short read length, transcriptomes |