Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Transcripts from RNA-seq assembly | StopCodon | RNA Sequencing | 6 | 07-08-2015 03:58 AM
Map transcripts from de novo assembler such as Trinity back to the genome? | sptmbr | Bioinformatics | 5 | 02-29-2012 10:47 AM
FPKM determination of de novo transcripts | morebasesplease | RNA Sequencing | 0 | 08-06-2011 08:56 PM
Removal of retained introns / primary transcripts from de novo RNAseq assembly | sandmann | RNA Sequencing | 1 | 07-29-2011 09:54 AM
PubMed: Expression Analysis of miRNAs and Highly-expressed Small RNAs in Two Rice Sub | Newsbot! | Literature Watch | 0 | 10-28-2010 03:30 AM
#1
Junior Member
Location: US | Join Date: Apr 2011 | Posts: 6
I am working on a transcriptome project with ~400 Mb of 454 mRNA-seq reads sequenced from a non-normalized cDNA library. I used MIRA 3 for de novo assembly of my reads, and it produced a decent assembly for transcripts with moderate expression levels. However, MIRA has a hard time assembling highly expressed transcripts (>1000 copies), and the same goes for CAP3. The TIGR assembler (TGICL) offers some ways to deal with highly expressed transcripts, but it doesn't have a great answer either.

Does anyone have insight into assembling highly expressed transcripts? Could de Bruijn graph based assemblers work in this scenario? Many thanks, Hao
#2
Member
Location: Heidelberg | Join Date: Feb 2011 | Posts: 69
I am also doing de novo assembly of a transcriptome, and Velvet/Oases (de Bruijn graph based) works fine, especially for highly expressed transcripts. Those assemble particularly well when you choose a high k-mer value.
#3
Senior Member
Location: Berlin, DE | Join Date: May 2008 | Posts: 628
Have you tried Roche's Newbler in cDNA mode?
#4
Junior Member
Location: US | Join Date: Apr 2011 | Posts: 6
That's great to know. Just to clarify, do Velvet/Oases also work well on 454 reads?
#5
Junior Member
Location: US | Join Date: Apr 2011 | Posts: 6
#6
Senior Member
Location: USA, Midwest | Join Date: May 2008 | Posts: 1,178
http://454.com/contact-us/software-request.asp
#7
Junior Member
Location: US | Join Date: Apr 2011 | Posts: 6
I had sent an email. I'll try sending a request form too. Thanks for letting me know.
#8
Member
Location: Heidelberg | Join Date: Feb 2011 | Posts: 69
#9
Member
Location: Bad Nauheim, Germany | Join Date: Mar 2011 | Posts: 31
Does a non-normalized cDNA library have an impact on the number of reads used by the assembler?

I'm asking because we're also working with Illumina reads, currently using Velvet and SOAPdenovo. Velvet, for example, uses only 15594122 of 87419634 reads. After quality trimming (mean_qual = 20, min_len = 35) our reads are between 35 and 60 bp long; the k-mer value for this assembly was set to 29, and Velvet was run in -shortPaired mode. We get about 330,000 contigs with N50 = 106, of which 2,300 contigs are longer than 500 bp with N50 = 696. Using lower k-mer values decreases the number of contigs but increases the number of used reads, which comes with a decrease in the N50 value, both for all contigs and for the long contigs only.
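For clarity, N50 here is the usual definition: the contig length at which the contigs, sorted from longest to shortest, cover half of the total assembly length. A minimal sketch of that computation (not our actual pipeline):

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Example: total length is 300; the two longest contigs (100 + 80)
# already cover half of it, so N50 = 80.
print(n50([100, 80, 50, 40, 30]))  # -> 80
```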
#10
Member
Location: Heidelberg | Join Date: Feb 2011 | Posts: 69
So what exactly is your question? This all sounds reasonable to me. Do you have a comparison with a normalized library? What is your calculated expected coverage? A k-mer of 29 might be a bit high if your expected coverage is around 10-20; the sketch below shows why.
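A back-of-the-envelope sketch using Velvet's k-mer coverage relation, Ck = C * (L - k + 1) / L (the 20x base coverage below is just an illustrative assumption, not a number from this thread):

```python
def kmer_coverage(base_coverage, read_length, k):
    """Velvet's k-mer coverage Ck = C * (L - k + 1) / L:
    a read of length L contains only L - k + 1 k-mers of size k."""
    return base_coverage * (read_length - k + 1) / read_length

# With 35 bp reads and k = 29, only 7/35 = 20% of the base coverage
# survives as k-mer coverage, so a nominal 20x drops to an effective 4x.
print(kmer_coverage(20, 35, 29))  # -> 4.0
```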
#11
Member
Location: Bad Nauheim, Germany | Join Date: Mar 2011 | Posts: 31
Thanks for the reply, Thorondor! We have a normalized library that was sequenced with 454 and assembled with MIRA, which used nearly 90% of all reads. The question is: why does Velvet use only about 15% of all reads, and could that be due to the lack of normalization?

Mean coverage (according to Velvet's own measurement in contigs.fa) is between 21 and 26 for all long contigs (>500 bp). Perhaps someone can recommend an assembler that uses more reads on a non-normalized library.

FYI, we did 8 assemblies with Velvet, using the following k-mer values: 21 (-short, for scaffolding with other algorithms using PE information), 23 (-shortPaired), 23 (-short), 25 (-short), 27 (-shortPaired), 29 (-shortPaired), 31 (-shortPaired), 35 (-short). With k = 23 in -shortPaired mode, Velvet used about 25% of all reads, the maximum across all assemblies. Because scaffolding with other algorithms currently raises the N50 to as much as 950, we would like to use Velvet only in -short mode, where the number of used reads is low (~11%). Does that make my question clear? :-)
#12
Member
Location: Heidelberg | Join Date: Feb 2011 | Posts: 69
No, I don't think the lack of normalization is the reason here, but keep in mind that your coverage is not consistent across transcripts. Some transcripts assemble better when you set exp_cov really high and others when you set it really low, and this will influence the number of reads used.

Also, try to estimate your expected coverage yourself, e.g. (total amount of bp in your reads) / (expected transcriptome size); see the sketch below.

Are the paired ends shuffled correctly into one file after trimming? Some reads are discarded during trimming, so did you use select_paired.pl from the contrib folder of Velvet?

Since it seems you are doing de novo transcriptome assembly, why not try Oases?
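A minimal sketch of that estimate (the 45 bp mean read length and the 100 Mb transcriptome size below are made-up placeholders, not numbers from this thread; only the read count comes from the post above):

```python
def expected_coverage(total_read_bp, transcriptome_size_bp):
    """Rough expected coverage: total sequenced bases divided by
    the expected size of the target transcriptome."""
    return total_read_bp / transcriptome_size_bp

# Placeholder numbers: 87419634 reads with an assumed 45 bp mean length,
# over a hypothetical 100 Mb transcriptome, give roughly 39x coverage.
print(expected_coverage(87419634 * 45, 100 * 10**6))  # -> ~39.3
```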
#13
Member
Location: Bad Nauheim, Germany | Join Date: Mar 2011 | Posts: 31
Dear Thorondor, thanks a lot for these suggestions! I'll try to estimate the coverage and then try some values for exp_cov.

I'm quite sure the reads are shuffled correctly, because trimming did not discard any reads at all (low-quality reads were reduced to a single N by quality trimming), and I wrote the length filtering script myself: it always considers both reads (/1 and /2) and discards either none or both, as in the sketch below. And you're right, we're doing de novo transcriptome assembly, but Oases runs out of memory (32 GB RAM). I have now set up a new virtual machine with 32 GB of physical memory and about 60 GB of swap, and I will try to run Oases on velvetg's output. (I already know it will take a while ^^) Thanks again for the help :-)
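The core of that pair-aware filter looks roughly like this (a simplified sketch, not the actual script; it works on in-memory sequence pairs rather than FASTQ files):

```python
def filter_pairs(pairs, min_len=35):
    """Keep a read pair only if BOTH mates pass the length cutoff,
    so pairing stays intact: discard none or both, never just one."""
    for read1, read2 in pairs:
        if len(read1) >= min_len and len(read2) >= min_len:
            yield read1, read2

# The second pair is dropped because its /1 mate is only 20 bp.
pairs = [("A" * 40, "C" * 50), ("G" * 20, "T" * 60)]
print(list(filter_pairs(pairs)))  # keeps only the first pair
```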