SEQanswers

Old 04-18-2011, 02:03 PM   #1
foryvonne
Junior Member
 
Location: US

Join Date: Apr 2011
Posts: 6
De novo assembly of highly expressed transcripts

I am working on a transcriptome project where I have ~400 Mb (total read length) of 454 mRNA-seq reads sequenced from a non-normalized cDNA library. I have been using mira3 for de novo assembly of my reads, and it produced a decent assembly for transcripts with a moderate expression level. However, mira has a hard time assembling highly expressed transcripts (1000 copies or more), and the same is true of cap3. The TIGR assembler (TGICL) offers some ways to deal with highly expressed transcripts, but it doesn't have a great answer either.

Does anyone have insight into assembling highly expressed transcripts? Could de Bruijn graph-based assemblers work in this scenario?

Many thanks,
Hao
Old 04-19-2011, 02:16 AM   #2
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 69

I am also doing a de novo assembly of a transcriptome, and velvet/oases (de Bruijn graph-based) works fine, especially for highly expressed transcripts. These are assembled especially well when you choose a high k-mer value.
Old 04-19-2011, 07:20 AM   #3
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628

Have you tried Roche's Newbler in cDNA mode?
Old 04-19-2011, 08:38 AM   #4
foryvonne
Junior Member
 
Location: US

Join Date: Apr 2011
Posts: 6

Quote:
Originally Posted by Thorondor View Post
I am also doing a de novo assembly of a transcriptome, and velvet/oases (de Bruijn graph-based) works fine, especially for highly expressed transcripts. These are assembled especially well when you choose a high k-mer value.
That's great to know. Just to clarify, do velvet/oases also work well on 454 reads?
Old 04-19-2011, 08:39 AM   #5
foryvonne
Junior Member
 
Location: US

Join Date: Apr 2011
Posts: 6

Quote:
Originally Posted by sklages View Post
Have you tried Roche's Newbler in cDNA mode?
I don't have a copy of Newbler. I emailed Roche asking for one weeks ago, but I am still waiting for a reply.
Old 04-19-2011, 08:50 AM   #6
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

Quote:
Originally Posted by foryvonne View Post
I don't have a copy of Newbler. I emailed Roche asking for one weeks ago, but I am still waiting for a reply.
Did you just send an e-mail to a general contact address or did you use their online software request form:

http://454.com/contact-us/software-request.asp
Old 04-19-2011, 08:54 AM   #7
foryvonne
Junior Member
 
Location: US

Join Date: Apr 2011
Posts: 6

I sent an email. I'll try submitting the request form too. Thanks for letting me know.
Old 04-20-2011, 01:17 AM   #8
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 69

Quote:
Originally Posted by foryvonne View Post
That's great to know. Just to clarify, do velvet/oases also work well on 454 reads?
Well, I guess it should work fine, especially for the highly expressed transcripts. But I can't say that for sure, since I am working with Illumina reads.
Old 05-03-2011, 03:02 AM   #9
Jenzo
Member
 
Location: Bad Nauheim, Germany

Join Date: Mar 2011
Posts: 31

Does a non-normalized cDNA library have an impact on the number of reads used by the assembler?
I'm asking because we're also working with Illumina reads. We're using Velvet and SOAPdenovo at the moment. Velvet, for example, uses only 15594122 of 87419634 reads. Our reads are (after quality trimming to mean_qual = 20 and min_len = 35) between 35 and 60 bp long, the k-mer value for this assembly was set to 29, and velvet was run in -shortPaired mode. Overall there are about 330,000 contigs with N50 = 106, and 2,300 contigs longer than 500 bp with N50 = 696.
Using lower k-mer values decreases the number of contigs but increases the number of reads used, and this goes along with a decrease in N50, both for all contigs and for long contigs only.
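
For readers comparing assemblies in the same way, here is a minimal Python sketch of the two summary numbers quoted above: the fraction of reads used and the N50 of a set of contig lengths. The function names and the example contig lengths are made up for illustration; only the two read counts come from this post, and this is not how Velvet itself reports its statistics.

```python
def fraction_used(reads_used, reads_total):
    """Fraction of input reads that ended up in the assembly."""
    return reads_used / reads_total


def n50(lengths):
    """N50: the contig length at which the running sum of lengths,
    taken from longest to shortest, first reaches half the total.
    Returns 0 for an empty input."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


if __name__ == "__main__":
    # Read counts quoted in the post above.
    print(f"reads used: {fraction_used(15594122, 87419634):.1%}")

    # Hypothetical contig lengths, only to show the call.
    example_lengths = [100, 200, 300, 400, 500]
    print("N50:", n50(example_lengths))
```

Restricting the input to contigs above a length cutoff (for example 500 bp) before calling `n50` gives the "long contigs only" figure.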

Last edited by Jenzo; 05-04-2011 at 01:55 AM.
Old 05-04-2011, 01:54 AM   #10
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 69

So what exactly is your question? This all sounds reasonable to me. Do you have a comparison to a normalized library? What is your expected coverage? Also, a k-mer of 29 might be a bit high if your calculated expected coverage is around 10-20.
Old 05-04-2011, 02:16 AM   #11
Jenzo
Member
 
Location: Bad Nauheim, Germany

Join Date: Mar 2011
Posts: 31

Thanks for the reply, Thorondor! We have a normalized library, sequenced with 454, and assembled it with Mira, which used nearly 90% of all reads. The question is: why does Velvet use only about 15% of all reads, and could it be because the library is not normalized?
Mean coverage is (according to Velvet's own measurement in contigs.fa) between 21 and 26 for all long contigs (> 500 bp).
Perhaps someone can recommend an assembler that uses more reads on a non-normalized library.

FYI, we did 8 assemblies with Velvet, using the following k-mer values: 21 (-short, for scaffolding with other algorithms using PE information), 23 (-shortPaired), 23 (-short), 25 (-short), 27 (-shortPaired), 29 (-shortPaired), 31 (-shortPaired), 35 (-short). With k=23 and -shortPaired, Velvet uses about 25% of all reads, which was the maximum across all assemblies. Because scaffolding with other algorithms currently increases N50 up to 950, we would like to use Velvet only in -short mode, where the number of used reads is low (~11%).
Got my question? :-)

Last edited by Jenzo; 05-04-2011 at 02:21 AM. Reason: error correction
Old 05-04-2011, 02:48 AM   #12
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 69

No, I don't think that the non-normalisation is the reason here, but keep in mind that your coverage is not consistent across all transcripts. So some transcripts might be assembled better with exp_cov set really high and others with it set really low, and this will influence the number of reads used.

Also try to estimate your expected coverage on your own e.g. (total amount of bp in your reads)/(expected transcriptome size)
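
The estimate above can be sketched in a few lines of Python. The Velvet manual expresses coverage in k-mer coverage, related to base coverage C by Ck = C * (L - k + 1) / L for read length L, so the sketch includes that conversion as well. The read length of 50 bp and the transcriptome size of 50 Mb below are placeholder assumptions, not values from this thread; only the read count is taken from the earlier post.

```python
def expected_coverage(total_read_bases, transcriptome_size):
    """Base coverage: total sequenced bases / expected transcriptome size."""
    return total_read_bases / transcriptome_size


def kmer_coverage(base_coverage, read_length, k):
    """Convert base coverage to k-mer coverage, Ck = C * (L - k + 1) / L,
    the relation given in the Velvet manual."""
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    n_reads = 87419634           # read count quoted earlier in the thread
    mean_read_length = 50        # ASSUMPTION: reads are 35-60 bp after trimming
    transcriptome_size = 50e6    # ASSUMPTION: placeholder size in bp

    c = expected_coverage(n_reads * mean_read_length, transcriptome_size)
    ck = kmer_coverage(c, mean_read_length, k=29)
    print(f"base coverage ~{c:.1f}x, k-mer coverage at k=29 ~{ck:.1f}x")
```

Because expression levels differ widely between transcripts, a single number like this is only a rough starting point for trying exp_cov values.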

Are the paired-end reads shuffled correctly into one file after trimming? Some reads are discarded after trimming, so did you use select_paired.pl in the contrib folder of velvet?
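
To illustrate the pairing issue, here is a small Python sketch that keeps a pair only if both mates pass a length filter and then writes the survivors in interleaved order (mate 1, mate 2, mate 1, mate 2, ...). It is not a reimplementation of select_paired.pl; the record layout (name, sequence, quality tuples), the function names, and the example reads are assumptions made for the example.

```python
def filter_pairs(pairs, min_len=35):
    """Yield only those pairs in which BOTH mates are at least min_len long,
    so that no orphaned mate is left to shift the pairing."""
    for mate1, mate2 in pairs:
        if len(mate1[1]) >= min_len and len(mate2[1]) >= min_len:
            yield mate1, mate2


def interleave(pairs):
    """Flatten pairs into the alternating order used for a shuffled file."""
    for mate1, mate2 in pairs:
        yield mate1
        yield mate2


def to_fastq(record):
    """Format a (name, sequence, quality) tuple as a four-line FASTQ entry."""
    name, seq, qual = record
    return f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    # Hypothetical reads: the second pair has one mate that is too short.
    pairs = [
        (("r1/1", "A" * 40, "I" * 40), ("r1/2", "C" * 36, "I" * 36)),
        (("r2/1", "G" * 50, "I" * 50), ("r2/2", "T" * 20, "I" * 20)),
    ]
    for record in interleave(filter_pairs(pairs, min_len=35)):
        print(to_fastq(record), end="")
```

Dropping both mates together is the simplest way to keep the file consistent; keeping the surviving mate as a single read in a separate file would be an alternative.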

Since it seems like you do de novo transcriptome assembly why not try oases?
Old 05-04-2011, 03:47 AM   #13
Jenzo
Member
 
Location: Bad Nauheim, Germany

Join Date: Mar 2011
Posts: 31

Dear Thorondor, thanks a lot for these suggestions! I'll try to estimate coverage and then try some values for exp_cov.
I'm quite sure that the reads are shuffled correctly, because trimming did not discard any reads (low-quality reads were reduced to a single N after quality trimming), and I wrote the length-filter script myself so that it always considers both reads (/1 and /2) and discards either none or both.
And you're right, we're doing de novo transcriptome assembly, but Oases runs out of memory (32 GB RAM). I have now set up a new virtual machine with 32 GB of physical memory and about 60 GB of swap, and will try to run Oases on velvetg's output. (I already know that it will take a while ^^)
Thanks again a lot for help :-)