SEQanswers

Old 03-14-2011, 11:17 AM   #1
cbouyio
Junior Member
 
Location: Paris, France

Join Date: Feb 2010
Posts: 2
GS De Novo Assembler (Newbler) -large option for transcriptomes

Hi all,

Has anyone ever tried the -large option for de novo assembly of 454 transcriptome data?
I ask because the -large flag (described as the "flag for large or complex genomes") has no documentation beyond the phrase I just quoted in the parentheses.
I understand that this option is meant mostly (perhaps only?) for genome assemblies, but what is the influence of this flag if one uses it for transcriptomes?

The question came up when, out of curiosity, I tried the -large flag for a transcriptome assembly (together with the -cdna flag, of course) and observed a significant difference in the size and composition of the isotigs generated. The number of isotigs did not change significantly, but their lengths and the way they were put together did.

Has anybody got to the bottom of how this flag works?

Many thanks
Old 03-14-2011, 01:20 PM   #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,150

Quote:
Originally Posted by cbouyio
The question came up when, out of curiosity, I tried the -large flag for a transcriptome assembly (together with the -cdna flag, of course) and observed a significant difference in the size and composition of the isotigs generated. The number of isotigs did not change significantly, but their lengths and the way they were put together did.

I don't have an answer for you, but a question. Do you think your assembly was made better or worse by using the -large option?
Old 03-16-2011, 12:19 AM   #3
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

-large is supposed to be used for large genome assemblies, which would 'never' finish without the -large option set. On occasion, I have needed it for transcriptome assemblies, which would otherwise take way too long.

Generally, one wants to avoid -large, as it shortcuts some steps and can thereby lead to worse results (shorter contigs or more reads marked as repeat, for instance).
Old 03-17-2011, 10:17 AM   #4
cbouyio
Junior Member
 
Location: Paris, France

Join Date: Feb 2010
Posts: 2

Guys, thanks for the replies.

@kmcarr I cannot give a straight answer to your question, because I cannot tell from the numbers alone which transcriptome assembly was "better". The number, the N50 and the length distribution of the *isotigs* were marginally "better" without the -large option; however, the -large option gave me better resolution for an individual multi-copy gene family that we are after. I need to wait for the PCR amplicons from the wet lab guys to corroborate that, but the indications so far are that for this particular family (which, BTW, contains several repeats) the -large option might give us better resolution.

@flxlex Both with and without -large the assemblies run relatively fine (about a couple of hours each on a 4-core, 32 GB RAM machine), so finishing the assembly is not an issue for us. However, I take seriously your comment that -large "shortcuts some steps" and marks some reads as repeats, and I'll have a manual look at the .ace files of the protein family we are after. The contig numbers and length distributions, as I mentioned, are not significantly different. So, lacking any more formal way, I'll go with an empirical assessment here and check manually (together with some wet lab confirmation) which option gives us better resolution for the family we are after.
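
For anyone who wants to make the same numbers-only comparison (isotig count, N50, length distribution) between a run with -large and a run without it, here is a minimal sketch in Python. It is not part of Newbler; the function names are made up for illustration, and it only assumes that each assembly's isotigs are available as a plain FASTA file. The file paths in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence found in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Collect the summary numbers discussed in the thread."""
    return {
        "count": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


def summarize_file(path: str) -> Dict[str, int]:
    """Summarize one FASTA file on disk (the path is supplied by the caller)."""
    with open(path) as handle:
        return summarize(fasta_lengths(handle))
```

Calling `summarize_file("with_large/isotigs.fna")` and `summarize_file("without_large/isotigs.fna")` (hypothetical paths) would give two dictionaries to compare side by side. As noted above, such summary numbers cannot say which assembly resolves a particular gene family better; that still needs a manual look at the contigs.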

Thanks again for your replies.

Tags
454, assembly, newbler, transcriptome
