04-13-2017, 01:17 AM   #3
David Eccles (gringer)
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 823

what is the general knowledge/feeling about CLC Bio and Trinity
CLC Bio: a black box, costs lots of money, can't be changed/modified without Qiagen's approval

Trinity: over 1000 citations, free and open source, with a prescriptive published protocol for de novo assembly and differential expression (DE) analysis. I hacked the code a little to get it to work on my desktop computer.

is either of the assemblers known for making mistakes?
All assemblers make mistakes. Ignoring algorithmic errors (which are potentially fixable), it's impossible to resolve repeats that are longer than the template length (and/or with a repeat unit that is longer than the read length). Sequencers make mistakes, which the assemblers can propagate. Transposable elements mess up assemblies if they occur at multiple points throughout the genome. Assembly of single cells will be incomplete. Assembly of pooled multiple cells (or organism populations) will include cell-specific variation. Transcriptome assemblies based on poly-A selected transcripts will be incomplete. More generally, any transcriptome assembly will be incomplete to some degree, depending on which genes are expressed at the time of sampling.
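To make the repeat point concrete, here's a toy sketch in Python (not anything from Trinity or CLC; the sequences and names are made up for illustration). Two different "genomes" are built from the same unique segments arranged differently around three copies of one repeat. With error-free reads no longer than the repeat, both genomes give exactly the same reads, so no assembler could tell them apart. Once reads span the repeat plus a base on each side, the read sets differ.

```python
from collections import Counter

def reads(genome, length):
    """Count every error-free read (substring) of the given length."""
    return Counter(genome[i:i + length] for i in range(len(genome) - length + 1))

# Hypothetical toy sequences, chosen only for illustration.
REPEAT = "ACGTACGTTG"
A, B, C, D = "TTTAA", "GGCCA", "CATGC", "AATTT"

# Same segments, but B and C swap places between the repeat copies.
genome_1 = A + REPEAT + B + REPEAT + C + REPEAT + D
genome_2 = A + REPEAT + C + REPEAT + B + REPEAT + D

short_len = len(REPEAT)       # reads cannot span the repeat with both flanks
long_len = len(REPEAT) + 2    # reads can span the repeat plus one base each side

same_with_short_reads = reads(genome_1, short_len) == reads(genome_2, short_len)
same_with_long_reads = reads(genome_1, long_len) == reads(genome_2, long_len)

print("identical read sets, short reads:", same_with_short_reads)
print("identical read sets, long reads:", same_with_long_reads)
```

Real data adds sequencing errors and uneven coverage on top of this, so the practical situation is worse than the toy case.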

more directly, is either of them partial to misassembly of paralogs - if one gives me more single copy genes, is that a 'true' result or are they actually a mash up of paralogs?
It might be possible to resolve paralogs if they have different expression levels (which are consistent throughout the transcript), but you need to do a genome-guided assembly to have any hope of properly assembling paralogs with shared sequence.
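A rough sketch of why shared sequence is a problem, using a toy k-mer graph in Python (this is a simplified de Bruijn-style illustration, not Trinity's actual implementation; the two "paralogs" and the value of K are invented). Both transcripts pass through the same shared block, so the graph forks at the end of that block.

```python
from collections import defaultdict

def kmer_successors(transcripts, k):
    """Map each k-mer to the set of k-mers that directly follow it."""
    graph = defaultdict(set)
    for seq in transcripts:
        for i in range(len(seq) - k):
            graph[seq[i:i + k]].add(seq[i + 1:i + k + 1])
    return graph

# Hypothetical sequences: two paralogs with a common internal block.
SHARED = "ATGGCGTACCTGAAG"
paralog_1 = "CCTTAGGA" + SHARED + "TTCAGCAT"
paralog_2 = "GGAACTTC" + SHARED + "CAAGTCGG"

K = 7
graph = kmer_successors([paralog_1, paralog_2], K)

# A k-mer with more than one successor is a fork in the graph.
branch_points = {kmer for kmer, nxt in graph.items() if len(nxt) > 1}

print("fork at end of shared block:", SHARED[-K:] in branch_points)
```

At that fork the graph alone doesn't say which tail goes with which head. Reads that span the whole shared block, a consistent coverage difference between the two paralogs, or a reference genome are the kinds of extra information needed to pair them correctly.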