**juadiegaitan** · 04-12-2017, 07:09 PM

There is a difference

Hi
Yes, there is a difference in the quality/completeness of the assembled transcriptomes (CEGMA and BUSCO). I would suggest using Trinity and then following your pipeline with/without CLC.
Cheers

**gringer** · 04-13-2017, 12:17 AM

what is the general knowledge/feeling about CLC Bio and Trinity

CLC Bio: a black box, costs lots of money, can't be changed/modified without Qiagen's approval

Trinity: over 1000 citations, free and open source, a prescriptive published protocol for de-novo assembly and DE analysis. I hacked the code a little to get it to work on my desktop computer.

is either of the assemblers known for making mistakes?

All assemblers make mistakes. Ignoring algorithmic errors (which are potentially fixable), it's impossible to resolve repeats that are longer than the template length (and/or with a repeat unit that is longer than the read length). Sequencers make mistakes, which the assemblers can propagate. Transposable elements mess up assemblies if they occur at multiple points throughout the genome. Assembly of single cells will be incomplete. Assembly of pooled multiple cells (or organism populations) will have cell-specific variation. Transcriptome assemblies based on poly-A selected transcripts will be incomplete. Transcriptome assemblies will be incomplete to varying degrees, depending on which genes are active at the time of sampling.
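To make the repeat point concrete, here is a toy sketch in Python. It is not how any real assembler works, and the sequences are made up for illustration: it treats every length-k substring as an error-free single-end read and ignores paired-end template length entirely. Two different genomes, which differ only in the order of the unique segments between three copies of a repeat, give exactly the same reads until the reads are long enough to span a whole repeat copy plus a base on each side.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every length-k substring, standing in for error-free reads of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical toy sequences: one repeat (R) present three times,
# with the two unique segments between the copies swapped.
R = "GATTACA"
genome_1 = "CC" + R + "TT" + R + "GG" + R + "AA"
genome_2 = "CC" + R + "GG" + R + "TT" + R + "AA"

# Reads no longer than len(R) + 1 never cover both flanks of a repeat copy,
# so the two different genomes yield exactly the same multiset of reads.
short_k = len(R) + 1
same_reads = kmer_counts(genome_1, short_k) == kmer_counts(genome_2, short_k)

# Reads of len(R) + 2 span a full repeat copy plus one base on either side,
# which is enough to tell the two arrangements apart.
long_k = len(R) + 2
different_reads = kmer_counts(genome_1, long_k) != kmer_counts(genome_2, long_k)

print(same_reads, different_reads)
```

With the shorter reads no assembler could pick the right arrangement from the data alone, however good its algorithm is.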

more directly, is either of them partial to misassembly of paralogs - if one gives me more single copy genes, is that a 'true' result or are they actually a mash up of paralogs?

It might be possible to resolve paralogs if they have different expression levels (which are consistent throughout the transcript), but you need to do a genome-guided assembly to have any hope of properly assembling paralogs with shared sequence.

**luc** · 04-14-2017, 04:01 PM

Trinity is the standard for de novo transcriptome assemblies. As a result, the artifacts it produces are also relatively well known.
Sorry, I have never used CLC for this purpose. I would suggest contacting CLC for suggested settings for transcriptomes (I can't imagine the defaults are optimal).
For genome assemblies CLC has the advantage that it will work with all kinds of data (all kinds of read lengths, paired or not paired, and even low quality data). In short it is extremely robust for this purpose.

**Dario1984** · 04-24-2017, 09:00 PM

I contacted QIAGEN support last year about this topic. CLC Genomics Workbench has no specific algorithm for assembling RNA-seq data. The support officer explained:

The CLC de novo assembly tool was designed with genomic data in mind. At the moment we have no tool that is specific to transcriptomic data assembly. This means that there is no step or action that explicitly handles cases of alternative splicings.

Also, CLC Genomics Workbench ignores RNA-seq strandedness.

You cannot utilize the strand-specific information in the RNA-seq data for the de novo assembly job. So, it does not matter if you have unstranded data.
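For comparison, Trinity does accept the library orientation through its `--SS_lib_type` option (`RF` or `FR` for paired reads). Below is a minimal Python sketch that only builds the command line and does not run anything. The file names, CPU count and memory value are placeholders, and `RF` is an assumption that suits dUTP-style stranded libraries, so check it against your own library prep.

```python
from typing import List, Optional


def trinity_command(
    left: str,
    right: str,
    ss_lib_type: Optional[str] = "RF",   # assumption: dUTP-style stranded library
    cpu: int = 4,                        # placeholder value
    max_memory: str = "20G",             # placeholder value
    output: str = "trinity_out_dir",
) -> List[str]:
    """Build (but do not run) a paired-end Trinity command line."""
    if ss_lib_type not in (None, "RF", "FR"):
        raise ValueError("paired-end --SS_lib_type must be 'RF', 'FR' or None")
    cmd = [
        "Trinity",
        "--seqType", "fq",
        "--left", left,
        "--right", right,
        "--CPU", str(cpu),
        "--max_memory", max_memory,
        "--output", output,
    ]
    # Leave the flag out entirely for unstranded libraries.
    if ss_lib_type is not None:
        cmd += ["--SS_lib_type", ss_lib_type]
    return cmd


# Hypothetical input file names, printed for inspection only.
print(" ".join(trinity_command("reads_1.fq.gz", "reads_2.fq.gz")))
```

Passing `ss_lib_type=None` gives the unstranded form of the same command.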

You'd be silly to choose CLC Genomics Workbench instead of Trinity for transcript assembly. CLC Genomics Workbench is so behind the times it can't even export sorted and indexed BAM files to disk.

BAM format files exported from the Workbench are not sorted nor indexed. If pairs are not on the same contig, the mates will be exported as single reads.
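If you are stuck with such an export, sorting and indexing it afterwards is a small job for samtools. Here is a rough Python sketch that wraps the two calls. It assumes `samtools` is installed and on your PATH, and the file names are placeholders. Note that it does nothing about mates that were exported as single reads.

```python
import subprocess
from typing import List


def sort_index_commands(bam_in: str, bam_sorted: str) -> List[List[str]]:
    """Return the samtools commands that coordinate-sort and then index a BAM file."""
    return [
        ["samtools", "sort", "-o", bam_sorted, bam_in],
        ["samtools", "index", bam_sorted],
    ]


def sort_and_index(bam_in: str, bam_sorted: str) -> None:
    """Run the commands in order; raises CalledProcessError if samtools fails."""
    for cmd in sort_index_commands(bam_in, bam_sorted):
        subprocess.run(cmd, check=True)


# Hypothetical file names; only printed here, nothing is executed.
for cmd in sort_index_commands("clc_export.bam", "clc_export.sorted.bam"):
    print(" ".join(cmd))
```

Call `sort_and_index("clc_export.bam", "clc_export.sorted.bam")` to actually run it.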

**nurgling** · 07-05-2017, 06:09 PM

Reply to Dario1984

Dear Dario,

Sorry for not posting a response. My phone selectively doesn't submit things, and responding to you was one of those.

Thank you! This is exactly what I was looking for, and it was what I needed to convince my boss to switch away from CLC Bio.

Champion.


CLC Bio vs. Trinity for de novo transcriptome assembly
