
**dongilbert** · 03-13-2014, 07:45 PM

You don't say how you did your single-cultivar assemblies that were short, but if it was Trinity, then add Velvet/Oases and SOAPdenovo-Trans and/or TransAbyss, all of which give you more complete assemblies if your input is paired-end reads. Use multiple k-mers up to the size of the reads, as that gives a more complete assembly of the highly expressed genes.
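As a rough illustration of the multi-kmer advice, here is a small Python sketch that lists odd k-mer sizes from a starting value up to the read length. The function name `kmer_series`, the default start of 21, and the step of 10 are my assumptions for illustration, not settings from any assembler; check each assembler's docs for the k range it actually accepts.

```python
def kmer_series(read_length, k_min=21, step=10):
    """Return odd k-mer sizes from k_min up to read_length (hypothetical helper)."""
    # Odd k avoids k-mers that are their own reverse complement.
    if k_min % 2 == 0 or step % 2 != 0:
        raise ValueError("k_min must be odd and step even so every k stays odd")
    if read_length < k_min:
        raise ValueError("read_length must be at least k_min")
    ks = list(range(k_min, read_length + 1, step))
    # Always include the largest odd k that still fits in a read.
    k_top = read_length if read_length % 2 else read_length - 1
    if ks[-1] != k_top:
        ks.append(k_top)
    return ks


if __name__ == "__main__":
    # One assembly run per k, for each assembler that takes a k setting.
    print(kmer_series(100))
```

You would then run one assembly per k value and pool all the resulting transcript sets before picking the best ones.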

See here for software that picks your best gene subset of several transcript assemblies of the same data:

EvidentialGene

http://arthropods.eugenes.org/EvidentialGene/

see about/EvidentialGene_trassembly_pipe.html for the software.

This paper is an independently done comparison of methods, with essentially the same conclusions: combining the best of several assemblers, using CDS-size metrics, gives you the most complete genes:

Combining Transcriptome Assemblies from Multiple De Novo Assemblers in the Allo-Tetraploid Plant Nicotiana benthamiana

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0091776

Background: Nicotiana benthamiana is an allo-tetraploid plant, which can be challenging for de novo transcriptome assemblies due to homeologous and duplicated gene copies. Transcripts generated from such genes can be distinct yet highly similar in sequence, with markedly differing expression levels. This can lead to unassembled, partially assembled or mis-assembled contigs. Due to the different properties of de novo assemblers, no one assembler with any one given parameter space can re-assemble all possible transcripts from a transcriptome.

Results: In an effort to maximise the diversity and completeness of de novo assembled transcripts, we utilised four de novo transcriptome assemblers, TransAbyss, Trinity, SOAPdenovo-Trans, and Oases, using a range of k-mer sizes and different input RNA-seq read counts. We complemented the parameter space biologically by using RNA from 10 plant tissues. We then combined the output of all assemblies into a large super-set of sequences. Using a method from the EvidentialGene pipeline, the combined assembly was reduced from 9.9 million de novo assembled transcripts to about 235,000 of which about 50,000 were classified as primary. Metrics such as average bit-scores, feature response curves and the ability to distinguish paralogous or homeologous transcripts, indicated that the EvidentialGene processed assembly was of high quality. Of 35 RNA silencing gene transcripts, 34 were identified as assembled to full length, whereas in a previous assembly using only one assembler, 9 of these were partially assembled.

Conclusions: To achieve a high quality transcriptome, it is advantageous to implement and combine the output from as many different de novo assemblers as possible. We have in essence taking the ‘best’ output from each assembler while minimising sequence redundancy. We have also shown that simultaneous assessment of a variety of metrics, not just focused on contig length, is necessary to gauge the quality of assemblies.

The artifact gene-joins (fake fusions) are exacerbated by post-assembly mergers such as CAP and Velvet/Oases -merge (maybe also MIRA, I've not tested that tho). In general the post-assembly mergers don't use all the read-pair info and make more mistakes by joining things that don't belong.

You can use your cultivar mixed read set for another assembly, if the re-assembly above with other assemblers doesn't help enough. The CDS-selection pipeline I've built throws out those mistakes, as the CDS never spans gene joins (too many stop codons).
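To make the stop-codon point concrete, here is a minimal Python sketch that measures how much of a transcript its longest stop-free reading frame covers. It is not the EvidentialGene code; the names `longest_open_frame` and `cds_fraction` are made up here, and it ignores start codons and alternative genetic codes. A fused transcript would typically show a longest open frame covering only part of its length, since the frame breaks at the join.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_open_frame(seq):
    """Length (nt) of the longest run of stop-free codons over all six frames."""
    best = 0
    for strand in (seq.upper(), revcomp(seq).upper()):
        for frame in range(3):
            run = 0
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in STOPS:
                    run = 0  # a stop codon ends the current open frame
                else:
                    run += 3
                    best = max(best, run)
    return best


def cds_fraction(seq):
    """Fraction of the transcript covered by its longest open frame."""
    return longest_open_frame(seq) / len(seq) if seq else 0.0


if __name__ == "__main__":
    print(cds_fraction("ATGGCTGCT"))
```

Ranking candidate transcripts of one locus by open-frame length rather than total length is the general idea; the real pipeline applies more criteria than this.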

There are several tips here that work to improve mRNA assemblies:

http://arthropods.eugenes.org/EvidentialGene/evigene/docs/perfect-mrna-assembly-2013jan.txt

For matching your 2 cultivars I suggest matching CDS also/instead, as much of the assembly differences (artifacts, shortness) will be in UTRs. You may also want to measure expression differences only on CDS (or CDS + 100 bp) to avoid those assembly artifacts.
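A small sketch of the "CDS + 100 bp" window, in Python. It assumes you already have CDS coordinates on each transcript as 0-based, half-open positions; the function name `cds_window` and that coordinate convention are my assumptions, so adjust to whatever your CDS caller reports.

```python
def cds_window(cds_start, cds_end, transcript_len, flank=100):
    """Return (start, end) of the CDS extended by `flank` bp on each side.

    Coordinates are 0-based, half-open, and clipped to the transcript ends.
    """
    if not (0 <= cds_start < cds_end <= transcript_len):
        raise ValueError("CDS coordinates must lie within the transcript")
    start = max(0, cds_start - flank)
    end = min(transcript_len, cds_end + flank)
    return start, end


if __name__ == "__main__":
    # CDS at 250..1250 on a 2000 bp transcript: keep 100 bp of each UTR.
    print(cds_window(250, 1250, 2000))
    # Short UTRs: the window is clipped at the transcript ends.
    print(cds_window(40, 900, 950))
```

Counting only reads that map inside this window keeps long or mis-joined UTRs from inflating the expression difference between the two cultivars.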

Two related transcriptomes: merging but avoiding fake fusion transcripts
