02-08-2016, 03:31 AM   #6
Markiyan
Senior Member
 
Location: Cambridge

Join Date: Sep 2010
Posts: 116
Looks like the de novo transcriptome assembly needs to be properly done first...

Dear Moldach,

It looks like the de novo assembly needs to be done properly first.
Assuming you were using Illumina:
For that you really need to start from a cDNA library with a 350-600 bp fragment size, then sequence it on the MiSeq or HiSeq in 2x250 or 2x300 bp run mode (read the fragmentation section of the Illumina cDNA library prep protocol).
Or do PacBio's Iso-Seq...
(If you did Illumina 1x75 bp or 1x100 bp, that would not cut it very well...)
Then process your data through FLASH or PANDAseq (read-pair merging as a preassembly step), and then do an incremental pure de novo assembly, starting from 10k reads and going up.

Check the most abundant transcripts for completeness, and add them to the "vector.seq" database, so that they don't interfere with the next round of the assembly for the less abundant things.
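
To make the "start from 10k reads and go up" idea concrete, here is a rough Python sketch of how the read subsets for each round could be picked. The doubling factor, the fixed random seed and the function names are my own assumptions for illustration, not part of any assembler; the assembly itself and the screening against vector.seq still happen in whatever tool you use.

```python
import random

def subset_sizes(total_reads, start=10_000, factor=2):
    """Return increasing subset sizes: start, start*factor, ..., capped at total_reads.

    The doubling factor is an assumption; any growing schedule would do.
    """
    sizes = []
    n = start
    while n < total_reads:
        sizes.append(n)
        n *= factor
    sizes.append(total_reads)  # the last round uses everything
    return sizes

def subsample(records, n, seed=0):
    """Pick n records at random (reproducibly), keeping their original order."""
    records = list(records)
    if n >= len(records):
        return records
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(records)), n))
    return [records[i] for i in keep]

if __name__ == "__main__":
    # Hypothetical example: 100k merged reads in total.
    for size in subset_sizes(100_000):
        print("assemble a round with", size, "reads")
```

After each round you would inspect the most abundant contigs, move the complete ones into vector.seq, and only then run the next, larger subset.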

You can use MIRA or any other assembler in EST mode (you can also try CLC or DNASTAR's NGen).

Then combine the final version of the vector.seq database with your final contigs and:
1. use it as the reference for mapping the reads (to get the relative abundance)
2. annotate your reference with blastx
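
For step 1, a minimal Python sketch of turning mapped read counts into relative abundance. I am assuming a simple length-normalised fraction here (the same idea that TPM is built on); the post does not prescribe a specific measure, and the transcript names and numbers are made up.

```python
def relative_abundance(read_counts, lengths):
    """Length-normalised relative abundance per transcript (fractions sum to 1).

    read_counts: dict of transcript name -> number of mapped reads
    lengths:     dict of transcript name -> transcript length in bp
    """
    # Reads per base, so that long transcripts are not favoured.
    rates = {name: read_counts[name] / lengths[name] for name in read_counts}
    total = sum(rates.values())
    if total == 0:
        return {name: 0.0 for name in rates}
    return {name: rate / total for name, rate in rates.items()}

if __name__ == "__main__":
    # Hypothetical counts for two contigs of different length.
    counts = {"contig_1": 100, "contig_2": 100}
    lengths = {"contig_1": 1000, "contig_2": 2000}
    for name, frac in relative_abundance(counts, lengths).items():
        print(name, round(frac, 3))
```

Multiplying the fractions by one million gives TPM-style values, if that is what your downstream tools expect.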

I wouldn't rely on any reference-based methods if the similarity between the beasts is less than 95% at the DNA level.
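
If you want a quick check against that 95% figure, here is a small Python sketch that computes percent identity from two sequences that are already aligned (same length, '-' for gaps). Counting a gap column as a mismatch is my assumption; alignment tools differ on this, so compare like with like.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns; a gap against a base counts as a mismatch."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    a = aligned_a.upper()
    b = aligned_b.upper()
    columns = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # skip columns that are gaps in both sequences
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        return 0.0
    return 100.0 * matches / columns

if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    identity = percent_identity("ACGT-A", "ACGTTA")
    print("identity: %.1f%%" % identity, "(reference-based OK)" if identity >= 95 else "(go de novo)")
```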

Markiyan.