I'm currently assembling a genome (estimated size 1-1.2 Gb). So far I have Illumina data: paired-end and mate-pair libraries, at an estimated 50x and 5x coverage respectively. Unfortunately the quality of the mate pairs is not great, which appears to make scaffolding problematic.
After assembly with SOAPdenovo2 I get on average around 3 million scaffolds (k = 27-127), so the assembly is highly fragmented (perhaps the genome is repeat-rich?). I'm also trying the ALLPATHS assembler at the moment to see if that makes a difference.
To improve our scaffolding, we were wondering whether it would be worth getting some PacBio data to make use of the longer reads. Would this be a worthwhile investment and, if so, how much coverage would be required (1-10x or more)? We are asking mainly for budget reasons. Or are there other things we could do with our current data that might improve the assembly?
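For reference, here is a rough sketch of what those coverage levels would mean in raw sequence for our genome size estimate. It only uses the usual relation coverage = total bases / genome size; the function names and the 1x/5x/10x steps are my own choices for illustration, and it says nothing about read length or quality.

```python
# Rough estimate of how much long-read sequence a given coverage implies.
# Assumption: coverage = total sequenced bases / genome size.
# Genome sizes are the 1-1.2 Gb estimate from this post; the coverage
# steps (1x, 5x, 10x) are illustrative picks within the 1-10x range.

GENOME_SIZES_BP = (1_000_000_000, 1_200_000_000)
COVERAGES = (1, 5, 10)


def bases_needed(genome_size_bp: int, coverage: int) -> int:
    """Total bases required to reach `coverage` on a genome of this size."""
    return genome_size_bp * coverage


def coverage_table(sizes=GENOME_SIZES_BP, coverages=COVERAGES) -> dict:
    """Map each (genome size, coverage) pair to the bases required."""
    return {(s, c): bases_needed(s, c) for s in sizes for c in coverages}


if __name__ == "__main__":
    for (size, cov), bases in coverage_table().items():
        print(f"{size / 1e9:.1f} Gb genome at {cov}x: "
              f"{bases / 1e9:.1f} Gb of reads")
```

So 10x on the upper size estimate would be about 12 Gb of PacBio reads, which is the number we would need to price.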
Best wishes,
Bas