I'm trying to assemble a relatively large insect genome (~1.5 Gbp) and have data from two different sequencing platforms that I want to combine to get the best possible assembly.
More specifically, I have Illumina HiSeq data (2 x 100 bp) with an insert size of 550 bp, which gives me around 40x coverage (from 4 libraries). Recently, I also sequenced one of these 550 bp libraries on the MiSeq platform (2 x 300 bp, overlapping reads). After merging the mates I get "long" reads (most of them >400 bp), with an estimated coverage of about 3x.
So, what do you think is the best strategy for de novo assembly when the sequencing data sets differ this much in read length and sequencing coverage?
I'm asking because I suspect that pooling all the reads and assembling them with a k-mer-based assembler will "confuse" the assembler, given the difference in sequencing coverage. I'm also guessing that a k-mer-based assembler would not make the most of my longer MiSeq reads.
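To put rough numbers on the coverage concern, here is a minimal sketch using the usual relation between base coverage C and k-mer coverage, C_k = C * (L - k + 1) / L, where L is the read length. The k of 31 and the merged read length of 400 bp are assumptions chosen only for illustration, and the function name is my own.

```python
def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Expected k-mer coverage for reads of a given length.

    A read of length L contains (L - k + 1) k-mers, so the base coverage
    is scaled by (L - k + 1) / L. Reads shorter than k contribute nothing.
    """
    if k > read_length:
        return 0.0
    return base_coverage * (read_length - k + 1) / read_length


K = 31  # assumed k-mer size, for illustration only

# HiSeq: ~40x base coverage, 100 bp reads (from the description above)
hiseq_kcov = kmer_coverage(40.0, 100, K)

# MiSeq: ~3x base coverage, merged reads assumed to be ~400 bp
miseq_kcov = kmer_coverage(3.0, 400, K)

print(f"HiSeq k-mer coverage at k={K}: {hiseq_kcov:.2f}x")
print(f"MiSeq k-mer coverage at k={K}: {miseq_kcov:.2f}x")
```

Under these assumptions the merged MiSeq reads add only about 3x of k-mer coverage on top of roughly 28x from HiSeq, which is part of why I doubt they would contribute much in a pooled k-mer-based assembly.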
Would an alternative be to assemble the HiSeq and MiSeq data separately and then combine the two assemblies using an OLC (overlap-layout-consensus) assembler instead of a k-mer-based one? If so, is there an assembler that is particularly good at this task?
Thanks!