I am trying to de novo assemble a large bacterial genome (~9.2 Mb) with a high GC content (~67%). We have paired-end data from a single MiSeq run. Using a couple of different assemblers (SPAdes and A5), we have been able to assemble our data into ~700-900 contigs. Since these contigs span only ~8.8 Mb, we obviously have gaps and are very likely missing some regions of the genome. I am new to genome sequencing and do not want to cut corners, but I would also like to avoid unnecessary costs for this assembly if possible. As I understand it, I have two options:
A. More short-read data. We could do an additional MiSeq run, either starting again from the library prep stage or using the excess DNA saved after the first library prep. This would give us more short-read data, but I am unsure whether it would just produce reads similar to the ones we already have. Does anyone have experience with this? Is a second run likely to sequence only the same regions as the first, or can we expect to get data on previously unsequenced regions?
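One quick way to frame option A is the classic Lander-Waterman (Poisson) model, which predicts the fraction of a genome left with zero reads as e^(-c) for mean depth c. The sketch below is illustrative only: the read counts and the 250 bp read length are hypothetical placeholders, not numbers from our run, and the model assumes reads land uniformly at random. If even modest depth predicts essentially no uncovered bases, gaps that remain are more likely systematic (for example, GC-rich regions that library prep or sequencing handles poorly) than a matter of random sampling, and a run with the same chemistry would tend to miss the same regions.

```python
import math

GENOME_SIZE_BP = 9_200_000  # ~9.2 Mb, the estimated genome size from the post


def coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return num_reads * read_length_bp / genome_size_bp


def expected_uncovered_fraction(depth: float) -> float:
    """Lander-Waterman/Poisson estimate of the fraction of bases with zero reads.

    Assumes reads are sampled uniformly at random along the genome; GC bias
    in real libraries violates this, so treat the result as a lower bound.
    """
    return math.exp(-depth)


if __name__ == "__main__":
    # Hypothetical read counts (individual reads of 250 bp); substitute real values.
    for num_reads in (500_000, 1_000_000, 4_000_000):
        depth = coverage(num_reads, 250, GENOME_SIZE_BP)
        missed_bp = expected_uncovered_fraction(depth) * GENOME_SIZE_BP
        print(f"{num_reads:>9} reads -> {depth:5.1f}x, ~{missed_bp:,.0f} bp expected uncovered")
```

Comparing this expectation with the ~0.4 Mb difference between the estimated genome size and the total contig length gives a rough sense of whether the missing sequence looks like undersampling or bias.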
B. Long reads. These would help join contigs into scaffolds and hopefully yield a complete genome, but we would likely have low coverage and more errors in the regions of the genome that the MiSeq run missed.
Given where we are, would you recommend A or B as sufficient for a finished assembly, or will both be necessary? Thanks for the help!