I have assembled a ~5 Mb bacterial genome using both Edena and CLC Bio. I have several hundred contigs.
The lab is willing to do additional sequencing to close some of the gaps: about 50, but not 500. The data come from a single-end Illumina DNA-Seq library with 50 bp reads (about 200X coverage).
What is the best approach to try and close some of the gaps?
Are there specific tools to automate this?
Can the singleton sequences be helpful?
There is a genome of a closely related strain that has already been sequenced (90-95% similar). Using MUMmer, I am able to map the contigs to it, but the lab has found experimentally that there are differences between the strains, and I'm not sure where to place the contigs that differ from the reference.
Also, how do I handle repetitive sequences? Some contigs have much higher coverage than the others (> 2000X, versus about 200X overall).
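To make the coverage question concrete, here is a minimal sketch of how the over-represented contigs could be flagged, assuming the per-contig mean coverage has already been extracted from the assembler output. The function name `flag_repeat_contigs`, the `min_ratio` cutoff, and the contig names and coverage values are all hypothetical and only for illustration. It treats the median coverage as the single-copy baseline and reports the coverage ratio as a rough copy-number estimate.

```python
from statistics import median


def flag_repeat_contigs(coverage_by_contig, min_ratio=2.0):
    """Flag contigs whose coverage is well above the single-copy baseline.

    coverage_by_contig: dict mapping contig name -> mean read coverage.
    min_ratio: hypothetical cutoff; contigs at or above this multiple of
               the median coverage are reported as candidate repeats.
    Returns (baseline, flagged), where flagged maps contig name to a
    rounded copy-number estimate (coverage / baseline).
    """
    # The median is robust to a handful of very high-coverage contigs.
    baseline = median(coverage_by_contig.values())
    flagged = {}
    for name, cov in coverage_by_contig.items():
        ratio = cov / baseline
        if ratio >= min_ratio:
            flagged[name] = round(ratio)
    return baseline, flagged


# Made-up example values: most contigs near 200X, one above 2000X.
example = {
    "contig_1": 190.0,
    "contig_2": 210.0,
    "contig_3": 205.0,
    "contig_4": 2100.0,
    "contig_5": 195.0,
}

baseline, flagged = flag_repeat_contigs(example)
print("baseline coverage:", baseline)
print("candidate repeat contigs:", flagged)
```

A contig at roughly ten times the baseline would be consistent with a sequence present in about ten copies (or on a multi-copy plasmid), which is the kind of contig whose placement is unclear.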
Again, are there tools or pipelines that can help reduce the number of contigs by merging some of them together? Any tips on how to proceed?
Thanks so much,
Tirza