A recent blog post by FlxLex, commenting on work by Sergey Koren, Michael Schatz and others, indicates that genome assemblies can be significantly improved using corrected PacBio long reads:
Evidently reads are getting longer, and let's just say for the sake of argument that ONT comes up with the goods and we get:
reads over 100 kb, very accurate, and a mountain of them
I have three questions:
1. In the PacBio dataset the correction step was processor-intensive, but what do long reads mean for the memory requirements of de novo assemblers? With very long reads, does the algorithmic problem become more manageable, without the need for 128-256 GB of RAM? (The sketch after this list shows the rough arithmetic I have in mind.)
2. Is anyone working on assemblers designed from the outset on the assumption that longer reads are inevitable, or will current tools cope with only minor modifications?
3. I'm indirectly interested in regions with a bit of transposable-element activity, and in repetitive regions more generally. If a lot of the missing data in current assemblies is down to these two factors, what length of good-quality read would be likely to resolve the majority of them?
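To make questions 1 and 3 a bit more concrete, here is the back-of-envelope arithmetic I have in mind. Every parameter in it (genome size, coverage, repeat lengths, flank size) is an assumption I've picked purely for illustration, not a figure from the posts linked above:

```python
# Back-of-envelope sketch: all parameters below are assumptions chosen
# for illustration, not figures from the posts referenced above.

GENOME_SIZE = 3.2e9  # bp; assumed human-sized genome
COVERAGE = 50        # assumed sequencing depth

# Question 1: how many reads does an assembler have to hold and overlap?
for read_len in (100, 10_000, 100_000):
    n_reads = GENOME_SIZE * COVERAGE / read_len
    print(f"{read_len:>7} bp reads -> ~{n_reads:,.0f} reads to assemble")

# Question 3: a read can resolve a repeat only if it spans the repeat
# plus some unique anchor sequence on both sides.
FLANK = 500  # bp of unique flank assumed per side
repeat_lengths = [300, 6_000, 10_000, 50_000, 100_000]  # illustrative only

for read_len in (10_000, 100_000):
    spanned = sum(read_len >= r + 2 * FLANK for r in repeat_lengths)
    print(f"{read_len:>7} bp reads span {spanned}/{len(repeat_lengths)} "
          "of the example repeats")
```

If the read count really does drop by three to four orders of magnitude like this, I'd naively expect the overlap and graph structures to shrink too, but I don't know how per-read costs actually scale with read length; that is exactly what I'm asking in question 1.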
Perhaps a comparison of the repetitive elements in the unresolved fragments of the parrot Assemblathon contigs against the corrected ones might give some clues?
I'm a bit of a novice on these issues and would be keen to hear the opinions of some experts! Perhaps this is looking too far forward, but the field seems to move very quickly!