Hi, I have a genome of a wild plant subspecies in the form of about 200K scaffolds ranging in size from a few thousand bp to over 50 kb. I am trying to assemble these into chromosomes using the chromosome sequences of the nearest domesticated relative.
I am using nucmer from the MUMmer package (http://mummer.sourceforge.net/manual/) and trying to get the settings right. I plan to use the tilings from nucmer -> show-tiling to construct the chromosomes, most likely using Biopython.
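Here is a minimal sketch of the parsing step. It uses plain Python rather than Biopython, so it has no dependencies. It assumes the default (non `-a`) show-tiling output described in the MUMmer manual: a `>reference` header line, then one row per tiled scaffold with eight whitespace-separated columns. Those columns are reference start, reference end, gap to the next scaffold, scaffold length, alignment coverage, percent identity, orientation and scaffold ID. Check the column order against your own output before relying on it. The `TilingRow` and `parse_show_tiling` names are hypothetical. show-tiling also lists a `-p` option that writes a pseudo-molecule of the query contigs directly, which may be worth trying first (check `show-tiling -h` for your version).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class TilingRow(NamedTuple):
    ref_start: int    # start of the scaffold's placement on the reference
    ref_end: int      # end of the placement
    gap: int          # gap to the next tiled scaffold (negative = overlap)
    length: int       # scaffold length
    coverage: float   # % of the scaffold covered by alignments
    identity: float   # average % identity of those alignments
    orient: str       # '+' or '-'
    scaffold_id: str


def parse_show_tiling(lines: Iterable[str]) -> Dict[str, List[TilingRow]]:
    """Group show-tiling rows under their '>reference' header, keeping file order.

    Assumes the default 8-column layout; verify against real output.
    """
    tiling: Dict[str, List[TilingRow]] = defaultdict(list)
    ref = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header assumed to look like '>ref_id ref_len bases'
            ref = line[1:].split()[0]
            continue
        if ref is None:
            raise ValueError("tiling row found before any '>reference' header")
        f = line.split()
        tiling[ref].append(TilingRow(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                                     float(f[4]), float(f[5]), f[6], f[7]))
    return dict(tiling)
```

Pass it an open file, for example `parse_show_tiling(open("out.tiling"))` (the file name is hypothetical). Each reference then maps to its scaffolds in the order show-tiling placed them, ready for the concatenation step.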
My questions are:
1) Which settings should I use for '-c' [min cluster] and '-l' [min match]? I tried the defaults (-c 65 and -l 20), and the run went on for over 90 hours before I switched to different settings.
I have tried (150, 50), (500, 100) and (1000, 50). There are still a lot of gaps in the alignment and unused scaffold sequences. I understand there will be gaps and insertions, but in some places many thousands of bp are missing (over 100 kb in some places).
Is it safe (meaningful) to concatenate all scaffolds in the order given by the nucmer tiling, reverse complementing them when needed, of course?
2) Are there any other programs designed to construct chromosomes from scaffolds given a reference? This seems like a routine/common task but I have not found much information on this specific problem.
3) After constructing the new chromosome what is the best way to call SNPs?
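The concatenation asked about in question 1 can be sketched as follows. The sketch assumes each placed scaffold has already been reduced to a reference start, end, orientation and ID. The `Tile` type, `build_pseudochromosome` and its `spacer`/`report_gap` parameters are hypothetical. Scaffolds on the `-` strand are reverse complemented. Positive gaps are padded with that many Ns, and overlapping or abutting placements get a fixed N spacer instead of being trimmed. Gaps at or above `report_gap` are returned so the large missing regions can be inspected separately. If you are already using Biopython, `Seq.reverse_complement()` can replace the `revcomp` helper.

```python
from typing import Dict, List, NamedTuple, Tuple

# Complement table covering IUPAC codes in upper and lower case.
_COMP = str.maketrans("ACGTRYKMBVDHSWNacgtrykmbvdhswn",
                      "TGCAYRMKVBHDSWNtgcayrmkvbhdswn")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]


class Tile(NamedTuple):
    ref_start: int    # start of the placement on the reference
    ref_end: int      # end of the placement on the reference
    orient: str       # '+' or '-'
    scaffold_id: str


def build_pseudochromosome(tiles: List[Tile], scaffolds: Dict[str, str],
                           spacer: int = 100, report_gap: int = 100_000
                           ) -> Tuple[str, List[Tuple[str, int]]]:
    """Concatenate tiled scaffolds in reference order.

    Positive gaps become that many Ns; overlapping or abutting placements
    get `spacer` Ns so the junction stays visible. Also returns
    (scaffold_before_gap, gap_length) for every gap >= report_gap.
    """
    tiles = sorted(tiles, key=lambda t: t.ref_start)
    parts: List[str] = []
    big_gaps: List[Tuple[str, int]] = []
    for i, tile in enumerate(tiles):
        seq = scaffolds[tile.scaffold_id]
        parts.append(revcomp(seq) if tile.orient == "-" else seq)
        if i + 1 < len(tiles):
            gap = tiles[i + 1].ref_start - tile.ref_end - 1
            parts.append("N" * (gap if gap > 0 else spacer))
            if gap >= report_gap:
                big_gaps.append((tile.scaffold_id, gap))
    return "".join(parts), big_gaps
```

Keep in mind that this produces a reference-guided pseudochromosome. Order and orientation are inherited from the domesticated relative, so real rearrangements in the wild subspecies will not show up. The N runs are placeholders, not measured gap sizes.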
Thanks!