Hi all,
I would like to sequence and de novo assemble a draft genome from a marine invertebrate with a roughly human-sized genome: 3.46 gigabases (as estimated by FIAD). I know nothing about repeat content, etc., but I do have a muscle tissue transcriptome.
My goal here is to generate a cost-effective draft assembly and gene models. I realize the assembly will not be spectacular with this approach, but it will be used as preliminary data for a grant proposal that would involve additional approaches to improve the assembly. My tentative strategy is to do one whole run of a HiSeq X Ten: one lane with two different paired-end libraries (one with partially overlapping reads and one with reads further apart, as recommended by the ALLPATHS-LG manual) and the other lane with one or two mate-pair libraries. I will try assembly with ALLPATHS-LG and Meraculous.
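For a rough sense of how depth gets split across libraries, here is a minimal back-of-the-envelope sketch. Only the 3.46 Gb genome size comes from the estimate above; the function names, the read length, and the read-pair counts are hypothetical placeholders, not instrument specifications, so substitute the numbers your sequencing provider quotes.

```python
import math

# Genome size from the FIAD estimate in this post (3.46 Gb).
GENOME_SIZE_BP = 3_460_000_000


def sequence_coverage(num_read_pairs, read_length_bp, genome_size_bp=GENOME_SIZE_BP):
    """Sequence coverage: total sequenced bases divided by genome size."""
    return num_read_pairs * 2 * read_length_bp / genome_size_bp


def physical_coverage(num_read_pairs, insert_size_bp, genome_size_bp=GENOME_SIZE_BP):
    """Physical (span) coverage: total bases spanned by inserts divided by genome size."""
    return num_read_pairs * insert_size_bp / genome_size_bp


def pairs_needed(target_coverage, read_length_bp, genome_size_bp=GENOME_SIZE_BP):
    """Read pairs required to reach a target sequence coverage."""
    return math.ceil(target_coverage * genome_size_bp / (2 * read_length_bp))


if __name__ == "__main__":
    # Hypothetical inputs: 150 bp reads and 10 million mate pairs per library.
    read_len = 150
    mate_pairs = 10_000_000

    print("pairs for 30x sequence coverage:", pairs_needed(30, read_len))
    for insert in (3_000, 10_000):
        span = physical_coverage(mate_pairs, insert)
        print(f"{insert} bp insert: {span:.1f}x physical coverage")
```

For a fixed number of read pairs, physical coverage scales linearly with insert size, which is part of what the one-library versus two-library choice trades off; the sketch does not model library complexity, chimeric pairs, or duplication rates.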
My question is: would you expect a better assembly if I have two or more different mate-pair libraries (say, one with a 3 kb insert and one with a 10 kb insert), or should I just go with one library with the longest insert size I can get? If the difference in assembly quality with 2+ different mate-pair libraries would be negligible, I'm inclined to go with just one mate-pair library and save money.
Any general advice on a sequencing strategy for getting the best draft genome I can for under USD $6,000 or so would be greatly appreciated.
Best,
Kevin