I have two sets of data for a new bacterium we're sequencing. The genome is estimated to be 10-14 Mb with high GC content (70-73%). I've inherited one run of ~20 million 50 bp reads from a GAIIx, of very questionable quality and with lots of adapter contamination. I've also got a run of about 40 million 50 bp paired-end reads off the same machine, with an insert size of 150-200 bp. I believe both libraries were made with the "Genomic DNA Sample Prep Kit", although the prep was done by the sequencing core, who have very poor communication skills. This should be a good bit of coverage, but after numerous attempts with all sorts of settings and combinations of data, both raw and quality-filtered, using Velvet, ABySS, and Ray, the best assembly I've gotten in terms of contig size is from Velvet with a fairly large k-mer size. After removing all contigs <100 bp, I get an N50 of 4,200 bp with very few contigs over 10 kb, but only 6.5 Mb of total assembly. With ABySS I end up with an N50 of only 900 bp, but 14 Mb of assembly.
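For reference, here is a rough sketch of how the coverage and N50 figures above can be checked. It assumes the read counts are individual reads (if the "40 million" figure means pairs, the paired-end coverage doubles), and the function names `expected_coverage` and `n50` are just illustrative helpers, not part of any assembler.

```python
def expected_coverage(n_reads, read_len, genome_size):
    """Nominal depth of coverage: total sequenced bases / genome size."""
    return n_reads * read_len / genome_size


def n50(contig_lengths, min_len=0):
    """N50 of the contigs that are at least min_len bp long.

    N50 is the length of the contig at which the running total of
    lengths (sorted longest first) reaches half of the assembly size.
    """
    kept = sorted((n for n in contig_lengths if n >= min_len), reverse=True)
    if not kept:
        return 0
    half_total = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Genome size is only an estimate, so report both ends of the range.
    for genome_size in (10_000_000, 14_000_000):
        single = expected_coverage(20_000_000, 50, genome_size)
        paired = expected_coverage(40_000_000, 50, genome_size)
        print(f"{genome_size / 1e6:.0f} Mb genome: "
              f"single-end ~{single:.0f}x, paired-end ~{paired:.0f}x")
```

These are nominal numbers before any quality or adapter trimming, so the usable coverage from the poor-quality run will be lower. Passing `min_len=100` to `n50` mirrors the <100 bp contig cutoff used for the Velvet figure.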
We now have access to a HiSeq 1000, and I'm trying to decide on the best way to improve this assembly and get some much larger contigs. I can reasonably do 100 bp PE reads with the normal TruSeq kit, although I'm not sure how large an insert size I can use without going to mate-pair. Generating a mate-pair library is questionable: a 10-sample mate-pair kit alone, which we might use only once or twice, costs half again as much as preparing and sequencing 12-48 samples, and I'm not sure it would be the most efficient way of getting large contigs. Can you mix insert sizes, or mate-pair and paired-end libraries, in one lane?
Any advice on what I should do next to get some much larger contigs?