SEQanswers

Old 10-17-2011, 12:24 PM   #1
allthestairs
Junior Member
 
Location: Maryland

Join Date: Aug 2011
Posts: 2
Improving contig sizes for de novo GC-rich assembly

I've got two sets of data for a new bacterium we're sequencing. The genome is estimated at 10-14 Mb with high GC content (70-73%). I inherited one run of ~20 million 50 bp reads from a GAIIx, of very questionable quality and with lots of adapter contamination. I've also got a run of about 40 million 50 bp paired-end reads off the same machine, with an insert size of 150-200 bp. I believe both of these were made with the "Genomic DNA Sample Prep Kit", although the prep was done by the sequencing core, who have very poor communication skills.

This should be a good bit of coverage, but after numerous attempts with Velvet, ABySS, and Ray, using all sorts of settings and combinations of the data, both raw and quality-filtered, the best assembly I've gotten in terms of contig size is from Velvet alone with a fairly large k-mer size. After removing all contigs <100 bp, I get an N50 of 4,200 bp with very few contigs over 10 kb, but only 6.5 Mb of total assembly. With ABySS I end up with an N50 of only 900 bp, but 14 Mb of assembly.
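For readers who want to check figures like these, here is a minimal Python sketch of the two calculations mentioned above: raw fold coverage and N50 after dropping short contigs. The function names and the toy contig lengths are made up for illustration, and the coverage figure assumes that "40 million" counts individual reads rather than pairs; it is also raw coverage, before any quality or adapter filtering.

```python
def n50(lengths, min_length=0):
    """Return the N50 of the contig lengths, ignoring contigs below min_length.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        return 0
    half = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        if running >= half:
            return n


def fold_coverage(n_reads, read_length, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


# Toy contig lengths, invented for illustration (not from the real assembly).
toy_contigs = [50, 80, 100, 200, 300, 400, 500, 500]
print("N50, all contigs:      ", n50(toy_contigs))
print("N50, contigs >= 100 bp:", n50(toy_contigs, min_length=100))

# Read counts from the post: ~20 million + ~40 million reads of 50 bp,
# against the 10-14 Mb genome size estimate.
total_reads = 20_000_000 + 40_000_000
for genome_size in (10_000_000, 14_000_000):
    depth = fold_coverage(total_reads, 50, genome_size)
    print(f"{genome_size / 1e6:.0f} Mb genome: ~{depth:.0f}x raw coverage")
```

Note that dropping short contigs raises the N50, so N50 values are only comparable between assemblies when the same length cutoff is applied to both.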

We now have access to a HiSeq 1000, and I'm trying to decide on the best way to improve this assembly and get some much larger contigs. I can reasonably do 100 bp PE reads with the normal TruSeq kit, although I'm not sure how large an insert size I can use without going to mate-pair. Generating a mate-pair library is questionable on cost: a 10-sample mate-pair kit, which we might only use once or twice, costs half again as much as generating and sequencing 12-48 samples. I'm also not sure it would be the most efficient way to get large contigs. Can you mix insert sizes, or mate-pair and paired-end libraries, in one lane?

Any advice on what I should do next to get some much larger contigs?

Last edited by allthestairs; 10-17-2011 at 12:28 PM.
Old 10-17-2011, 01:12 PM   #2
aloliveira
Member
 
Location: Brazil

Join Date: Aug 2010
Posts: 47

Hi,

I think the problem with de novo assembly of high-GC organisms is due to the low number of non-unique k-mers in your assembly process. Have you already tried mapping your reads against a phylogenetically related bacterium? You may get more information that way than from the de novo assembly.
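As a rough way to look at the k-mer point above, the following Python sketch counts what fraction of the distinct k-mers in a sequence occur more than once. The function name and the toy sequences are hypothetical, and this is not how any particular assembler counts k-mers: it scans a single strand only and ignores reverse complements.

```python
from collections import Counter


def repeated_kmer_fraction(seq, k):
    """Fraction of distinct k-mers in seq that occur more than once.

    Single-strand count only; reverse complements are not merged.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    if not counts:
        return 0.0
    repeated = sum(1 for c in counts.values() if c > 1)
    return repeated / len(counts)


# Toy sequences, invented for illustration.
low_complexity = "GCGCGCGCGC"  # only GCG and CGC, both repeated
mixed = "ACGTTGCA"             # every 3-mer occurs exactly once

print(repeated_kmer_fraction(low_complexity, 3))
print(repeated_kmer_fraction(mixed, 3))
```

Running this over real reads or contigs at several values of k would show how quickly repeated k-mers drop off as k grows, which is one reason a larger k-mer size can help on repetitive or low-complexity sequence.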
Old 10-17-2011, 01:50 PM   #3
allthestairs
Junior Member
 
Location: Maryland

Join Date: Aug 2011
Posts: 2

Unfortunately we are sequencing this genome primarily for its novelty, looking for certain pathways that are unlikely to exist in any published genome. We could probably get some larger contigs for highly conserved regions of its genome, but anything that mapped well to an existing genome would be of little use to us.