
  • Improving contig sizes for denovo GC-rich assembly

    I have two data sets for a new bacterium we're sequencing; the genome is estimated at 10-14 Mb with high GC content (70-73%). I inherited one run of ~20 million 50 bp reads from a GAIIx, of very questionable quality and with lots of adapter contamination. I also have a run of about 40 million 50 bp paired-end reads off the same machine, with an insert size of 150-200 bp. I believe both libraries were made with the "Genomic DNA Sample Prep Kit", although the prep was done by the sequencing core, who have very poor communication skills. This should be a good bit of coverage, but after numerous attempts with all sorts of settings and combinations of data, both raw and quality-filtered, with Velvet, ABySS, and Ray, the best assembly I've gotten in terms of contig size is from Velvet with a fairly large k-mer size. After trimming all contigs <100 bp I get an N50 of 4200 with very few contigs over 10 kb, but only 6.5 Mb of total assembly. With ABySS I end up with an N50 of only 900, but 14 Mb of assembly.
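For readers who want to check the numbers in this post, here is a minimal Python sketch of the two quantities it relies on: expected coverage and N50. The function names and the `min_len` parameter are illustrative and do not come from any assembler. It assumes "40 million" counts individual reads rather than pairs, which the post does not state. Under that assumption the two runs come to roughly 71x-100x and 143x-200x for a 10-14 Mb genome.

```python
def expected_coverage(n_reads, read_len_bp, genome_size_bp):
    """Average depth = total sequenced bases / genome size."""
    return n_reads * read_len_bp / genome_size_bp


def n50(contig_lengths, min_len=0):
    """Length L such that contigs of length >= L hold at least half
    of the total assembled bases. Contigs shorter than min_len are
    dropped first (the post trims contigs < 100 bp)."""
    kept = sorted((n for n in contig_lengths if n >= min_len), reverse=True)
    if not kept:
        return 0
    half_total = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Genome size bounds taken from the post (10-14 Mb).
    for genome in (10_000_000, 14_000_000):
        print(genome, expected_coverage(20_000_000, 50, genome))
    # Toy contig lengths, purely hypothetical.
    print(n50([50, 5000, 3000, 2000], min_len=100))
```

N50 says nothing about how much of the genome was assembled, so it only makes sense next to the total assembly size, as with the 6.5 Mb versus 14 Mb totals above.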

    We now have access to a HiSeq 1000, and I'm trying to decide on the best way to improve this assembly and get some much larger contigs. I can reasonably do 100 bp PE reads with the normal TruSeq kit, although I'm not sure how large an insert size I can get without doing mate-pair. Generating a mate-pair library is questionable on cost: a 10-sample mate-pair kit, which we might only use once or twice, costs half again as much as generating and sequencing 12-48 samples, and I'm not sure it would be the most efficient way to get large contigs. Can you mix insert sizes, or mate-pair and paired-end libraries, in one lane?

    Any advice on what I should do next to get some much larger contigs?
    Last edited by allthestairs; 10-17-2011, 11:28 AM.

  • #2
    Hi,

    I think the problem with de novo assembly of high-GC organisms is due to the low number of unique k-mers available in the assembly process. Have you already tried mapping your reads against a phylogenetically related bacterium? (You might get more information than from the de novo assembly.)
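One rough way to look at k-mer uniqueness is to count how many distinct k-mers occur exactly once in a set of sequences. The sketch below is a simplified illustration with hypothetical function names, not how Velvet or ABySS work internally. It ignores reverse complements and is meant for assembled contigs or a related reference, not raw reads, where sequencing depth makes nearly every genuine k-mer repeat.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every k-mer in the given sequences (forward strand only).
    K-mers containing N are skipped."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def fraction_single_copy(counts):
    """Fraction of distinct k-mers that occur exactly once."""
    if not counts:
        return 0.0
    singles = sum(1 for c in counts.values() if c == 1)
    return singles / len(counts)


if __name__ == "__main__":
    # Toy sequence, purely hypothetical.
    counts = kmer_counts(["ACGTACGT"], k=4)
    print(fraction_single_copy(counts))
```

Comparing this fraction across several values of k on the same contigs gives a crude sense of how much a larger k-mer size helps separate repeated sequence.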



    • #3
      Unfortunately, we are sequencing this genome primarily for its novelty, looking for certain pathways that are unlikely to exist in any published genome. We could probably get some larger contigs for highly conserved regions of the genome, but anything that mapped well to an existing genome would be of little use to us.
