  • Improving contig sizes for de novo GC-rich assembly

    I have two sets of data for a new bacterium we are sequencing. Its genome is estimated at 10-14 Mb, with high GC content (70-73%). I inherited one run of ~20 million 50 bp reads from a GAIIx, of very questionable quality and with a lot of adapter contamination. I also have a run of about 40 million 50 bp paired-end reads from the same machine, with an insert size of 150-200 bp. I believe both libraries were made with the "Genomic DNA Sample Prep Kit", although the prep was done by the sequencing core, who communicate very poorly. This should be a good amount of coverage, but after numerous attempts with Velvet, ABySS, and Ray, using all sorts of settings and combinations of the data (both raw and quality-filtered), the best assembly I've gotten in terms of contig size comes from Velvet alone with a fairly large k-mer size. After discarding all contigs <100 bp, I get an N50 of 4,200 bp with very few contigs over 10 kb, but only 6.5 Mb of total assembly. With ABySS I end up with an N50 of only 900 bp, but 14 Mb of assembly.
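For reference, here is a minimal Python sketch of the statistics quoted above: contig count, total assembly length, and N50 after dropping contigs shorter than 100 bp. It assumes the contigs are available as FASTA text; the function names are hypothetical, and individual assemblers or QC tools may handle edge cases (ties, length cut-offs) slightly differently.

```python
from typing import Iterable, List, Optional, Tuple


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of each record in FASTA-formatted text."""
    lengths: List[int] = []
    current: Optional[int] = None
    for raw in fasta_lines:
        line = raw.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int], reference_total: Optional[int] = None) -> int:
    """N50 of the contigs; pass an estimated genome size as reference_total for NG50.

    Walks the contigs longest-first and returns the length at which the running
    sum first reaches half of the total, or 0 if half is never reached.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered) if reference_total is None else reference_total
    running = 0
    for n in ordered:
        running += n
        if 2 * running >= total:
            return n
    return 0


def assembly_stats(lengths: Iterable[int], min_len: int = 100) -> Tuple[int, int, int]:
    """(contig count, total bp, N50) after discarding contigs shorter than min_len."""
    kept = [n for n in lengths if n >= min_len]
    return len(kept), sum(kept), n50(kept)


if __name__ == "__main__":
    toy = [">c1", "ACGT" * 30, ">c2", "GC" * 20, ">c3", "AT" * 150]
    lengths = contig_lengths(toy)
    print(lengths)                  # [120, 40, 300]
    print(assembly_stats(lengths))  # (2, 420, 300)
```

Because plain N50 is computed against each assembly's own total, a 6.5 Mb assembly and a 14 Mb assembly are not directly comparable on that number alone; passing an estimated genome size (e.g. 10-14 Mb) as `reference_total` gives NG50, which puts both on the same footing.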

    We now have access to a HiSeq 1000, and I'm trying to decide how best to improve this assembly and get much larger contigs. I can reasonably do 100 bp PE reads with the standard TruSeq kit, although I'm not sure how large an insert size I can get without making a mate-pair library. A mate-pair library is hard to justify: a 10-sample mate-pair kit, which we might only use once or twice, costs half again as much as preparing and sequencing 12-48 samples, and I'm not sure it would even be the most efficient way to get large contigs. Can you mix insert sizes, or mate-pair and paired-end libraries, in one lane?
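A rough coverage check helps when weighing another run. The Python sketch below uses the standard estimate C = N × L / G for mean per-base coverage and the k-mer coverage conversion Ck = C × (L − k + 1) / L described in the Velvet manual. Assumptions, all hypothetical: the "40 million" paired-end figure is treated as 40 million reads rather than pairs, k = 31 stands in for the unstated "fairly large" k-mer size, and 100x is an arbitrary example target. The estimate ignores reads lost to adapter/quality trimming and the uneven coverage GC bias can cause, so it is an upper bound on usable depth.

```python
import math


def base_coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Mean per-base coverage C = N * L / G."""
    return n_reads * read_len / genome_size


def kmer_coverage(base_cov: float, read_len: int, k: int) -> float:
    """k-mer coverage Ck = C * (L - k + 1) / L."""
    if not 0 < k <= read_len:
        raise ValueError("k must be between 1 and the read length")
    return base_cov * (read_len - k + 1) / read_len


def reads_needed(target_cov: float, read_len: int, genome_size: int) -> int:
    """Number of reads (not pairs) needed to reach target_cov on average."""
    return math.ceil(target_cov * genome_size / read_len)


if __name__ == "__main__":
    # Assumption: 20 M single reads + 40 M paired-end reads, all 50 bp.
    reads = 20_000_000 + 40_000_000
    for genome in (10_000_000, 14_000_000):
        c = base_coverage(reads, 50, genome)
        print(genome, round(c), round(kmer_coverage(c, 50, 31)))
    # Hypothetical planning target: 100x with 100 bp reads on a 14 Mb genome.
    print(reads_needed(100, 100, 14_000_000))
```

Under these assumptions the existing data come to roughly 214-300x per-base coverage (about 86-120x k-mer coverage at k = 31), consistent with the remark that depth alone should not be the limiting factor.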

    Any advice on what I should do next to get some much larger contigs?
    Last edited by allthestairs; 10-17-2011, 11:28 AM.

  • #2
    Hi,

    I think the problem with de novo assembly of high-GC organisms is due to the low number of non-unique k-mers in the assembly process. Have you already tried mapping your reads against a phylogenetically related bacterium? (You may get more information that way than from the de novo assembly.)
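The reply's point concerns how often k-mers recur. As an illustration only (not evidence for or against the claim), the Python sketch below measures GC fraction and counts how many canonical k-mers in a sequence occur exactly once versus more than once. Collapsing each k-mer with its reverse complement mirrors what de Bruijn-graph assemblers commonly do, but the function names and toy inputs are hypothetical, and for real read sets a dedicated k-mer counting tool would be far more practical than this pure-Python loop.

```python
from collections import Counter
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = frozenset("ACGT")


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_counts(seq: str, k: int) -> Counter:
    """Count canonical k-mers, skipping any window that contains a non-ACGT base."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _ACGT:
            counts[canonical(kmer)] += 1
    return counts


def multiplicity_summary(seq: str, k: int) -> Tuple[int, int]:
    """(distinct k-mers seen once, distinct k-mers seen more than once)."""
    counts = kmer_counts(seq, k)
    once = sum(1 for c in counts.values() if c == 1)
    return once, len(counts) - once


if __name__ == "__main__":
    print(round(gc_fraction("GGCCAT"), 3))       # 0.667
    print(multiplicity_summary("ACGTACGT", 4))   # (1, 2)
```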



    • #3
      Unfortunately, we are sequencing this genome primarily for its novelty: we are looking for certain pathways that are unlikely to exist in any published genome. We could probably get some larger contigs for highly conserved regions of its genome, but anything that mapped well to an existing genome would be of little use to us.
