  • Hybrid assembly using HiSeq and MiSeq data

    I'm trying to assemble a relatively large insect genome (~ 1.5 Gbp) and have sequencing data from two different sequencing platforms that I want to combine, in order to get the best possible assembly.

    More specifically, I have Illumina HiSeq data (2 x 100 bp) from 4 libraries with an insert size of 550 bp, which together give me around 40x coverage. Recently, I also sequenced one of these 550 bp libraries on the MiSeq platform (2 x 300 bp, overlapping reads). After merging the mates I get "long" reads (most of them >400 bp), with an estimated coverage of about 3x.
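    As a rough sanity check on the coverage figures above, depth can be estimated as total sequenced bases divided by genome size. The sketch below inverts that relation to show roughly how many reads those depths imply. The 400 bp mean length for merged MiSeq reads is an assumption (the post only says most reads are >400 bp), and the resulting read counts are derived here, not reported by the poster.

```python
def coverage(n_reads: float, bases_per_read: float, genome_size: float) -> float:
    """Expected depth of coverage: total sequenced bases / genome size."""
    return n_reads * bases_per_read / genome_size


def reads_needed(target_cov: float, bases_per_read: float, genome_size: float) -> float:
    """Number of reads (or read pairs) needed to reach a target depth."""
    return target_cov * genome_size / bases_per_read


GENOME_SIZE = 1.5e9  # ~1.5 Gbp, as stated in the post

# HiSeq 2 x 100 bp: each read pair contributes 200 bp.
hiseq_pairs = reads_needed(40, 200, GENOME_SIZE)

# Merged MiSeq reads: 400 bp mean length is an assumption, not a reported value.
miseq_reads = reads_needed(3, 400, GENOME_SIZE)

print(f"HiSeq read pairs for 40x: {hiseq_pairs:.3g}")
print(f"Merged MiSeq reads for 3x: {miseq_reads:.3g}")
```

    Under these assumptions, the HiSeq data contribute more than ten times the depth of the merged MiSeq reads, which is the imbalance the question is about.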

    So, what do you think is the best strategy for de novo assembly when you have data sets that differ this much in read length and sequencing coverage?

    I'm asking because I suspect that pooling all reads and assembling them with a k-mer-based assembler will "confuse" the assembler because of the difference in sequencing coverage. I also suspect that a k-mer-based assembler would not make the most of my longer MiSeq reads.

    Do you think an alternative would be to assemble the HiSeq and MiSeq data separately and then combine them using an OLC (overlap-layout-consensus) assembler (instead of a k-mer-based one)? If so, is there such an assembler that is particularly good at this task?

    Thanks!

  • #2
    Combining reads into one set of files would not be a good idea. However, assemblers such as ABySS will happily take two or more sets of files and treat each one as a separate entity.

    I am not saying that ABySS is the best assembler for your work -- although it is my 'go-to' assembler for large projects -- but I do suggest giving it a try. In your case I would tell ABySS that I had 4 different paired-end libraries (the HiSeq data) and a single-end library (the MiSeq merged reads).
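    The library layout described above (4 paired-end libraries plus one single-end set) might be passed to ABySS roughly as follows. This is a minimal sketch that only builds the `abyss-pe` command line using its `name`, `k`, `lib` and `se` parameters. The file names, the library names `pe1`..`pe4`, the assembly name and the k value are all hypothetical placeholders, not values from this thread.

```python
import shlex


def abyss_pe_command(name: str, k: int, paired_libs: dict, single_end: list) -> str:
    """Build an abyss-pe command line with several paired-end libraries
    and one set of single-end reads. It only returns the string; nothing is run."""
    args = ["abyss-pe", f"name={name}", f"k={k}"]
    # lib= lists the paired-end library names, separated by spaces.
    args.append("lib=" + " ".join(paired_libs))
    # Each library name is then assigned its two read files.
    for lib_name, (reads_1, reads_2) in paired_libs.items():
        args.append(f"{lib_name}={reads_1} {reads_2}")
    # se= lists the single-end files (here, the merged MiSeq reads).
    args.append("se=" + " ".join(single_end))
    # shlex.join quotes any argument that contains spaces.
    return shlex.join(args)


# Hypothetical file names: four HiSeq libraries and one merged MiSeq file.
hiseq_libs = {
    f"pe{i}": (f"hiseq{i}_R1.fq", f"hiseq{i}_R2.fq") for i in range(1, 5)
}
cmd = abyss_pe_command("insect", 64, hiseq_libs, ["miseq_merged.fq"])  # k=64 is a placeholder
print(cmd)
```

    The k value would need tuning for the actual data, and it has to stay within the maximum k that the installed ABySS build supports.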

    As an answer to your final paragraph, "Do you think an alternative would be to assemble the HiSeq and MiSeq data separately and then combine them using an OLC ...", yeah, that should work as well. minimus/bambus would be what I would use. Not sure if they are 'best' though.


    • #3
      I would check out MaSuRCA. As input, you would give it the raw reads, not trimmed or stitched together. Each read set would be a unique library.

      I have done this with a few different genomes with varying success. When I had better success, it was generally not by sequencing a single library with longer reads (overlapped or not), but by sequencing a new library with longer reads. My guess as to why is that you end up averaging out library prep biases when you have more libraries.

      In the end, you will probably find that you still need a completely different data type to get to a decent assembly: mate-pair (MP) libraries, long (>1 kb) reads, targeted high depth, etc.


      • #4
        Thank you both guys for the hints!

        westerman, I'll give ABySS a go and see what I get.

        bioBob, I tried MaSuRCA about a year ago, but was really disappointed (very, very buggy!). I heard, though, that there's a new version with lots of bug fixes. I think I'll also give that a try.


        • #5
          Let us know about MaSuRCA. The one time I tried it I was disappointed as well.
