  • Improving Illumina assembly with 454 reads?

    Hi all,

    I'm working on a large genome assembly (~1Gbp) with Illumina paired-end reads, and currently I'm down to ~90 000 scaffolds (N50=26kb). Now I've got some additional 454 data (single end), and would like to use that to improve my assembly. I've heard about people assembling the two sets separately and then trying to merge them into one, and also about people trying to do one big assembly with all reads.
    I would instead like to map the 454 reads onto my Illumina assembly, and see if I can get rid of NNNs in the scaffolds, or even link some scaffolds to each other. I tried the Roche GSReferenceMapper, and most reads mapped fully within scaffolds, but some are marked as "Chimeric". It seems like these reads map to more than one scaffold - possibly exactly what I'm looking for! But there seems to be no way to get the information on which scaffolds they map to (and at what positions) - I guess the software discards them as wrongly mapped?
    Does anyone more familiar with this software know if this information can be retrieved? Or is there better software for this purpose?

    Any input would be appreciated!
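
The idea of using reads that map to more than one scaffold can be sketched roughly as below. This is only an illustration and not the GSReferenceMapper output format: the `Hit` record, its fields, and the `scaffold_links` function are hypothetical names, and it assumes some aligner has exported partial alignments as (read, scaffold, read start, read end) records.

```python
from collections import Counter, namedtuple

# Hypothetical record: one partial alignment of a read to a scaffold.
Hit = namedtuple("Hit", "read scaffold read_start read_end")

def scaffold_links(hits, min_reads=2):
    """Count reads whose pieces align to two different scaffolds."""
    by_read = {}
    for h in hits:
        by_read.setdefault(h.read, []).append(h)

    links = Counter()
    for read_hits in by_read.values():
        # Order the pieces along the read, then compare neighbouring pieces.
        read_hits.sort(key=lambda h: h.read_start)
        for a, b in zip(read_hits, read_hits[1:]):
            if a.scaffold != b.scaffold:
                links[tuple(sorted((a.scaffold, b.scaffold)))] += 1

    # Keep only scaffold pairs supported by at least min_reads reads.
    return {pair: n for pair, n in links.items() if n >= min_reads}

hits = [
    Hit("r1", "scaf_A", 0, 200), Hit("r1", "scaf_B", 201, 400),
    Hit("r2", "scaf_B", 0, 150), Hit("r2", "scaf_A", 151, 380),
    Hit("r3", "scaf_A", 0, 390),
    Hit("r4", "scaf_C", 0, 100), Hit("r4", "scaf_A", 101, 300),
]
print(scaffold_links(hits))
```

With low read coverage a pair may be supported by a single read only, so `min_reads` is a trade-off between sensitivity and the risk of joining scaffolds on the basis of a genuinely chimeric read.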

  • #2
    I haven't used it extensively myself but Mira can apparently deal with 454 and Illumina reads at once.

    Wouldn't it be better to use the 454 first and use the Illumina to get the coverage depth up? According to data I've seen (bacterial genomes), Illumina PE doesn't actually contribute that much to improving a 454 assembly, provided you have decent 454 coverage.

    • #3
      Originally posted by colindaven
      I haven't used it extensively myself but Mira can apparently deal with 454 and Illumina reads at once.

      Wouldn't it be better to use the 454 first and use the Illumina to get the coverage depth up? According to data I've seen (bacterial genomes), Illumina PE doesn't actually contribute that much to improving a 454 assembly, provided you have decent 454 coverage.
      The problem is that I don't have decent 454 coverage...
      The Illumina coverage is ~30X, while for the 454 I only have 1X (it's only a test run, and no more runs are planned). When I assembled the 454 data separately with newbler, I covered about one third of the genome. Hence the Illumina data must be the foundation of the assembly, and the 454 can only be used for improving it. Unfortunately MIRA does not seem able to cope with this huge amount of data.
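
As a rough sanity check on these numbers, average depth is total sequenced bases divided by genome size, and under the idealized Lander-Waterman model (reads placed uniformly at random, which real data only approximates) the fraction of bases hit by at least one read at depth c is 1 - e^(-c). The sketch below uses made-up read counts and lengths purely for illustration; the function names are hypothetical.

```python
import math

def coverage(n_reads, read_len, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_len / genome_size

def expected_fraction_covered(depth):
    """Lander-Waterman estimate of the fraction of bases hit by >= 1 read."""
    return 1.0 - math.exp(-depth)

# Illustrative numbers only (assumed, not taken from the run described above).
depth = coverage(n_reads=2_500_000, read_len=400, genome_size=1_000_000_000)
print(depth)
print(round(expected_fraction_covered(depth), 2))
```

At 1X this gives about 63% of bases touched by at least one read. An assembler additionally needs reads to overlap each other, so an assembly of 1X data alone would be expected to cover noticeably less than that.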

      • #4
        Ok, this sounds tricky. I don't know how much information you'll gain, but you could try Mummer and Bambus, a scaffolder. Apparently Bambus can use output directly from Mummer, and Mummer is a good and fast aligner. I think you have to specify reference names though, which in your case, with 90 000 scaffolds, is going to be prohibitive. Perhaps another aligner like Novoalign might be effective - you'll have to see what works for you.

        A large number of contigs is the norm for eukaryotic projects - I think the Panda assembly is a good example.

        I don't know if this is accurate, but it is fairly astounding!

        A new generation of sequencing technologies is revolutionizing molecular biology. Illumina's Solexa and Applied Biosystems' SOLiD generate gigabases of nucleotide sequence per week. However, a perceived limitation of these ultra-high-throughput technologies is their short read-lengths. De novo assem …

        "This has led to the generation of several draft genome sequences based exclusively on short sequence Illumina sequence reads, recently culminating in the assembly of the 2.25-Gb genome of the giant panda from Illumina sequence reads with an average length of just 52 nucleotides."

        • #5
          Have you tried using GapCloser from the SOAP package? I'm not sure how well it will work on single-end 454 reads, but it works great with PE Illumina data at resolving internal Ns in scaffolds.

          J

          • #6
            Originally posted by themerlin
            Have you tried using GapCloser from the SOAP package? I'm not sure how well it will work on single-end 454 reads, but it works great with PE Illumina data at resolving internal Ns in scaffolds.
            Could you provide GapCloser with 454 reads, even if the original assembly was done with Illumina? I have never tried it, but if this is possible I'll certainly give it a shot!

            • #7
              I believe so. You can also map your Illumina reads back to the scaffolds, which often resolves some internal Ns.

              J
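
To see whether a gap-closing step actually resolved internal Ns, one simple check is to list the runs of N in a scaffold before and after. The following is a minimal sketch with toy sequences; `n_gaps` and `total_gap_bases` are hypothetical helper names and not part of GapCloser or any other tool.

```python
import re

def n_gaps(sequence, min_len=1):
    """Return (start, end) of each run of N, 0-based with end exclusive."""
    return [m.span() for m in re.finditer(r"[Nn]+", sequence)
            if m.end() - m.start() >= min_len]

def total_gap_bases(sequence):
    """Total number of N bases in the sequence."""
    return sum(end - start for start, end in n_gaps(sequence))

# Toy scaffold before and after a hypothetical gap-closing run.
before = "ACGTNNNNNACGTTNNACG"
after = "ACGTACGGTACGTTNNACG"

print(n_gaps(before))           # [(4, 9), (14, 16)]
print(total_gap_bases(before))  # 7
print(total_gap_bases(after))   # 2
```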

              • #8
                SOAP GapCloser not for 454 reads

                I just tested GapCloser on my 454 reads, and it failed, saying "read max length should be less than 188bp". So it does not work with 454 reads at all (since most of them are much longer than 188bp). Too bad.

                For the moment I'm also running it on the original Illumina data; hopefully this will work better!
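
Before feeding reads to a tool with a length limit like this, it can be useful to check how many reads would be rejected. The sketch below is a plain FASTA length scan; the 188 threshold is taken from the error message above, while the function names and the example reads are made up for illustration.

```python
def read_lengths(fasta_lines):
    """Yield the length of each sequence in FASTA-formatted lines."""
    length = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length

def fraction_too_long(fasta_lines, max_len=188):
    """Fraction of reads that are not shorter than max_len."""
    lengths = list(read_lengths(fasta_lines))
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n >= max_len) / len(lengths)

# Made-up reads: 100 bp, 250 bp (wrapped over two lines), 188 bp.
example = [">r1", "A" * 100, ">r2", "A" * 150, "C" * 100, ">r3", "G" * 188]
print(list(read_lengths(example)))
print(fraction_too_long(example))
```

For a real file, the same functions can be called on an open file handle, for example `fraction_too_long(open("reads.fasta"))`, where the file name is a placeholder.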

                • #9
                  If you can get your assembly in ace file format, perhaps you can use consed? It has a way to add 454 reads from an sff file.

                  • #10
                    Originally posted by flxlex
                    If you can get your assembly in ace file format, perhaps you can use consed? It has a way to add 454 reads from an sff file.
                    So you can actually align 454 reads (that have not been aligned before) to an assembly with consed? I thought it was more for viewing. Sounds interesting! I don't have the assembly in ace, but it shouldn't be too hard to fix.
                    Do you know if consed can handle very large assemblies (>1Gb, made with 600,000,000 Illumina reads)?

                    • #11
                      Originally posted by Linnea
                      Do you know if consed can handle very large assemblies (>1Gb, made with 600,000,000 Illumina reads)?
                      You never know, but I guess this could be tricky... Perhaps best to ask the phui mailing list?

                      • #12
                        Supply MIRA with assembled Illumina contigs

                        Hi,

                        MIRA is a great package and will give you the info you're looking for in terms of contig connections. It can't handle being supplied with raw Illumina reads, but I've heard of people inputting assembled Illumina contigs. It has a limit on how large a contig it can handle, but I think it's in the range of 50kb or so. You could then map the 454 reads to these contigs. I haven't tried this, but it would be worth a shot. You might also want to post to the MIRA mailing list; the author may have some useful suggestions.

                        • #13
                          Hi,

                          I have 454 sequencing data, and I got 6 contigs from it.

                          I also have Illumina (Solexa) sequences.

                          Can I use the Illumina data for gap filling of the 6 contigs?

                          I am using the Geneious software. Is there a better platform for the assembly?

                          (P.S.: when I assemble the Illumina reads, I get 361 contigs.)

                          Please help me with this.

                          • #14
                            As reported in the paper below, the turkey genome (~1 Gb) was assembled from 454 and Illumina data by feeding all the data into the Celera assembler. A nice assembly emerged. Having only 1x coverage by 454 is not a problem; it should still help alongside the Illumina coverage. For the turkey, over 2% of the assembly was covered only by 454 and over 2% was covered only by Illumina. The Celera assembler is not easy to use.

                            Rami Dalloul, Julie Long, Aleksey Zimin, ... James Yorke, Liqing Zhang, Hong-Bin Zhang, Xiaojun Zhang, Yang Zhang, and Kent Reed;
                            Multi-platform Next Generation Sequencing of the Domestic Turkey (Meleagris gallopavo): Genome Assembly and Analysis,
                            PLoS Biology. Published Sept 7 2010
