
  • #16
    @geschickten: I only know of ABySS, and as for parallelizing velvet, there was a post a little while ago on the velvet-users list by a Jeffrey Cook (http://www.jeffreycook.info/research) ...

    @beelu: according to my sys admin guy, they're Sun X4600M2 systems .. the ones with 8 processor board slots (and quad-core, with 8 RAM slots per board) ... Intel might be a viable option within months or next year .. you might check out the Nehalem processors.



    • #17
      Jnfass,

      You say that you have done Velvet assemblies with > 100M reads (some paired-end) on a 512G machine. We know that Velvet is not a parallel assembler, and you say that the Sun box (I assume you run your assemblies on the Sun machines you mentioned) is multi-processor/multi-core. My question is: how do you, or anybody for that matter, use this non-parallel software on a cluster or on multi-core/multi-processor machines? Do you know whether all 4/8 cores are being used by the software during assembly, or are the extra cores simply not being used?



      • #18
        @geschickten: You're right - velvet isn't running in parallel, whether multi-threaded, over MPI, or anything like that. So the number of processors is irrelevant. The total memory comes from there being eight 8G RAM sticks on each of 8 boards (I think), so 8^3 = 512G ...



        • #19
          Originally posted by yvan.wenger
          Hello,
          As an alternative, I am thinking of merging several assemblies, comparing those that merge together (if any), maybe keeping a contig only if it appears in at least two different assemblies, and so on... but everything still needs to be done.
          Yes, actually I believe the best set of contigs is scattered all over the parameter space in several assemblies. Not sure how to retrieve them.
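
For what it's worth, here is a minimal Python sketch of the "keep a contig only if it appears in at least two different assemblies" idea. It assumes contigs must match exactly (up to strand), which is a strong simplification: real contigs from different parameter settings usually overlap only partially, so an alignment or clustering step would be needed instead. The names `canonical`, `shared_contigs` and `min_support` are made up for illustration.

```python
from collections import Counter

# Translation table used to build the reverse complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(seq):
    """Return the lexicographically smaller of a sequence and its reverse complement."""
    seq = seq.upper()
    revcomp = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, revcomp)


def shared_contigs(assemblies, min_support=2):
    """Keep contigs seen in at least `min_support` assemblies.

    `assemblies` is a list of lists of contig strings, one list per assembly.
    Matching is exact and strand-independent (an assumption of this sketch).
    """
    support = Counter()
    for contigs in assemblies:
        # A set ensures each contig counts once per assembly.
        support.update({canonical(c) for c in contigs})
    return sorted(seq for seq, n in support.items() if n >= min_support)
```

With three toy assemblies `[["ACGTT", "GGGCC"], ["AACGT", "TTTAA"], ["GGGCC"]]`, `shared_contigs` keeps two contigs: `ACGTT` and `AACGT` are reverse complements of each other, so they count as the same contig.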
          --
          Jeremy Leipzig
          Bioinformatics Programmer
          --
          My blog
          Twitter



          • #20
            Hi all,
            I read this interesting topic. There are two main discussions: the first is about defining a proper tool able to assemble the transcriptome; the second is about the memory requirements when the data set is extremely big.

            I'm really curious about the first part... why do you say that assembling a transcriptome is different from assembling a genome? Why do current tools like Velvet fail at transcriptome assembly?

            As for the second part, I think there is a general solution to this problem. From my experience, and from what I have read on the velvet-users mailing list, assemblers like Velvet don't work well with extremely large data sets. The usual trick is to work with a subset of about 10% of the reads: make multiple assemblies of several random subsets and then merge the results together.
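
A rough sketch of the subsampling step in Python, assuming plain 4-line FASTQ records (no wrapped sequences) and single-end reads; for paired-end data both mates would have to be kept or dropped together, which this sketch does not handle. `read_fastq` and `subsample` are illustrative names, not part of any assembler.

```python
import random


def read_fastq(lines):
    """Yield (header, sequence, separator, quality) tuples from FASTQ lines.

    `lines` is any iterable of strings, e.g. an open file. An incomplete
    trailing record is silently dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping four references to the same iterator groups lines in fours.
    yield from zip(stripped, stripped, stripped, stripped)


def subsample(records, fraction=0.1, seed=0):
    """Keep each record independently with probability `fraction`.

    Streaming, so the full read set never has to fit in memory.
    A different `seed` gives a different random subset.
    """
    rng = random.Random(seed)
    for record in records:
        if rng.random() < fraction:
            yield record
```

Calling `subsample(read_fastq(handle), fraction=0.1, seed=i)` for several values of `i` would give the several random subsets to assemble separately. The subset size is only approximately 10% with this kind of sampling.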

            The main reason to do that (in my opinion) is that tools like Velvet and ABySS build a de Bruijn graph whose size depends on the number of different k-mers present in the subset. Enormous data sets imply the presence of a high number of errors. The errors make the de Bruijn graph sparse, and this is the reason why we end up with thousands of little contigs.
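
To make the k-mer argument concrete, here is a small Python sketch (toy sequences and function names invented for illustration). A single substitution error in a read can introduce up to k k-mers that do not occur in the true sequence, so the number of distinct k-mers, and with it the size of the graph, grows with the number of errors rather than with the size of the transcriptome.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def distinct_kmers(reads, k):
    """Return the set of distinct k-mers over a collection of reads."""
    found = set()
    for read in reads:
        found |= kmers(read, k)
    return found


k = 4
true_read = "ACGTTGCAAGGCTA"   # error-free toy read
error_read = "ACGTTGCCAGGCTA"  # same read with one substitution (A -> C)

clean = distinct_kmers([true_read], k)
noisy = distinct_kmers([true_read, error_read], k)
print(len(clean))  # 11
print(len(noisy))  # 15: the single error added k = 4 spurious k-mers
```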

            Best regards
            Francesco



            • #21
              You might read this:


              Parallel short sequence assembly of transcriptomes
              Benjamin G Jackson, Patrick S Schnable, and Srinivas Aluru

              The tool is now named YAGA (Yet Another Genome Assembler), and the current version handles genomic data as well as transcriptomic data. The best run was on 1024 nodes (each node with only 512 MB of memory available).
              The authors are trying to assemble the Mo17 line of maize (B73, sequenced clone-by-clone via Sanger, just appeared online).

              Simone



              • #22
                I tried working with this iterative approach as well: taking a small subset, assembling it, and then merging. But the results did not scale that easily, as a lot of reads from the random sampling were left unassembled!

                Did you happen to make much headway?

                sm

                --
                bioinfosm



                • #23
                  So far I have had good results using a subset of the generated reads. Your poor results could have several causes.

                  I can suggest filtering out the low-quality reads and trimming the last bases of each read.
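
A minimal sketch of that filtering and trimming step in Python. The thresholds (`trim=5`, `min_mean_q=20`) are arbitrary placeholders, not recommendations, and the quality offset is an assumption: 33 is the Sanger/Phred+33 encoding, while older Illumina pipelines (1.3 to 1.7) used an offset of 64. Records are assumed to be (header, sequence, separator, quality) tuples.

```python
def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string with the given ASCII offset."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def trim_and_filter(records, trim=5, min_mean_q=20, offset=33):
    """Trim the last `trim` bases of each read, then drop low-quality reads.

    A read is kept only if something is left after trimming and the mean
    quality of the remaining bases is at least `min_mean_q`.
    """
    for header, seq, sep, qual in records:
        keep = max(0, len(seq) - trim)
        seq, qual = seq[:keep], qual[:keep]
        if seq and mean_quality(qual, offset) >= min_mean_q:
            yield header, seq, sep, qual
```

Trimming before computing the mean matters here, because the last bases of a read are usually the worst and would otherwise drag down reads that are fine after trimming.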
                  Another option is to reuse the unassembled reads: first generate a random set and assemble it; then, from the remaining reads, generate another random set and add to it all the unassembled reads from the first set.
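
The loop could look roughly like the Python sketch below. The `assemble` argument is a hypothetical wrapper that takes a list of reads and returns `(contigs, unassembled_reads)`; how to get the unassembled reads out of a real assembler is left open. `toy_assemble` is only a stand-in so the loop can be exercised, not a real assembler.

```python
import random


def iterative_assembly(reads, assemble, subset_size, seed=0):
    """Assemble random subsets in turn, carrying unassembled reads forward.

    Shuffling once and taking consecutive slices is equivalent to drawing
    each new random set from the reads that have not been used yet.
    """
    rng = random.Random(seed)
    pool = list(reads)
    rng.shuffle(pool)
    all_contigs, carried = [], []
    for start in range(0, len(pool), subset_size):
        # New random set plus everything left unassembled so far.
        batch = pool[start:start + subset_size] + carried
        contigs, carried = assemble(batch)
        all_contigs.extend(contigs)
    return all_contigs, carried


def toy_assemble(batch):
    """Stand-in assembler: a read 'assembles' if it occurs at least twice."""
    contigs = sorted({r for r in batch if batch.count(r) >= 2})
    unassembled = [r for r in batch if batch.count(r) < 2]
    return contigs, unassembled
```

One thing to watch: the carried-over reads make each batch bigger than the one before, so the memory saving shrinks if many reads stay unassembled.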

                  I don't know if this works, but it is the only strategy that comes to mind.

                  Francesco




                  • #24
                    Hi,

                    This forum is really interesting and I have the same doubts about the best assembler. I'm really inclined to use Velvet or ABySS, but I'm curious about MIRA. Have you heard of or used this assembler?

                    The thing is that I have 454 and Solexa reads. I'm planning to assemble each set alone and also to do a hybrid assembly. I've heard that MIRA is a good assembler for hybrid data and a good transcriptome assembler. However, the manual is 190 pages long and, before I read it, I would like to hear the opinion of someone who has actually used this assembler.

                    I'm starting to think that I should run the assemblies in all three assemblers and then compare the results...



                    • #25
                      Hi,
                      MIRA is probably a good solution to your problem. Two months ago I attended a conference at which one of the speakers was MIRA's author.
                      The tool is really good; the only downside is the length of the manual!



                      • #26
                        Newbler is very good for 454 assembly

                        You can try to feed those Newbler contigs into Velvet as though they were reference seqs along with your Illumina reads - perhaps the new Columbus module will work better than -longReads used to (very poorly)

                        MIRA seems to demand all these calibration files that the sequencing people usually throw away. Finding the way to turn off these demands requires a good bit of study.
                        --
                        Jeremy Leipzig
                        Bioinformatics Programmer
                        --
                        My blog
                        Twitter



                        • #27
                          Unfortunately I must rely on an OS assembler.



                          • #28
                            Originally posted by Zigster
                            Newbler is very good for 454 assembly
                            MIRA seems to demand all these calibration files that the sequencing people usually throw away. Finding the way to turn off these demands requires a good bit of study.
                            Ummm ... no, you don't need these calibration files (whichever file you have in mind) as MIRA does not read them.

                            B.
