
  • VELVET or ABYSS for Transcriptome

    We are planning to use ABySS and Velvet for de novo assembly of transcriptome data. Could the group share their experience with either tool? How do the two compare, and which is the best tool available for assembling transcriptome data? Thank you...

  • #2
    De novo transcriptome assembly?

    Hello,

    I am wondering if you had any reply to your question about the best tool for transcriptome assembly... I am about to evaluate the tools, but it seems that our draft genome gives an advantage to assemblies with short contigs, because it consists of roughly 130,000 (genomic) contigs. As a consequence, the assembly that maps best to the genome is one with short contigs (otherwise large assembled contigs would jump from one genomic contig to another, because those are quite short).

    As N50 does not seem to be a good metric for transcriptomes, I was wondering what other measures or manipulations to use to rank the different assemblies. Also, I noted that both correct and wrong contigs can be found in all assemblies and that they often differ between assemblies (for example, you can find a correct contig that is only represented in a rather "bad" assembly). Given this, I am wondering if somebody in this forum has seen data on alternative methods to obtain good contigs without a good genome? For instance, I just took another look at the ABySS paper (De novo Transcriptome Assembly with ABySS, Birol et al., Bioinformatics Advance Access, published June 15, 2009) and saw that they still assess their transcriptome assembly using the human genome. As an alternative, I am thinking of merging several assemblies, comparing the contigs that merge together (if any), maybe keeping a contig only if it appears in at least two different assemblies, and so on... but everything remains to be done.
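The "keep a contig only if it appears in at least two different assemblies" idea can be sketched in a few lines. This is only a minimal illustration under a strong assumption: two contigs count as the same only if they are identical, or exact reverse complements of each other. A real comparison would need alignment or containment checks. The function names and the toy sequences are made up for the example.

```python
from collections import Counter

# Translation table for complementing a DNA sequence (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(seq):
    """Return the lexicographically smaller of a sequence and its reverse complement."""
    seq = seq.upper()
    reverse_complement = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, reverse_complement)


def contigs_with_support(assemblies, min_support=2):
    """Return contigs found in at least `min_support` assemblies.

    `assemblies` maps an assembly name to a list of contig sequences.
    Matching is exact (strand-insensitive), which is an assumption of this sketch.
    """
    support = Counter()
    for contigs in assemblies.values():
        # Use a set so a contig duplicated within one assembly is counted once.
        support.update({canonical(contig) for contig in contigs})
    return sorted(seq for seq, count in support.items() if count >= min_support)


# Toy data: hypothetical assembly names and sequences.
assemblies = {
    "velvet_k29": ["ACGTTGCA", "GGGATTTC"],
    "velvet_k31": ["GAAATCCC", "TTTTAAAC"],  # GAAATCCC is the reverse complement of GGGATTTC
    "abyss_k29": ["ACGTTGCA", "CCCCCCGA"],
}

for contig in contigs_with_support(assemblies):
    print(contig)
```

Exact matching will miss contigs that differ by a single base or that are contained in a longer contig, so it is best read as a lower bound on agreement between assemblies.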

    Any thoughts on all that? Or otherwise, is there a forum dedicated to this topic?

    Best,

    Yvan

    Original message:
    We are planning to use ABySS and Velvet for de novo assembly of transcriptome data. Could the group share their experience with either tool? How do the two compare, and which is the best tool available for assembling transcriptome data? Thank you...


    • #3
      Yvan,

      Well I haven't received any replies from the forum. I must admit I am new to this world of genomics and hence I may not be able to pass my comments on your observation.

      I have not come across any forum dedicated to this topic.

      Do let me know about your evaluation and if required we can even take this offline...


      • #4
        all,

        I believe that the short answer is: The proper tools are not publicly available yet. There is a wrong way to do this: assembling transcriptome data like it's genomic, and a right way: yet to be determined. I'm looking for pretty much the same thing and I can't seem to find it. The primary problem with assembling transcriptome data like it's genomic is that most transcriptome data sets have some genomic contamination, and they have alternative splicings. Both of these facts run counter to the assumptions of the genome assemblers, in which there is no alternative splicing (or at most two haplotype alternatives). If anyone is thinking about working on new assemblers for these new data sets PM me; I'm very interested in exploring the topic and maybe sitting down to write one.

        Cheers,
        --Will


        • #5
          Will,

          I am willing to work on this and if you are okay then we can work together to design/develop an assembler for transcriptome data!

          prahalad


          • #6
            I have used Velvet and ABySS to assemble genomic sequences from Illumina reads. However, Velvet runs very slowly and cannot process 36507944 reads x 36 nt + 95398944 reads x 76 nt on a computer with 32 GB of memory; it stopped due to a memory problem. I don't know how to solve it.
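For scale, the input described above adds up to roughly 8.6 billion bases. A minimal sketch of the arithmetic (the read counts and lengths are the ones quoted in the post; the dictionary layout and function name are only illustrative):

```python
# Read counts and read lengths as quoted in the post above.
libraries = [
    {"name": "36 nt reads", "reads": 36_507_944, "read_len": 36},
    {"name": "76 nt reads", "reads": 95_398_944, "read_len": 76},
]


def total_bases(libs):
    """Sum reads x read length over all libraries."""
    return sum(lib["reads"] * lib["read_len"] for lib in libs)


for lib in libraries:
    print(f'{lib["name"]}: {lib["reads"] * lib["read_len"]:,} bases')
print(f"total: {total_bases(libraries):,} bases")
```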

            According to the paper De novo Transcriptome Assembly with ABySS (Birol et al.), ABySS can assemble shotgun and paired-end runs together. I am wondering how that works. The ABySS manual only shows how to assemble shotgun and paired-end runs separately.

            I would like to hear from others about them


            • #7
              NST, do you think you could share the paper "De novo Transcriptome Assembly with ABySS" (Birol et al.)?

              -p


              • #8
                Hi NSTbioinformatics,

                If you post the details of your problem on the abyss-users mailing list (http://www.bcgsc.ca/mailman/listinfo/abyss-users) Shaun Jackman or I can help you set up abyss for your data set. You will be able to assemble both single-end and paired-end reads in the same run but some care must be taken when choosing the assembly parameters.

                Regards,
                Jared Simpson


                • #9
                  Here are the references for Abyss:

                  Simpson et al. ABySS: A parallel assembler for short read sequence data. Genome Res (2009) vol. 19 (6) pp. 1117-23

                  Birol et al. De novo Transcriptome Assembly with ABySS. Bioinformatics (2009) pp.


                  • #10
                    Originally posted by NSTbioinformatics
                    I have used Velvet and ABySS to assemble genomic sequences from Illumina reads. However, Velvet runs very slowly and cannot process 36507944 reads x 36 nt + 95398944 reads x 76 nt on a computer with 32 GB of memory; it stopped due to a memory problem. I don't know how to solve it.

                    According to the paper De novo Transcriptome Assembly with ABySS (Birol et al.), ABySS can assemble shotgun and paired-end runs together. I am wondering how that works. The ABySS manual only shows how to assemble shotgun and paired-end runs separately.

                    I would like to hear from others about them
                    I've done velvet assemblies with > 100M reads (some paired-end) on a 512G machine ... yes, it does take a lot of memory ... but I'd be interested in hearing if ABySS is any better. My understanding is that these assemblers like to have the whole assembly graph in memory at once, and that's the roadblock to assembling in smaller RAM spaces (though, I've seen a few comments from people working on parallelizing one or the other program).

                    Before I had access to a large memory machine, I ran the single ended assembly first, then used those contigs as "long" reads to add to an assembly of the paired reads.

                    Velvet can definitely do single and paired reads together, and if you change a parameter before compiling, you can have an unlimited number of different paired read sets, each with different insert lengths.
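The two-stage workaround described above (assemble the single-end reads first, then feed those contigs as "long" reads into the paired-end assembly) could be sketched roughly as follows. The sketch only builds the command lines and does not run anything. All file names, directory names, the k-mer length, and the insert length are placeholders. The velveth read-category options (`-short`, `-shortPaired`, `-long`) and velvetg's `-ins_length` are as I understand them from the Velvet manual, so check them against your installed version.

```python
import shlex


def two_stage_velvet_commands(single_fq, paired_fq, k=31, ins_length=200,
                              se_dir="se_asm", pe_dir="pe_asm"):
    """Build (but do not run) the commands for a two-stage Velvet assembly.

    Stage 1 assembles the single-end reads alone.
    Stage 2 assembles the paired reads and adds the stage-1 contigs as long reads.
    k, ins_length and the directory names are placeholder values.
    """
    se_contigs = f"{se_dir}/contigs.fa"  # contigs written by velvetg in stage 1
    return [
        ["velveth", se_dir, str(k), "-short", "-fastq", single_fq],
        ["velvetg", se_dir],
        ["velveth", pe_dir, str(k),
         "-shortPaired", "-fastq", paired_fq,
         "-long", "-fasta", se_contigs],
        ["velvetg", pe_dir, "-ins_length", str(ins_length)],
    ]


# Hypothetical input file names, for illustration only.
commands = two_stage_velvet_commands("single.fastq", "paired_interleaved.fastq")
for command in commands:
    print(shlex.join(command))
```

Keeping the commands as argument lists makes it easy to pass them to `subprocess.run` later, one stage at a time, once the parameters have been checked.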


                    • #11
                      Originally posted by NSTbioinformatics
                      However, Velvet runs very slowly and cannot process 36507944 reads x 36 nt + 95398944 reads x 76 nt on a computer with 32 GB of memory; it stopped due to a memory problem. I don't know how to solve it.
                      I have had better luck with Velvet running at longer kmers and, to a lesser extent, higher coverage cutoffs. This seems counter-intuitive given that there are 16x as many possible kmers of length 31 as of length 29, but velvetg is much more likely to hit the wall at the shorter kmers.

                      I recently did a de novo transcriptome assembly of 100,425,440 72bp paired end reads totaling over 7,034,311,658 bp on a 256G machine but could not get below kmer 29 without crashing.

                      Fortunately velvet now accepts very large kmer lengths, so I would try those before giving up.
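The "16x" figure is simply 4^31 / 4^29 = 4^2. A small sketch of that arithmetic, together with the number of kmers a single 72 bp read contributes at each length (the function names are only illustrative):

```python
def possible_kmers(k):
    """Number of distinct DNA strings of length k over the alphabet A, C, G, T."""
    return 4 ** k


def kmers_per_read(read_len, k):
    """Number of overlapping kmers in one read; zero if the read is shorter than k."""
    return max(read_len - k + 1, 0)


ratio = possible_kmers(31) // possible_kmers(29)
print(f"possible 31-mers / possible 29-mers = {ratio}")

for k in (29, 31):
    print(f"k={k}: {kmers_per_read(72, k)} kmers per 72 bp read")
```

These numbers do not by themselves explain velvetg's behavior. They only show that, while the space of possible kmers grows with k, each read contributes fewer kmers at a longer k.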
                      --
                      Jeremy Leipzig
                      Bioinformatics Programmer


                      • #12
                        Hi jnfass and Zigster, how did you build your machines up to 512G/256G? How many CPUs do you have, and what's your RAM-to-core ratio? Thanks.

                        Beelu


                        • #13
                          We use a Dell Poweredge something-or-other with 4 X7350 (16 cores total)
                          --
                          Jeremy Leipzig
                          Bioinformatics Programmer


                          • #14
                            Originally posted by jnfass
                            I've done velvet assemblies with > 100M reads (some paired-end) on a 512G machine ... yes, it does take a lot of memory ... but I'd be interested in hearing if ABySS is any better. My understanding is that these assemblers like to have the whole assembly graph in memory at once, and that's the roadblock to assembling in smaller RAM spaces (though, I've seen a few comments from people working on parallelizing one or the other program).

                            Before I had access to a large memory machine, I ran the single ended assembly first, then used those contigs as "long" reads to add to an assembly of the paired reads.

                            Velvet can definitely do single and paired reads together, and if you change a parameter before compiling, you can have an unlimited number of different paired read sets, each with different insert lengths.

                            Hi jnfass,

                            Can you please share some information on who's doing the work on parallelizing assemblers? Also kindly point to some good open source parallel assemblers if you know any.. thank you


                            • #15
                              Originally posted by Zigster
                              I have had better luck with Velvet running at longer kmers and, to a lesser extent, higher coverage cutoffs. This seems counter-intuitive given that there are 16x as many possible kmers of length 31 as of length 29, but velvetg is much more likely to hit the wall at the shorter kmers.

                              I recently did a de novo transcriptome assembly of 100,425,440 72bp paired end reads totaling over 7,034,311,658 bp on a 256G machine but could not get below kmer 29 without crashing.

                              Fortunately velvet now accepts very large kmer lengths, so I would try those before giving up.
                              Hi Zigster,

                              Can you please share the exact configuration of the machine that you used for this run? Also, if somebody allowed you to run this in the cloud, would you go for it?
