
  • how to merge de novo transcriptome assemblies

    Hi all,

    I am working on building de novo transcriptome assemblies using Trinity. In the future, I would like to merge assemblies, so that I do not have to start my analysis from scratch each time I do more sequencing. (I am sequencing multiple stages of the organism I work on, and cannot do all the sequencing at once). Is there a way to merge assemblies from Trinity? Or to add to an existing assembly in Trinity, creating multiple iterations? My goal is to be able to combine de novo assemblies, or add to an existing assembly, without losing the original contigs (and therefore downstream analysis).

    Thanks for any advice!

  • #2
    I suppose you could run the next assembly using the previous assembly as a 'reference genome', but I'm not sure how well this would work. As far as I know, Trinity is best run de novo each time, since that way you should discover new, lowly expressed transcripts.



    • #3
      Merging two de novo assemblies generated using Trinity

      Hi Criggs!

      I am also trying to merge assemblies that were generated using Trinity.
      After merging them, I found that the same transcript ID appeared twice in the merged assembly.
      Could you please tell me how you merged your assemblies?

      I would really appreciate your input.

      Thanks in advance.
      naresh
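
The duplicate-ID problem described above usually comes from two independent runs reusing the same contig names. One possible workaround, shown here only as a minimal sketch, is to prefix each record ID with a per-assembly tag before concatenating. The function names, the tags, and the example IDs are hypothetical, and nothing here is part of Trinity itself.

```python
def tag_fasta_ids(fasta_text, tag):
    """Prefix every FASTA record ID with `tag` so IDs stay unique after merging."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header line: insert the tag right after the '>' marker.
            out.append(">" + tag + "_" + line[1:])
        else:
            # Sequence line: keep unchanged.
            out.append(line)
    return "".join(line + "\n" for line in out)


def merge_assemblies(tagged_inputs):
    """Concatenate (tag, fasta_text) pairs into one FASTA string."""
    return "".join(tag_fasta_ids(text, tag) for tag, text in tagged_inputs)


# Two toy assemblies that reuse the same (made-up) contig ID.
stage1 = ">TRINITY_DN0_c0_g1_i1 len=8\nACGTACGT\n"
stage2 = ">TRINITY_DN0_c0_g1_i1 len=6\nGGCCTT\n"
merged = merge_assemblies([("stage1", stage1), ("stage2", stage2)])
merged_ids = [l[1:].split()[0] for l in merged.splitlines() if l.startswith(">")]
print(merged)
```

This only renames records; it does not remove redundant sequences, so the merged file can still contain the same transcript under two different names.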








      • #4
        Using the previous assembly as a reference probably wouldn't be a great idea, because as far as I know Trinity will drop any sequences that can't be produced from the reference genome.

        I've been using minimus2 (from AMOS) to merge transcriptome assemblies, combining the merged contigs with singletons, but it's difficult to determine how good that merged assembly is.

        What I'd really like is a more generic "take these long sequences and generate consensus contigs" program, which would help for PacBio / MinION sequencing as well.



        • #5
          I made a tool related to this, called Dedupe, available with BBTools. Unlike minimus, it does not merge overlapping contigs together; therefore it cannot introduce misassemblies, but it also won't usually produce as small a combined assembly. In practice, we use it before or instead of minimus because it is much faster and more stable, and able to handle very large assemblies that cause minimus to fail.

          Dedupe ensures that there is at most one copy of any input sequence, optionally allowing containments (substrings) to be removed, and a variable Hamming or edit distance to be specified. Usage:

          dedupe.sh in=assembly1.fa,assembly2.fa out=merged.fa

          That will absorb exact duplicates and containments. You can use the "hdist" and "edist" flags to allow mismatches, or get a complete list of flags by running the shell script with no arguments.
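
To make "absorbing exact duplicates and containments" concrete, here is a toy sketch of the idea in Python. It is not Dedupe's actual algorithm or code: it assumes forward-strand comparison only, allows no mismatches, and uses a simple quadratic scan that would be far too slow for real assemblies. The function name and the example records are hypothetical.

```python
def dedupe_exact_and_contained(records):
    """Keep one copy of each sequence and drop any sequence that is a
    substring of a longer kept sequence.

    `records` is a list of (name, sequence) pairs; forward strand only.
    """
    # Visit longest sequences first, so a containing sequence is always
    # seen before anything it contains. sorted() is stable, so ties keep
    # their input order and the first copy of a duplicate wins.
    ordered = sorted(records, key=lambda r: len(r[1]), reverse=True)
    kept = []
    for name, seq in ordered:
        if any(seq in kept_seq for _, kept_seq in kept):
            continue  # exact duplicate or contained in a kept sequence
        kept.append((name, seq))
    return kept


# Toy input: a1 and b1 are identical, b2 is a substring of a1, b3 is unique.
records = [
    ("a1", "ACGTACGTAA"),
    ("b1", "ACGTACGTAA"),
    ("b2", "GTACG"),
    ("b3", "TTTTGGGG"),
]
kept_names = [name for name, _ in dedupe_exact_and_contained(records)]
print(kept_names)
```

Note that nothing is merged or extended here: every surviving sequence is an unmodified input sequence, which is why this style of merging cannot create chimeric contigs.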



          • #6
            Hi Gringer,

            Thanks for your quick response.
            What I did was use the existing assembly as a reference. I extracted the reads that did not map to it and used Trinity to build a new, small assembly from those unmapped reads.

            Finally, I merged the two assemblies for better coverage.


            Thanks,
            naresh
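
The "extract the unmapped reads" step above could look something like the following sketch, assuming the reads were aligned to the existing assembly and the alignments are available as SAM text. The post does not say which aligner or tools were used, so this is an illustration rather than the poster's method. It relies only on the SAM FLAG bit 0x4 (segment unmapped); the function name and the toy records are hypothetical, and paired-end handling is deliberately left out.

```python
def unmapped_to_fastq(sam_lines):
    """Return FASTQ text for every SAM record whose FLAG has bit 0x4 set."""
    records = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        # SAM columns: 1 = QNAME, 2 = FLAG, 10 = SEQ, 11 = QUAL.
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & 0x4:
            records.append("@%s\n%s\n+\n%s\n" % (qname, seq, qual))
    return "".join(records)


# Toy SAM input: read1 and read3 are mapped, read2 is unmapped (FLAG 4).
sam = [
    "@SQ\tSN:contig1\tLN:100",
    "read1\t0\tcontig1\t1\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCCTTAA\tIIIIIIII",
    "read3\t16\tcontig1\t5\t60\t8M\t*\t0\t0\tTTTTCCCC\tIIIIIIII",
]
unmapped_fastq = unmapped_to_fastq(sam)
print(unmapped_fastq)
```

The resulting FASTQ would then be the input for the second, smaller assembly, whose contigs are merged with the original ones.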




            • #7
              Thanks Brian for your prompt reply.

              Naresh


