  • how to merge de novo transcriptome assemblies

    Hi all,

    I am working on building de novo transcriptome assemblies using Trinity. In the future, I would like to merge assemblies, so that I do not have to start my analysis from scratch each time I do more sequencing. (I am sequencing multiple stages of the organism I work on, and cannot do all the sequencing at once). Is there a way to merge assemblies from Trinity? Or to add to an existing assembly in Trinity, creating multiple iterations? My goal is to be able to combine de novo assemblies, or add to an existing assembly, without losing the original contigs (and therefore downstream analysis).

    Thanks for any advice!

  • #2
    I suppose you could run the iterative assembly using the previous assembly as a 'reference genome', but I'm not sure how well this would work. As far as I know, Trinity is best run de novo each time, since you should discover new lowly expressed transcripts this way.



    • #3
      Merging two de novo assemblies generated using Trinity

      Hi Criggs!

      I am also trying to merge assemblies that were generated using Trinity.
      After merging, I found two transcripts with the same transcript ID in the merged assembly.
      Could you please tell me how you merged your assemblies?

      I would really appreciate your input.

      Thanks in advance.
      naresh
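One common way around the duplicate-ID problem described above is to prefix every FASTA header with a per-assembly tag before concatenating the files. The following is only a minimal sketch under stated assumptions: the function name `prefix_fasta_ids`, the tags `stage1`/`stage2`, and the example headers are hypothetical and are not part of Trinity or of any tool mentioned in this thread.

```python
def prefix_fasta_ids(fasta_text, tag):
    """Return FASTA text with `tag` + '_' prepended to every record ID.

    Hypothetical helper for illustration; sequence lines are left unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header line: insert the tag right after the '>' character.
            out.append(">" + tag + "_" + line[1:])
        elif line:
            # Sequence line: keep as-is (blank lines are dropped).
            out.append(line)
    return "".join(item + "\n" for item in out)


# Two toy assemblies that share the same transcript ID (made-up data).
stage1 = ">TRINITY_DN1_c0_g1_i1 len=8\nACGTACGT\n"
stage2 = ">TRINITY_DN1_c0_g1_i1 len=6\nTTGACC\n"

merged = prefix_fasta_ids(stage1, "stage1") + prefix_fasta_ids(stage2, "stage2")
print(merged, end="")
```

Because the tag is the only thing added, the original ID can still be recovered by stripping the prefix, which may help when linking back to earlier downstream analysis.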








      • #4
        Using the previous assembly as a reference probably wouldn't be a great idea, because as far as I know Trinity will drop any sequences that can't be produced from the reference genome.

        I've been using minimus2 (from AMOS) to merge transcriptome assemblies, combining the merged contigs with singletons, but it's difficult to determine how good that merged assembly is.

        What I'd really like is a more generic "take these long sequences and generate consensus contigs" program, which would help for PacBio / MinION sequencing as well.



        • #5
          I made a tool related to this, called Dedupe, available with BBTools. Unlike minimus, it does not merge overlapping contigs together; therefore it cannot introduce misassemblies, but it also won't usually produce as small a combined assembly. In practice, we use it before or instead of minimus because it is much faster and more stable, and able to handle very large assemblies that cause minimus to fail.

          Dedupe ensures that there is at most one copy of any input sequence, optionally allowing containments (substrings) to be removed, and a variable hamming or edit distance to be specified. Usage:

          dedupe.sh in=assembly1.fa,assembly2.fa out=merged.fa

          That will absorb exact duplicates and containments. You can use the "hdist" and "edist" flags to allow mismatches, or get a complete list of flags by running the shell script with no arguments.
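To make "exact duplicates and containments" concrete, here is a toy Python sketch of the idea. It is not Dedupe's actual implementation: the function name and the sequences are hypothetical, it ignores reverse complements and mismatches, and it uses a naive quadratic substring check that would not scale to real assemblies.

```python
def remove_duplicates_and_containments(seqs):
    """Keep only sequences that are not a substring of an already kept one.

    `seqs` maps a name to a sequence string. Exact duplicates count as
    containments, so only the first copy survives. Illustrative only.
    """
    # Visit longest sequences first so a container is seen before its substrings.
    # sorted() is stable, so ties keep their original input order.
    ordered = sorted(seqs.items(), key=lambda item: len(item[1]), reverse=True)
    kept = {}
    for name, seq in ordered:
        if any(seq in other for other in kept.values()):
            continue  # duplicate of, or contained in, a kept sequence
        kept[name] = seq
    return kept


# Made-up contigs from two assemblies ("a" and "b").
contigs = {
    "a1": "ACGTACGTTT",
    "a2": "CGTACG",      # contained in a1
    "b1": "ACGTACGTTT",  # exact duplicate of a1
    "b2": "GGGCCC",
    "b3": "GTTTGG",      # overlaps the end of a1 but is not contained in it
}

result = remove_duplicates_and_containments(contigs)
print(sorted(result))
```

Note that `b3` is kept even though it overlaps `a1`: overlapping contigs are left alone rather than merged, which is why this approach cannot create chimeric joins but also yields a larger combined set than an overlap merger would.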



          • #6
            Hi Gringer,

            Thanks for your quick response.
            What I did was use the existing assembly as a reference, extract the reads that did not map to it, and build a new, small assembly from those unmapped reads using Trinity.

            Finally, I merged both assemblies for better coverage.


            Thanks,
            naresh




            • #7
              Thanks Brian for your prompt reply.

              Naresh

