
  • Trinity Assembly

    Dear all,
    As a newbie in transcriptome analysis, I would like to ask a question about whole-transcriptome assembly with Trinity.

    Is it possible to get two different sets of transcripts depending on whether we concatenate the read files before assembly or list them separated by commas, as the Trinity manual describes? I am not sure about my results: the two ways of preparing the reads seem to give different transcripts (I can tell from their sizes, which differ). Any thoughts on what went wrong?

    Cheers
    Didi
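For readers comparing the two input styles, here is a minimal sketch of how each could be prepared. It only assumes that Trinity's paired-end options `--left` and `--right` accept comma-separated file lists, as the manual cited in the post says; the helper names and any file names passed to them are hypothetical, and nothing here is taken from the poster's actual run.

```python
import shutil
from pathlib import Path


def concatenate_reads(inputs, output):
    """Copy every input read file, in the given order, into one combined file."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(output)


def comma_list(inputs):
    """Join file paths into the comma-separated form used for --left / --right."""
    return ",".join(str(path) for path in inputs)
```

With paired-end data, the left and right files have to be concatenated (or listed) in the same sample order, otherwise the read pairing no longer lines up between the two files. That is one thing worth checking when the two approaches give very different assemblies.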

  • #2
    Trinity is non-deterministic, so some variation between runs is expected. Not a lot, but some.



    • #3
      Thanks for that, westerman. Should I worry that this variation will also show up significantly when I compute the metrics for evaluating the transcripts?



      • #4
        FYI:

        The Trinity stats that I got for the transcripts built from the concatenated data:
        ################################
        ## Counts of transcripts, etc.
        ################################
        Total trinity 'genes': 236322
        Total trinity transcripts: 518647
        Percent GC: 45.98

        ########################################
        Stats based on ALL transcript contigs:
        ########################################

        Contig N10: 8296
        Contig N20: 6856
        Contig N30: 5744
        Contig N40: 4826
        Contig N50: 4031

        Median contig length: 1217
        Average contig: 2100.35
        Total assembled bases: 1,089,337,664


        #####################################################
        ## Stats based on ONLY LONGEST ISOFORM per 'GENE':
        #####################################################

        Contig N10: 6119
        Contig N20: 4351
        Contig N30: 3248
        Contig N40: 2367
        Contig N50: 1635

        Median contig length: 367
        Average contig: 799.05
        Total assembled bases: 188,834,004

        The Trinity stats that I got for the transcripts built by listing all of the read files with comma separation:

        ################################
        ## Counts of transcripts, etc.
        ################################
        Total trinity 'genes': 244,160
        Total trinity transcripts: 301,140
        Percent GC: 44.75

        ########################################
        Stats based on ALL transcript contigs:
        ########################################

        Contig N10: 6864
        Contig N20: 5185
        Contig N30: 4130
        Contig N40: 3303
        Contig N50: 2581

        Median contig length: 448
        Average contig: 1115.03
        Total assembled bases: 335,781,132


        #####################################################
        ## Stats based on ONLY LONGEST ISOFORM per 'GENE':
        #####################################################

        Contig N10: 5852
        Contig N20: 4184
        Contig N30: 3122
        Contig N40: 2305
        Contig N50: 1623

        Median contig length: 374
        Average contig: 806.19
        Total assembled bases: 196,840,230
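To make the figures above easier to compare, here is a small sketch of how an Nx value such as N50 can be computed from a list of contig lengths, plus the transcripts-per-gene ratio for the two runs. The function names are hypothetical and this is not the TrinityStats.pl code; it uses the common definition of Nx (the contig length at which the running total of lengths, sorted longest first, reaches x% of all assembled bases).

```python
def nx(lengths, fraction=0.5):
    """Return the Nx contig length, e.g. fraction=0.5 gives N50."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("no contig lengths given")


def isoforms_per_gene(transcripts, genes):
    """Average number of transcripts reported per Trinity 'gene'."""
    return transcripts / genes


# Counts copied from the two stats reports in this post.
concatenated = isoforms_per_gene(518647, 236322)
comma_separated = isoforms_per_gene(301140, 244160)
```

With the posted counts, the concatenated run averages about 2.19 transcripts per 'gene' and the comma-separated run about 1.23, while the longest-isoform-per-gene stats are nearly identical. So most of the difference between the two runs sits in the number of isoforms reported, not in the gene-level contigs.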



        • #5
          Those variations are larger than I would expect, and I can see why you are concerned. I'll see if I can rerun a recent Trinity assembly (I almost always use comma-separated files) with combined reads and see what differences I get.



          • #6
            Dear Bang_Didi,

            Did you decide which approach is better: comma separation or combining the reads?



            • #7
              Greetings to all!

              I would like to know about the reads/k-mers per transcript. TrinityStats.pl reports the total assembled bases, the contig lengths, and the number of transcripts (also for the longest isoform only). I would also like to know the difference between Trinity.fasta and single.fasta.
              When we run TrinityStats.pl, we get:
              1. Stats based on ONLY LONGEST ISOFORM per 'GENE'
              2. Stats based on ALL transcript contigs

              Does Trinity.fasta contain all transcripts, or does it also contain genes?
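As a sketch of how the two stats sections relate to one FASTA file: the example below assumes that Trinity.fasta lists every assembled transcript and that the 'gene' is encoded in each header as the identifier with its trailing isoform suffix removed (`_i<N>` in recent releases, `_seq<N>` in older ones). That header convention is an assumption to check against your own file, and the function names are hypothetical.

```python
import re

# Assumed isoform suffixes: "_i3" (recent Trinity) or "_seq3" (older releases).
ISOFORM_SUFFIX = re.compile(r"_(?:i|seq)\d+$")


def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_isoform_per_gene(records):
    """Map each gene identifier to its longest (transcript_id, sequence)."""
    best = {}
    for name, seq in records:
        gene = ISOFORM_SUFFIX.sub("", name)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (name, seq)
    return best
```

Passing an open Trinity.fasta handle to `read_fasta` and the result to `longest_isoform_per_gene` gives one sequence per 'gene', which is the kind of subset the "ONLY LONGEST ISOFORM per 'GENE'" section summarizes.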
