  • I got two different Trinity.fasta files from two versions of Trinity

    Hi guys,
    We have RNA-seq data from an insect, sequenced in 2012 and assembled at the time with one of the 2011 versions of Trinity (which gave us a Trinity.fasta). I have now analyzed the sequence length distribution in this file and got the following result:

    Code:
    kurban@kurban-X550VC:~/Downloads/bbmap$ sh stats.sh in=~/Downloads/gene.fa
    stats.sh: 52: stats.sh: Bad substitution
    stats.sh: 59: stats.sh: [[: not found
    stats.sh: 59: stats.sh: [[: not found
    stats.sh: 65: stats.sh: source: not found
    stats.sh: 66: stats.sh: parseXmx: not found
    A	C	G	T	N	IUPAC	Other	GC	GC_stdev
    0.2875	0.2118	0.2067	0.2940	0.0000	0.0000	0.0000	0.4186	0.0894
    
    Main genome scaffold total:         	144777
    Main genome contig total:           	144777
    Main genome scaffold sequence total:	67.067 MB
    Main genome contig sequence total:  	67.067 MB  	0.000% gap
    Main genome scaffold N/L50:         	15033/1.075 KB
    Main genome contig N/L50:           	15033/1.075 KB
    Max scaffold length:                	24.081 KB
    Max contig length:                  	24.081 KB
    Number of scaffolds > 50 KB:        	0
    % main genome in scaffolds > 50 KB: 	0.00%
    
    
    Minimum 	Number        	Number        	Total         	Total         	Scaffold
    Scaffold	of            	of            	Scaffold      	Contig        	Contig  
    Length  	Scaffolds     	Contigs       	Length        	Length        	Coverage
    --------	--------------	--------------	--------------	--------------	--------
        All 	       144,777	       144,777	    67,066,997	    67,066,997	 100.00%
        100 	       144,777	       144,777	    67,066,997	    67,066,997	 100.00%
        250 	        56,929	        56,929	    53,670,774	    53,670,774	 100.00%
        500 	        30,137	        30,137	    44,518,044	    44,518,044	 100.00%
       1 KB 	        16,207	        16,207	    34,757,505	    34,757,505	 100.00%
     2.5 KB 	         4,183	         4,183	    15,894,549	    15,894,549	 100.00%
       5 KB 	           588	           588	     3,942,668	     3,942,668	 100.00%
      10 KB 	            28	            28	       353,549	       353,549	 100.00%
    In this file the minimum sequence length is 101; the longest one is 22181.
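The minimum and maximum lengths quoted here can be cross-checked directly from the FASTA file. The following is a minimal sketch, not part of BBMap or Trinity: the function names `fasta_lengths`, `n50` and `summarize` are made up for illustration, and it assumes a plain, uncompressed FASTA file. The `n50` value is the length-based figure (the "1.075 KB" side of the `N/L50` line in the output above).

```python
import sys


def fasta_lengths(path):
    """Return a list with the length of every record in a FASTA file."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L contain at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def summarize(path):
    """Collect record count, shortest, longest and N50 length for one FASTA file."""
    lengths = fasta_lengths(path)
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "n50": n50(lengths),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python fasta_summary.py Trinity.fasta
    print(summarize(sys.argv[1]))
```

Running it on both assemblies gives numbers that can be compared line by line with the stats.sh summary.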

    Over the past several days I used the latest Trinity version, trinityrnaseq-2.0.6, to assemble the raw data once again (after trimming low-quality reads, of course). This time the length distribution of the file is:

    Code:
    kurban@kurban-X550VC:~/Downloads/bbmap$ sh stats.sh in=~/Desktop/data_from_server/2015_6_04_assembled_CD_and_CK/Trinity.fasta
    stats.sh: 52: stats.sh: Bad substitution
    stats.sh: 59: stats.sh: [[: not found
    stats.sh: 59: stats.sh: [[: not found
    stats.sh: 65: stats.sh: source: not found
    stats.sh: 66: stats.sh: parseXmx: not found
    A	C	G	T	N	IUPAC	Other	GC	GC_stdev
    0.2932	0.2083	0.2114	0.2871	0.0000	0.0000	0.0000	0.4197	0.0823
    
    Main genome scaffold total:         	56130
    Main genome contig total:           	56130
    Main genome scaffold sequence total:	57.963 MB
    Main genome contig sequence total:  	57.963 MB  	0.000% gap
    Main genome scaffold N/L50:         	9036/1.861 KB
    Main genome contig N/L50:           	9036/1.861 KB
    Max scaffold length:                	30.733 KB
    Max contig length:                  	30.733 KB
    Number of scaffolds > 50 KB:        	0
    % main genome in scaffolds > 50 KB: 	0.00%
    
    
    Minimum 	Number        	Number        	Total         	Total         	Scaffold
    Scaffold	of            	of            	Scaffold      	Contig        	Contig  
    Length  	Scaffolds     	Contigs       	Length        	Length        	Coverage
    --------	--------------	--------------	--------------	--------------	--------
        All 	        56,130	        56,130	    57,962,594	    57,962,594	 100.00%
        100 	        56,130	        56,130	    57,962,594	    57,962,594	 100.00%
        250 	        50,921	        50,921	    56,731,956	    56,731,956	 100.00%
        500 	        29,025	        29,025	    49,248,962	    49,248,962	 100.00%
       1 KB 	        18,003	        18,003	    41,494,038	    41,494,038	 100.00%
     2.5 KB 	         5,541	         5,541	    21,499,015	    21,499,015	 100.00%
       5 KB 	           900	           900	     5,895,754	     5,895,754	 100.00%
      10 KB 	            35	            35	       466,389	       466,389	 100.00%
      25 KB 	             1	             1	        30,733	        30,733	 100.00%
    In this second Trinity.fasta file the minimum sequence length is 224; the longest one is 30733.
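To see where the two assemblies differ most, the cumulative counts in the two stats.sh tables can be turned into the number of contigs shorter than 250 bp. This is only a small arithmetic sketch using figures copied from the tables in this post; the dictionary keys and the helper `short_fraction` are illustrative names.

```python
# Cumulative contig counts copied from the two stats.sh tables in this post.
old_2011 = {"all": 144_777, ">=250": 56_929, ">=500": 30_137}
new_2_0_6 = {"all": 56_130, ">=250": 50_921, ">=500": 29_025}


def short_fraction(counts):
    """Return (number, fraction) of contigs shorter than 250 bp."""
    short = counts["all"] - counts[">=250"]
    return short, short / counts["all"]


for label, counts in (("Trinity 2011", old_2011), ("Trinity 2.0.6", new_2_0_6)):
    short, frac = short_fraction(counts)
    print(f"{label}: {short:,} contigs under 250 bp ({frac:.1%})")
# Trinity 2011: 87,848 contigs under 250 bp (60.7%)
# Trinity 2.0.6: 5,209 contigs under 250 bp (9.3%)
```

At 500 bp and above the two assemblies contain similar numbers of contigs (30,137 versus 29,025), so most of the difference in the totals comes from the short contigs.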

    My questions are:
    1. Why are the two assembly results different? For example, the older version assembled lots of sequences in the length range from 101 to ~200, but the shortest sequence assembled by the latest version of Trinity is 224.
    2. Which Trinity.fasta file should I use in the downstream analysis, and why?

    Could you please give a somewhat detailed explanation?
    Thanks.
