I got two different Trinity.fasta files using two versions of Trinity

kurban910

Member

Join Date: Jul 2014
Posts: 58

06-05-2015, 11:44 AM

Hi guys,
We have RNA-seq data from an insect, sequenced in 2012, and at the time we assembled it with one of the 2011 versions of Trinity (which gave us a trinity.fasta). I have now analyzed the sequence length distribution in this file and got the following result:

Code:

kurban@kurban-X550VC:~/Downloads/bbmap$ sh stats.sh in=~/Downloads/gene.fa
stats.sh: 52: stats.sh: Bad substitution
stats.sh: 59: stats.sh: [[: not found
stats.sh: 59: stats.sh: [[: not found
stats.sh: 65: stats.sh: source: not found
stats.sh: 66: stats.sh: parseXmx: not found
A	C	G	T	N	IUPAC	Other	GC	GC_stdev
0.2875	0.2118	0.2067	0.2940	0.0000	0.0000	0.0000	0.4186	0.0894

Main genome scaffold total:         	144777
Main genome contig total:           	144777
Main genome scaffold sequence total:	67.067 MB
Main genome contig sequence total:  	67.067 MB  	0.000% gap
Main genome scaffold N/L50:         	15033/1.075 KB
Main genome contig N/L50:           	15033/1.075 KB
Max scaffold length:                	24.081 KB
Max contig length:                  	24.081 KB
Number of scaffolds > 50 KB:        	0
% main genome in scaffolds > 50 KB: 	0.00%


Minimum 	Number        	Number        	Total         	Total         	Scaffold
Scaffold	of            	of            	Scaffold      	Contig        	Contig  
Length  	Scaffolds     	Contigs       	Length        	Length        	Coverage
--------	--------------	--------------	--------------	--------------	--------
    All 	       144,777	       144,777	    67,066,997	    67,066,997	 100.00%
    100 	       144,777	       144,777	    67,066,997	    67,066,997	 100.00%
    250 	        56,929	        56,929	    53,670,774	    53,670,774	 100.00%
    500 	        30,137	        30,137	    44,518,044	    44,518,044	 100.00%
   1 KB 	        16,207	        16,207	    34,757,505	    34,757,505	 100.00%
 2.5 KB 	         4,183	         4,183	    15,894,549	    15,894,549	 100.00%
   5 KB 	           588	           588	     3,942,668	     3,942,668	 100.00%
  10 KB 	            28	            28	       353,549	       353,549	 100.00%

In this file the minimum sequence length is 101 and the longest sequence is 22181.
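For anyone who wants to double-check the minimum and maximum lengths independently of stats.sh, here is a minimal Python sketch. It assumes a plain, uncompressed FASTA file; the function names `read_fasta_lengths` and `summarize` are made up for this illustration and are not part of Trinity or BBMap. The `half_length` and `half_count` values are meant to correspond to the two numbers on the "N/L50" line (count/length), but the exact rule stats.sh uses may differ slightly.

```python
def read_fasta_lengths(handle):
    """Yield the length of each sequence in a FASTA stream (any iterable of lines)."""
    length = None  # None until the first '>' header is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def summarize(lengths):
    """Return count, min, max, total bases, and the half-total length/count."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequences found")
    total = sum(ordered)
    running = 0
    half_length = half_count = None
    # Walk from the longest sequence down until half of all bases are covered.
    for i, n in enumerate(ordered, start=1):
        running += n
        if running * 2 >= total:
            half_length, half_count = n, i
            break
    return {
        "count": len(ordered),
        "min": ordered[-1],
        "max": ordered[0],
        "total": total,
        "half_length": half_length,
        "half_count": half_count,
    }
```

It could be used as `with open("Trinity.fasta") as fh: print(summarize(read_fasta_lengths(fh)))`, with the path replaced by whichever assembly is being checked.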

Over the past several days I used the latest Trinity version, trinityrnaseq-2.0.6, to assemble the raw data again (after trimming the low-quality reads, of course). This time the length distribution of the file is:

Code:

kurban@kurban-X550VC:~/Downloads/bbmap$ sh stats.sh in=~/Desktop/data_from_server/2015_6_04_assembled_CD_and_CK/Trinity.fasta
stats.sh: 52: stats.sh: Bad substitution
stats.sh: 59: stats.sh: [[: not found
stats.sh: 59: stats.sh: [[: not found
stats.sh: 65: stats.sh: source: not found
stats.sh: 66: stats.sh: parseXmx: not found
A	C	G	T	N	IUPAC	Other	GC	GC_stdev
0.2932	0.2083	0.2114	0.2871	0.0000	0.0000	0.0000	0.4197	0.0823

Main genome scaffold total:         	56130
Main genome contig total:           	56130
Main genome scaffold sequence total:	57.963 MB
Main genome contig sequence total:  	57.963 MB  	0.000% gap
Main genome scaffold N/L50:         	9036/1.861 KB
Main genome contig N/L50:           	9036/1.861 KB
Max scaffold length:                	30.733 KB
Max contig length:                  	30.733 KB
Number of scaffolds > 50 KB:        	0
% main genome in scaffolds > 50 KB: 	0.00%


Minimum 	Number        	Number        	Total         	Total         	Scaffold
Scaffold	of            	of            	Scaffold      	Contig        	Contig  
Length  	Scaffolds     	Contigs       	Length        	Length        	Coverage
--------	--------------	--------------	--------------	--------------	--------
    All 	        56,130	        56,130	    57,962,594	    57,962,594	 100.00%
    100 	        56,130	        56,130	    57,962,594	    57,962,594	 100.00%
    250 	        50,921	        50,921	    56,731,956	    56,731,956	 100.00%
    500 	        29,025	        29,025	    49,248,962	    49,248,962	 100.00%
   1 KB 	        18,003	        18,003	    41,494,038	    41,494,038	 100.00%
 2.5 KB 	         5,541	         5,541	    21,499,015	    21,499,015	 100.00%
   5 KB 	           900	           900	     5,895,754	     5,895,754	 100.00%
  10 KB 	            35	            35	       466,389	       466,389	 100.00%
  25 KB 	             1	             1	        30,733	        30,733	 100.00%

In this second Trinity.fasta file the minimum sequence length is 224 and the longest sequence is 30733.
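Because the two files have different minimum lengths, it may be fairer to compare them at the same length cutoff. The two tables already allow this: at 250 bases or longer they list 56,929 sequences for the old assembly and 50,921 for the new one. The sketch below reproduces that kind of cumulative count for arbitrary cutoffs. It assumes the sequence lengths are already available as a plain list of integers, and `counts_at_cutoffs` is a hypothetical helper name, not a BBMap or Trinity function.

```python
from bisect import bisect_left


def counts_at_cutoffs(lengths, cutoffs):
    """Map each cutoff to (number of sequences, total bases) with length >= cutoff."""
    ordered = sorted(lengths)
    # prefix[i] holds the summed length of the i shortest sequences.
    prefix = [0]
    for n in ordered:
        prefix.append(prefix[-1] + n)
    total = prefix[-1]
    result = {}
    for cutoff in cutoffs:
        i = bisect_left(ordered, cutoff)  # index of first sequence >= cutoff
        result[cutoff] = (len(ordered) - i, total - prefix[i])
    return result
```

Running it on both assemblies with the same cutoff list (for example 224, the minimum length in the newer file) gives the count and total length of each assembly over the same length range.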

My questions are:
1. Why are the two assembly results different? For example, the older version assembled many sequences between 101 and ~200 in length, but the shortest sequence assembled by the latest version of Trinity is 224.
2. Which trinity.fasta file should I use in the downstream analysis, and why?

Could you please give a reasonably detailed explanation?
Thanks.
