Trinity assembly validation and statistics

Giorgio

Junior Member

Join Date: Oct 2010

Posts: 4
#1


04-15-2015, 12:21 PM

Hi All,

After a week, Trinity finally completed an assembly from 800 million reads (an entire NextSeq 500 run). The statistics look odd to me given the huge number of input sequences, and I would like your opinion:

################################

## Counts of transcripts, etc.

################################

Total trinity 'genes': 858807

Total trinity transcripts: 924905

Percent GC: 40.20

########################################

Stats based on ALL transcript contigs:

########################################

Contig N10: 1739

Contig N20: 769

Contig N30: 490

Contig N40: 382

Contig N50: 324

Median contig length: 270

Average contig: 363.98

Total assembled bases: 336649200

#####################################################

## Stats based on ONLY LONGEST ISOFORM per 'GENE':

#####################################################

Contig N10: 1123

Contig N20: 575

Contig N30: 421

Contig N40: 349

Contig N50: 304

Median contig length: 268

Average contig: 341.45

Total assembled bases: 293239534

I ran Trinity with default parameters, plus --trimmomatic and --min_kmer_cov 2. I was really expecting a larger N50. What could be the reason for that?
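For anyone who wants to check the N10 to N50 figures independently, here is a minimal Python sketch of how an Nx value is usually computed from a list of contig lengths. It is not Trinity's own statistics script; the function name `contig_nx` and the toy lengths are only illustrative assumptions.

```python
def contig_nx(lengths, fraction=0.5):
    """Return the Nx contig length for a list of contig lengths.

    Assumed definition: sort contigs from longest to shortest and walk
    down the list until the running total reaches `fraction` of all
    assembled bases; the length of the contig reached is the Nx value
    (fraction=0.5 gives N50, fraction=0.1 gives N10).
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]  # only reached if rounding keeps the sum below threshold


# Toy lengths, not taken from the assembly above.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print("N50:", contig_nx(toy_lengths, 0.5))
print("N10:", contig_nx(toy_lengths, 0.1))
```

With this definition, a very large number of short contigs adds many bases to the total and pulls the N50 down towards the median length, which matches the pattern in the statistics above (N50 of 324 against a median of 270).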

Note: Before starting the assembly, I quality-filtered the reads and merged the results into two large paired-end FASTA files.

Any advice would be greatly appreciated!

Thanks!

Giorgio
sarika01

Junior Member

Join Date: May 2015

Posts: 2
#2

05-26-2015, 11:00 PM

Hello,
I have used Trinity for an assembly. It produced a trinity.fasta file containing 4785 sequences, but when I run the statistics program it reports 5000 genes. Please help: what does this mean?