Hi All,
After one week, Trinity finally completed an assembly starting from 800 million reads (an entire NextSeq 500 run). There were tons of sequences, but the statistics look odd to me, so I would like your opinion:
################################
## Counts of transcripts, etc.
################################
Total trinity 'genes': 858807
Total trinity transcripts: 924905
Percent GC: 40.20
########################################
Stats based on ALL transcript contigs:
########################################
Contig N10: 1739
Contig N20: 769
Contig N30: 490
Contig N40: 382
Contig N50: 324
Median contig length: 270
Average contig: 363.98
Total assembled bases: 336649200
#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################
Contig N10: 1123
Contig N20: 575
Contig N30: 421
Contig N40: 349
Contig N50: 304
Median contig length: 268
Average contig: 341.45
Total assembled bases: 293239534
I ran Trinity with default parameters, plus --trimmomatic and --min_kmer_cov 2. I was really expecting the N50 to be bigger. What could be the reason for this?
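To make sure we are talking about the same number, here is a minimal sketch of how I understand the contig Nx values above to be defined: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches x% of all assembled bases. The function name `nx` and the toy lengths are made up for illustration, and the actual Trinity stats script may handle edge cases differently.

```python
def nx(lengths, x=50):
    """Return the contig Nx for a list of contig lengths.

    Nx is the length L such that contigs of length >= L together
    contain at least x% of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    threshold = sum(lengths) * x / 100
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Toy data (hypothetical, not from my assembly): a few long contigs
# and many short ones, 5000 bases in total.
toy_lengths = [2000, 1000, 500, 300, 300, 300, 300, 300]

print("N10:", nx(toy_lengths, 10))
print("N50:", nx(toy_lengths, 50))
print("N90:", nx(toy_lengths, 90))
```

With this definition, a large number of short contigs drags the N50 toward the median length, which matches what I see (N50 of 324 against a median of 270).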
Note: Before starting the assembly, I quality-filtered the sequences and merged the results into two big paired-end FASTA files.
Any advice would be greatly appreciated!
Thanks!
Giorgio