Hi,
I'm trying to obtain the best possible de novo transcriptome assembly for my data. I pooled the reads from all of my samples and individuals and filtered them with Trimmomatic, which also removed the Illumina adapters. The reads are 150 bp, and only reads longer than 100 bp were used.
General stats:
Total trinity transcripts: 1295606
Total trinity components: 776170
Contig N50: 981
Output of SAM_nameSorted_to_uniq_count_stats.pl:
#read_type count pct
proper_pairs 119877694 44.02
improper_pairs 111202610 40.84
left_only 25080118 9.21
right_only 16146667 5.93
Total aligned reads: 272307089
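As a sanity check on the table above, here is a minimal Python sketch that recomputes the total and the percentages from the four counts. The function name `summarize` and the `paired_pct` figure are only illustrative and are not part of Trinity or its scripts.

```python
# Counts copied from the SAM_nameSorted_to_uniq_count_stats.pl output above.
counts = {
    "proper_pairs": 119877694,
    "improper_pairs": 111202610,
    "left_only": 25080118,
    "right_only": 16146667,
}


def summarize(counts):
    """Return the total read count and each category's share in percent."""
    total = sum(counts.values())
    pct = {name: round(100.0 * n / total, 2) for name, n in counts.items()}
    return total, pct


total, pct = summarize(counts)

# Share of aligned reads where both mates aligned (proper + improper pairs).
paired = counts["proper_pairs"] + counts["improper_pairs"]
paired_pct = round(100.0 * paired / total, 2)

print("Total aligned reads:", total)
for name, value in pct.items():
    print(name, value)
print("both mates aligned (proper + improper):", paired_pct)
```

The recomputed values match the script's output, so the percentages are relative to aligned reads only, not to all input reads.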
My question is: what could explain such a high proportion of improperly paired alignments (about 41% of aligned reads), and how could we improve our assembly?