01-12-2012, 03:37 PM   #3
Senior Member
Location: Sydney

Join Date: Feb 2011
Posts: 149

Originally Posted by RogerH
I recently assembled the transcriptomes of 5 different algal species using Oases.

I found that I get higher N50 values and longer maximal contig lengths if I use the untrimmed data. Furthermore, although a higher percentage of reads is used with trimmed data, trimming reduces the data so massively that, overall, more of my reads end up in the assembly when I use untrimmed data.

I know one side effect of sequencing errors is a higher RAM requirement, but besides that, are there any other negative (or positive) effects of using untrimmed data for my assembly?

RAM wasn't really an issue for me, since I had access to a high performance computer with several nodes with 64 GB RAM each.

I'm not sure what type of reads you are using, but if they are Illumina reads, you should always trim off the first 12 to 15 bases, as these positions show substantial base-composition biases.
Did you do a FastQC quality check?
If you see severe biases at the 5' end, you should trim those bases off. I also trim some bases off the 3' end if the read quality falls off dramatically there. In addition, I filter out reads containing even one base below a certain Q score.
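To make the steps above concrete, here is a rough Python sketch of that kind of trimming and filtering. It is only an illustration, not the exact tool or settings I use: the function names, the default of 12 bases off the 5' end, the Q20 cutoff, and the assumption that qualities are Phred+33 encoded are all placeholders you would need to adapt to your own data (older Illumina pipelines used Phred+64).

```python
def phred33_scores(quality_string):
    """Convert a FASTQ quality string to integer Q scores (assumes Phred+33)."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_and_filter(seq, qual, head_trim=12, tail_trim=0, min_q=20):
    """Trim a read at both ends, then reject it if any base is below min_q.

    head_trim, tail_trim and min_q are illustrative defaults only.
    Returns (trimmed_seq, trimmed_qual), or None if the read is discarded.
    """
    end = len(seq) - tail_trim
    if end <= head_trim:
        return None  # nothing left after trimming
    seq, qual = seq[head_trim:end], qual[head_trim:end]
    # Discard the read if even one remaining base is below the cutoff.
    if any(q < min_q for q in phred33_scores(qual)):
        return None
    return seq, qual


def filter_fastq(lines, **kwargs):
    """Yield trimmed 4-line FASTQ records from an iterable of lines."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        result = trim_and_filter(seq, qual, **kwargs)
        if result is not None:
            yield header, result[0], plus, result[1]
```

For paired-end data you would also have to drop the mate of every discarded read (or keep it as a singleton), which this sketch does not handle.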

If you use untrimmed reads, you may get more contigs, but the assembly will be quite unreliable due to misassemblies and possibly chimeras.
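Since you mentioned N50: it only summarizes contig lengths, so on its own it cannot tell you whether the contigs are correct. Here is a minimal sketch of the usual calculation (the function name and the contig lengths in the example are made up for illustration, not taken from your data).

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical example: four correct 500 bp contigs versus the same
# sequence where two of them were wrongly joined into one chimera.
correct = [500, 500, 500, 500]
with_chimera = [1000, 500, 500]
print(n50(correct), n50(with_chimera))
```

In this toy case the chimeric assembly has the higher N50, so a better N50 from untrimmed reads is not by itself evidence of a better assembly.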

Kennels