We've all seen how sequencing errors tend to be found toward the 3' end of the read. We have taken up the task of trimming reads from a highly over-covered study to 25, 30, 35... 50b, and mapping them using NextGene under 1, 2, 3, etc. mismatches.
The results will floor you.
When we map to (human) reference using all 50bp, we map 25 million reads.
When we map using reads trimmed to 35b, we map DOUBLE that... 50 million reads. That is 100% more reads mapped, and errors and NA's (i.e., -1's) are avoided.
The implications are obvious: this halves the average expected (i.e., target) coverage needed for the same, or better, level of accuracy, and so also cuts the laboratory budget in half.
We are writing it up for a peer-reviewed publication using a variety of mapping tools.
I just wanted to share this tidbit, especially for those who support cancer research. Please let me know if you'd like to see the data. I don't want to post images here that will also appear in a publication.
If anyone else tries this, note that trimming all reads to 25b results in the highest number of reads mapped; however, at 25b it also increases the number of valid adjacent errors. The lowest number of valid adjacent errors occurs between 30 and 35b.
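For anyone who wants to try the trimming step, here is a minimal Python sketch. It is not the pipeline we used; it assumes plain 4-line FASTQ records in base space and simply keeps the 5' end of each read. The function names (trim_fastq, trim_to_lengths) are made up for illustration, and the mapping itself is left to whatever tool you use.

```python
from typing import Dict, Iterable, Iterator, List


def trim_fastq(lines: Iterable[str], length: int) -> Iterator[str]:
    """Yield FASTQ lines with sequence and quality cut to the first `length` bases.

    Assumption: strict 4-line records (header, sequence, '+', quality).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Within each 4-line record, index 1 is the sequence and index 3 the quality.
        if i % 4 in (1, 3):
            line = line[:length]  # keep the 5' end, drop the error-prone 3' end
        yield line


def trim_to_lengths(lines: List[str], lengths: Iterable[int]) -> Dict[int, List[str]]:
    """Return one trimmed copy of the records per requested read length."""
    return {n: list(trim_fastq(lines, n)) for n in lengths}


# Example: one made-up 50b read trimmed to 25, 30, ... 50b.
record = ["@read1", "ACGT" * 12 + "AC", "+", "I" * 50]
trimmed = trim_to_lengths(record, range(25, 55, 5))
```

Each trimmed set can then be written out and mapped separately, so the number of mapped reads can be compared across lengths and mismatch settings.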
Your feedback is welcome.
best,
jlw
pitt bac