We have just received our first paired-end SOLiD reads (three libraries). As an example, one library contains 75 million reads. However, some of these have particularly low quality scores (below 20). We read in a previous thread here about filtering reads before mapping based on a mean/median QV of 20, and have downloaded the paper "Analysis of quality raw data of second generation sequences with Quality Assessment Software" http://www.biomedcentral.com/1756-0500/4/130 as discussed in that thread (sorry, I don't know how to link a previous thread into this message).
When we run the Quality Assessment Software with a mean QV threshold of 20, the 75 million reads drop to 31 million. With a threshold of QV30 (we think a minimum of QV30 is required for SNP calling), the 75 million reads drop to 0.5 million.
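In case it helps to see what we mean by filtering on a mean QV, here is a rough Python sketch of the idea. It is not the Quality Assessment Software itself: the function names, the toy reads, and the choice to count -1 values (missing colour calls) as 0 are our own assumptions, and it ignores keeping read pairs together.

```python
from statistics import mean


def parse_qual_line(line):
    """Parse one line of space-separated quality values (as in a .qual file)."""
    return [int(q) for q in line.split()]


def mean_qv(quals):
    """Mean QV of a read; negative values (assumed missing calls) count as 0."""
    return mean(max(q, 0) for q in quals)


def filter_by_mean_qv(reads, threshold=20):
    """Keep (read_id, quals) pairs whose mean QV is at least `threshold`."""
    return [(rid, quals) for rid, quals in reads if mean_qv(quals) >= threshold]


# Toy data for illustration only; these are not values from our run.
reads = [
    ("read_1", parse_qual_line("30 28 25 22 18")),
    ("read_2", parse_qual_line("12 10 8 -1 5")),
    ("read_3", parse_qual_line("33 31 32 30 29")),
]

kept_q20 = filter_by_mean_qv(reads, threshold=20)
kept_q30 = filter_by_mean_qv(reads, threshold=30)
print(len(reads), len(kept_q20), len(kept_q30))
```

Raising the threshold from 20 to 30 removes every read whose average falls between the two, which is the same effect we see at a much larger scale in our own libraries.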
The bioinformaticians at the facility that carried out the SOLiD run for us have advised against pre-filtering reads. They recommend letting the mapping software (we plan to use Bowtie/TopHat/Cufflinks) decide whether the read quality is good enough.
What would everyone else do in this situation? Our instinct is to filter out the reads below QV20, but we are being advised to leave them in.
We are also concerned that more than 50% of our reads are low quality (mean QV below 20) and that only 0.6% are good enough for SNP calling. Is this what other people have experienced?
Your advice will be much appreciated and thank you for taking the time to read this post.
Helen