Originally posted by edge
I'd run the sequences through FastQC. This will generate a per-base quality plot and a per-sequence quality plot. The per-base plot will tell you if there is a systematic loss of quality as your run progresses. If there is, you might want to consider trimming a bit off the length of all of your sequences.
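If you want to turn the per-base plot into a concrete trim length, here is a rough sketch of the idea in Python. It is not what FastQC does internally. It assumes well-formed four-line FASTQ records and Phred+33 quality encoding, and the function names and the mean-quality cutoff of 20 are made up for illustration, so adjust them to your data.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between or after records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header, seq, qual


def per_base_mean_quality(quals, offset=33):
    """Mean Phred score at each read position (offset=33 is an assumption)."""
    totals, counts = [], []
    for qual in quals:
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def suggest_trim_length(means, min_mean=20.0):
    """Drop trailing positions whose mean quality is below min_mean."""
    keep = len(means)
    while keep > 0 and means[keep - 1] < min_mean:
        keep -= 1
    return keep


def trim_records(records, length):
    """Cut every read (and its quality string) to the same length."""
    for header, seq, qual in records:
        yield header, seq[:length], qual[:length]


if __name__ == "__main__":
    example = [
        "@read1", "ACGTA", "+", "IIII#",
        "@read2", "ACGTA", "+", "III##",
    ]
    records = list(parse_fastq(example))
    means = per_base_mean_quality(q for _, _, q in records)
    keep = suggest_trim_length(means)
    for header, seq, qual in trim_records(records, keep):
        print(header, seq, qual)
```

This only trims from the end of the read, which matches the usual pattern of quality falling off in later cycles. Check the plot by eye before trusting any automatic cutoff.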
You can also see if there is a sub-population of sequences which have universally poor quality within a generally good run. This is much more unusual in my experience, but if it is a problem in your data you could filter these out as well.
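Filtering out that kind of sub-population can be sketched like this. Again this is only an illustration: it assumes Phred+33 encoding, the function names are hypothetical, and the mean-quality cutoff of 20 is an arbitrary choice that you should pick after looking at the per-sequence quality plot.

```python
def mean_quality(qual, offset=33):
    """Mean Phred score of one read (offset=33 is an assumption)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def filter_poor_reads(records, min_mean=20.0, offset=33):
    """Keep only reads whose mean quality is at least min_mean."""
    for header, seq, qual in records:
        # Empty quality strings are dropped rather than divided by zero.
        if qual and mean_quality(qual, offset) >= min_mean:
            yield header, seq, qual


if __name__ == "__main__":
    reads = [
        ("@good", "ACGT", "IIII"),
        ("@poor", "ACGT", "####"),
        ("@mixed", "ACGT", "II##"),
    ]
    for header, seq, qual in filter_poor_reads(reads):
        print(header, seq, qual)
```

A whole-read mean is a blunt measure, since a read with a few very bad bases can still pass. It is enough for removing reads that are poor along their entire length, which is the case described above.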
I have to say that in most cases we don't filter our data unless the qualities are really poor. Many downstream analyses can take the assigned quality values into account, so the low-quality data doesn't all need to be removed, but this may be different in your case depending on what you intend to do with the data.