kmcarr 07-21-2009 07:07 AM

Read quality filtering for long, PE runs
It seems clear that the default pass-filter parameters in the Illumina pipeline and RTA are pretty meaningless for long, paired-end runs. By this I mean that pass-filter status is based solely on the first 25 cycles of the run; on a 2x76 PE run a lot can happen in cycles 26-152 to make a read worthless. As an example, a software failure at cycle 25 forced us to restart a run, including realigning the xy position of the stage. As a result, a small fraction of clusters ended up too far out of alignment to be called in any later cycle. I have real reads where bases 26-76 of read 1 and all of read 2 are 'N's, yet the filtering algorithm still marks them as passed.
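
To make the problem concrete, here is a rough sketch of one simple post-hoc filter: drop a pair when either mate has too many uncalled bases over its full length, instead of judging only the first 25 cycles. This is just an illustration, not anything the pipeline or RTA does. The function names and the 10% cutoff are made up for the example, and treating '.' as a no-call alongside 'N' is an assumption about the input format.

```python
from typing import Iterable, Iterator, Tuple

Pair = Tuple[str, str]


def n_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are uncalled."""
    if not seq:
        return 1.0  # treat an empty read as entirely uncalled
    # Assumption: no-calls appear as 'N' (or '.' in some text formats).
    uncalled = sum(1 for base in seq.upper() if base in "N.")
    return uncalled / len(seq)


def pair_passes(read1: str, read2: str, max_n_fraction: float = 0.1) -> bool:
    """Keep a pair only if BOTH mates are at or below the no-call cutoff.

    max_n_fraction is an arbitrary illustrative threshold, not a
    recommended value.
    """
    return (n_fraction(read1) <= max_n_fraction
            and n_fraction(read2) <= max_n_fraction)


def filter_pairs(pairs: Iterable[Pair],
                 max_n_fraction: float = 0.1) -> Iterator[Pair]:
    """Yield only the (read1, read2) sequence pairs that pass the cutoff."""
    for read1, read2 in pairs:
        if pair_passes(read1, read2, max_n_fraction):
            yield read1, read2


if __name__ == "__main__":
    good = ("ACGT" * 19, "TGCA" * 19)          # 2x76, fully called
    bad = ("A" * 25 + "N" * 51, "N" * 76)      # fine for 25 cycles, then all N
    kept = list(filter_pairs([good, bad]))
    print(f"kept {len(kept)} of 2 pairs")
```

The second pair mimics the failure described above: its first 25 bases are clean, so a filter that looks only at those cycles would pass it, while a whole-read check discards it. A quality-score based cutoff would be the obvious refinement, but it depends on how the qualities are encoded, so it is left out here.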

I have been thinking about better ways to filter reads, but I would like to hear from the community. Has anyone else here applied filtering of their own, or do most people ignore filtering entirely and just throw the whole pile at the mapper/assembler and let it figure things out?

