Rocketknight 04-13-2012 07:36 AM

Unexpected FastQC results
Hi all, I was wondering if anyone could suggest an explanation for the somewhat weird FastQC results we got on our last run (human exome samples, Nimblegen v2 enrichment kit, 10 samples per lane at 2x100bp on a HiSeq 2000 with Version 3 chemistry). These are paired-end reads from the same lane. All samples in that lane showed the same pattern, though all of those samples were prepped at the same time too. The first reads seem completely fine, but the second reads tank in quality around position 15-19. We did all sample prep in-house but outsourced the sequencing (since we don't have a HiSeq on site).

The reads also showed some moderate sequence duplication (~21%) and an AT-bias, both of which I presume are caused by over-amplification during sample prep, so I'm going to tone that down in future. I still feel like that doesn't explain the weird quality drop here though. Can anyone suggest what might be causing it?

kmcarr 04-13-2012 09:00 AM


In my experience results like this indicate a problem with the sequencing run, not the library(ies), and typically something pretty mundane like a (partial) blockage of the reagent flow to the lane in question.

simonandrews 04-14-2012 01:11 AM

I suspect that if you look at the per-sequence quality plots you'll see that this is a subset of sequences with consistently poor quality, and that a significant proportion of your library is OK. If you look at the per-base quality plots, the medians are OK on the second read, so it's a subset which is dragging down the lower quartile.

If you quality filter your data you'll probably have enough data left to continue your analysis.
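
For anyone wanting to try the quality filter by hand, here is a minimal sketch in plain Python. It assumes Phred+33 quality encoding and a mean-quality cutoff of 20; both are assumptions to check against your own data, and the function names are made up for illustration. A dedicated trimming/filtering tool would normally be the better choice on real data.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 is assumed)."""
    if not quality_line:
        return 0.0
    return sum(ord(c) - offset for c in quality_line) / len(quality_line)


def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumes the simple four-lines-per-record layout with no wrapped lines.
    """
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def filter_by_mean_quality(records, min_mean=20):
    """Keep records whose mean Phred score is at least min_mean (hypothetical cutoff)."""
    return [r for r in records if mean_phred(r[2]) >= min_mean]


# Toy input: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "ACGTACGT", "+", "II######",
]
kept = filter_by_mean_quality(read_fastq(example))
print([header for header, _, _ in kept])
```

With a real file you would pass an open file handle to `read_fastq` instead of the toy list.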

Rocketknight 04-14-2012 03:37 AM

That's exactly what I see, yeah. Thanks to both of you for the explanation. Should I just remove those low-quality reads and align their pairs as singletons, or should I go ahead aligning them anyway in the hopes of extracting some information from them?
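
To make the "align their pairs as singletons" option concrete, here is a rough sketch of the bookkeeping involved. It only classifies pairs by mean quality; the Phred+33 encoding, the cutoff of 20, and the `split_pairs` helper are all assumptions for illustration, not anything FastQC or an aligner provides.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of a quality string (Phred+33 is assumed)."""
    if not quality_line:
        return 0.0
    return sum(ord(c) - offset for c in quality_line) / len(quality_line)


def split_pairs(pairs, min_mean=20):
    """Split (name, qual_read1, qual_read2) tuples into pairs and singletons.

    Returns (paired, singletons): names where both mates pass, and
    (name, mate_number) for fragments where only one mate passes.
    Fragments where neither mate passes are dropped.
    """
    paired, singletons = [], []
    for name, qual1, qual2 in pairs:
        ok1 = mean_phred(qual1) >= min_mean
        ok2 = mean_phred(qual2) >= min_mean
        if ok1 and ok2:
            paired.append(name)
        elif ok1:
            singletons.append((name, 1))
        elif ok2:
            singletons.append((name, 2))
    return paired, singletons


# Toy input: 'I' is Q40 and '#' is Q2 under Phred+33.
toy_pairs = [
    ("frag1", "IIII", "IIII"),  # both mates fine
    ("frag2", "IIII", "####"),  # read 2 failed, keep read 1 as a singleton
    ("frag3", "####", "####"),  # both failed
]
paired, singletons = split_pairs(toy_pairs)
print(paired, singletons)
```

The paired set would then go to the aligner in paired-end mode and the singleton set in single-end mode.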

Also, I noticed the per-base N content looks pretty interesting. Would I be right in saying that there was a blockage or other mishap for a few cycles (around 15-19) and that as a result the affected clusters ended up out of sync and so no longer generated a clear signal for the rest of the run?
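
One way to look at this outside of FastQC is to tabulate the fraction of N calls at each cycle of read 2 and see whether the Ns are confined to a few cycles or persist afterwards. A minimal sketch follows; the function name and the toy reads are made up for illustration, and it says nothing by itself about the cause.

```python
def n_fraction_per_position(sequences):
    """Return the fraction of N calls at each read position (0-based).

    Reads may differ in length; each position is normalised by the
    number of reads that actually cover it.
    """
    sequences = list(sequences)
    if not sequences:
        return []
    length = max(len(s) for s in sequences)
    n_counts = [0] * length
    totals = [0] * length
    for seq in sequences:
        for i, base in enumerate(seq):
            totals[i] += 1
            if base in "Nn":
                n_counts[i] += 1
    return [n / t for n, t in zip(n_counts, totals)]


# Toy reads with Ns concentrated at positions 3 and 4.
reads = ["ACGTNACG", "ACGNNACG", "ACGTAACG", "ACGTNACG"]
fractions = n_fraction_per_position(reads)
for position, fraction in enumerate(fractions):
    print(position, fraction)
```

Comparing this profile with the per-base quality profile of the same reads would show whether the affected clusters recover after the N-heavy cycles or stay poor to the end of the read.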

