12-11-2014, 09:51 AM   #18
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

The alignments were done by an indel-capable aligner. That's not the problem. In fact, the actual quality scores are calculated separately for bases impacted by mismatches only and for bases impacted by indels or SNPs. Furthermore, the exact same analysis was done for HiSeq, MiSeq, and NextSeq, and NextSeq is the only one with the major quality issues.
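For readers unfamiliar with how an "actual" (empirical) quality score is derived: for each reported quality value, the error rate observed in aligned bases is converted back to the Phred scale with Q = -10 * log10(error rate). The Python sketch below is a minimal illustration under stated assumptions, not BBMap's implementation. The input layout (a dict from reported Q to counts of matching, substituted and indel-affected bases) and the function names are hypothetical. It reports one value from substitutions only and one from substitutions plus indels, mirroring the two categories described above.

```python
import math
from collections import Counter


def phred(errors, total):
    """Convert an observed error count into a Phred-scaled quality."""
    if errors == 0:
        return math.inf  # no errors observed, so the estimate is unbounded
    return -10.0 * math.log10(errors / total)


def true_quality(counts):
    """Empirical quality for each reported quality score.

    Hypothetical input: counts[q] is a Counter with keys 'match', 'sub' and
    'indel', tallying aligned read bases whose reported quality was q.
    Returns {q: (true_q_substitutions_only, true_q_substitutions_and_indels)}.
    """
    table = {}
    for q, c in sorted(counts.items()):
        total = c["match"] + c["sub"] + c["indel"]  # missing keys count as 0
        if total == 0:
            continue
        table[q] = (phred(c["sub"], total), phred(c["sub"] + c["indel"], total))
    return table


if __name__ == "__main__":
    # Illustrative numbers only, not data from any sequencing run.
    counts = {
        20: Counter(match=980, sub=10, indel=10),
        30: Counter(match=900, sub=100),
    }
    for q, (sub_only, all_errors) in true_quality(counts).items():
        print(f"reported Q{q}: true Q (subs) {sub_only:.1f}, "
              f"true Q (subs+indels) {all_errors:.1f}")
```

In this toy input, bases reported as Q30 come out at an empirical Q10 (1 error in 10), which is the kind of inflation described for the NextSeq data; a well-calibrated instrument would give empirical values close to, or above, the reported ones.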

Here, let me show you. These graphs were all generated by mapping after adapter-trimming the input reads. This is from a HiSeq2500, which shows low error rates and accurate (generally conservative) quality scores:



And this is from a NextSeq, which shows extremely high error rates and vastly inflated quality scores:



You can plainly see that something is very wrong without any mapping whatsoever, just by looking at the base frequency histogram:


Possibly, the high error rate is driven by the A/T ratio divergence, and thus stems from a fundamental base-calling or dye-system issue, but I don't know. At any rate, the base frequency divergence, the inflated Q-scores, and the high error rates have now been seen on 3 independent NextSeq instruments at 3 different facilities (ours, Illumina's, and one of our collaborators') with unrelated organisms and libraries. I have yet to see a NextSeq run from anywhere that did not exhibit these characteristics, and now that I have 3 independent confirmations, I don't really expect to see one.
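The base frequency check needs no reference at all. In a typical unstranded shotgun library, reads come from both strands, so at every cycle the A and T fractions (and likewise C and G) should be roughly equal; a persistent A-minus-T gap across cycles is the divergence described above. The Python sketch below tabulates per-cycle base fractions directly from a FASTQ file. It assumes plain 4-line FASTQ records without wrapped sequence lines, and the function names are hypothetical; BBMap's bhist output is the tool actually used here.

```python
import gzip
import sys
from collections import Counter


def read_sequences(path):
    """Yield sequence lines from a FASTQ file (gzipped if the name ends in .gz).

    Assumes simple 4-line records: header, sequence, '+', qualities.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


def base_frequencies(reads):
    """Fraction of A, C, G, T and N at each cycle (0-based read position)."""
    counts = []
    for read in reads:
        for i, base in enumerate(read.upper()):
            if i == len(counts):
                counts.append(Counter())  # first read long enough to reach this cycle
            counts[i][base] += 1
    freqs = []
    for c in counts:
        n = sum(c.values())
        freqs.append({b: c[b] / n for b in "ACGTN"})
    return freqs


def at_divergence(freqs):
    """Per-cycle A fraction minus T fraction; near zero for a typical unstranded library."""
    return [f["A"] - f["T"] for f in freqs]


if __name__ == "__main__":
    freqs = base_frequencies(read_sequences(sys.argv[1]))
    for cycle, (f, d) in enumerate(zip(freqs, at_divergence(freqs)), start=1):
        print(cycle, " ".join(f"{b}={f[b]:.3f}" for b in "ACGTN"), f"A-T={d:+.3f}")
```

Run it as `python basefreq.py reads.fastq.gz` (the script name is arbitrary); with interleaved input, both mates are simply pooled.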

The way I produced these graphs (starting with interleaved reads, and using BBTools):

bbduk.sh in=reads.fastq.gz out=trimmed.fq.gz ktrim=r k=23 hdist=1 mink=11 tpe tbo minlen=90 ref=truseq.fa.gz,nextera.fa.gz
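In that command, ktrim=r trims from the first adapter k-mer match through the 3' end of the read, k=23 sets the k-mer length, hdist=1 allows one mismatch per k-mer, mink=11 additionally looks for shorter adapter fragments (down to 11 bases) at the read's 3' tip, tpe trims both mates to the same length, tbo trims adapters based on where the paired reads overlap, and minlen=90 discards reads shorter than 90 bases after trimming. The Python sketch below imitates only the ktrim=r, k, hdist and mink part. It is a simplified illustration, not BBDuk's actual algorithm: read k-mers are matched against every adapter k-mer (and reverse complement) within the Hamming distance, with a fallback that exactly matches the read's 3' suffix against adapter prefixes. Function names are hypothetical; parameter names merely mirror the BBDuk flags.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def neighbors(kmer, hdist):
    """Every string within Hamming distance hdist of kmer (substitutions only)."""
    result = {kmer}
    for _ in range(hdist):
        result |= {
            s[:i] + alt + s[i + 1:]
            for s in result
            for i in range(len(s))
            for alt in "ACGT"
            if alt != s[i]
        }
    return result


def trim_right(read, adapters, k=23, hdist=1, mink=11, minlen=0):
    """Trim read at the first adapter hit; return None if shorter than minlen.

    Simplified illustration of k-mer right-trimming, not BBDuk's algorithm.
    """
    seqs = [a.upper() for a in adapters]
    seqs += [revcomp(s) for s in seqs]  # also match reverse-complement adapters
    index = set()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            index |= neighbors(seq[i:i + k], hdist)

    read = read.upper()
    cut = len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in index:
            cut = i  # drop the matching k-mer and everything to its right
            break
    else:
        # No full-length hit: look for an adapter prefix hanging off the
        # 3' end, longest first, down to mink bases (exact matches only here).
        for length in range(min(k - 1, len(read)), max(mink, 1) - 1, -1):
            if any(seq.startswith(read[-length:]) for seq in seqs):
                cut = len(read) - length
                break

    trimmed = read[:cut]
    return trimmed if len(trimmed) >= minlen else None
```

With a toy adapter `CCCCCCCC`, `trim_right("ATATATATATCCCCCCCC", ["CCCCCCCC"], k=4, hdist=0, mink=2)` returns `"ATATATATAT"`, while hdist=1 returns `"ATATATATA"` because `TCCC` is one mismatch away from `CCCC`. That shows the trade-off a mismatch allowance brings: more sensitivity, occasionally a base or two of over-trimming. The sketch rebuilds the k-mer set on every call; a real tool would index the adapters once.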

bbmap.sh maxindel=200 in=trimmed.fq.gz mhist=mhist.txt bhist=bhist.txt qhist=qhist.txt qahist=qahist.txt
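The mhist output summarises how often read bases disagree with the reference at each position along the read. A minimal Python sketch of that idea follows. It assumes ungapped alignments already supplied as equal-length (read, reference) string pairs, a hypothetical input format; real alignments from BBMap would contain indels and would be read from its SAM output.

```python
def substitution_rate_by_position(alignments):
    """Per-position substitution rate from ungapped (read, reference) pairs.

    Positions where either base is N are skipped, so they count toward
    neither matches nor substitutions.
    """
    matches, subs = [], []
    for read, ref in alignments:
        if len(read) != len(ref):
            raise ValueError("this sketch only handles ungapped, equal-length pairs")
        for i, (r, g) in enumerate(zip(read.upper(), ref.upper())):
            if i == len(matches):
                matches.append(0)
                subs.append(0)
            if "N" in (r, g):
                continue
            if r == g:
                matches[i] += 1
            else:
                subs[i] += 1
    return [s / (m + s) if m + s else 0.0 for m, s in zip(matches, subs)]


if __name__ == "__main__":
    pairs = [("ACGT", "ACGA"), ("ACCT", "ACGT")]
    print(substitution_rate_by_position(pairs))  # prints [0.0, 0.0, 0.5, 0.5]
```

Plotting this rate against position for a HiSeq run and a NextSeq run of the same library would give a comparison in the spirit of the mhist graphs in this post.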

I encourage anyone who is unable to share their raw data to do the same and share the histograms. Ideally, do this for the same library sequenced on both a NextSeq and a HiSeq/MiSeq, to eliminate as many other variables as possible.
Attached Images
hs2500_mhist.png
hs2500_trueq.png
ns_mhist.png
ns_trueq.png
ns_bhist.png