Hi All,
We're running our new NextSeq 500 in standalone mode, and our very first user came in with 30 multiplexed NEBNext Ultra II libraries made with BIOO single-index adapters (12 bp indices), to be run as genomic DNA (2x150) on a Mid Output flow cell using an Illumina kit.
We spent some time checking with NEB, then BIOO, then Illumina before proceeding. We set up a plate with the 12 bp indices on the i7 index read. In short, although the flow cell was slightly overclustered (233 K/mm2), the run looked okay (89% >Q30) and all samples demultiplexed. A few things that I haven't seen specifically discussed on these forums showed up in FastQC and the flowcell summary.
Newbie question: Roughly 10% of my reads went to "Undetermined", which is not far off from our 10% PhiX spike-in. Of those that were PhiX, at least 90% had "GGGGGGGGGGGG" as the barcode. Is it normal for bcl2fastq to keep these dark indices? It also kept over a million reads with a barcode of "NNNNNNNNNNNN". Is that normal too?
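For anyone who wants to check the same thing on their own run, here is a rough sketch of how the barcodes in the Undetermined file could be tallied. It assumes the standard Illumina FASTQ header, where the index sequence is the last colon-separated field; the function names and the file name in the usage note are made up for illustration, not taken from our pipeline.

```python
import gzip
from collections import Counter
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file as text, plain or gzip-compressed."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def count_index_sequences(path):
    """Count how often each index sequence appears in the FASTQ headers.

    Assumes headers look like
    @instrument:run:flowcell:lane:tile:x:y read:filtered:control:INDEX
    so the index is whatever follows the last colon.
    """
    counts = Counter()
    with open_fastq(path) as handle:
        # Header lines are lines 1, 5, 9, ... of a 4-line-per-record FASTQ.
        for header in islice(handle, 0, None, 4):
            header = header.rstrip("\n")
            if not header.startswith("@"):
                continue  # skip blank or malformed lines
            counts[header.rsplit(":", 1)[-1]] += 1
    return counts


def summarize(counts, top=10):
    """Return (index, count, fraction of all reads) for the most common indices."""
    total = sum(counts.values())
    return [(index, n, n / total) for index, n in counts.most_common(top)]
```

Calling `summarize(count_index_sequences("Undetermined_S0_R1_001.fastq.gz"))` (hypothetical file name) would list the most frequent barcodes, which is where the all-G and all-N indices should show up if they dominate.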
PROBLEMS:
For the "good" demultiplexed sequences of ONE sample (~4.7M reads):
1. In the read 1 file, ~42K reads consisted of nothing but a stretch of Ns, exactly 35 bp long. Another >37K were PhiX reads ending in a long run of Gs. All of them showed the correct index.
2. For the same sample, in the read 2 file, about 243 reads were mostly Ns, some of them just a string of 35 Ns. There were also about 1500 PhiX reads ending in long runs of Gs. These also showed the correct index.
How would this happen? It doesn't look like a tile-edge effect. I know that PhiX sometimes makes it into the "good" reads, and that Gs can be heavily overrepresented at the ends of reads, but why so many, and why such a difference between read 1 and read 2?
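In case it helps anyone compare numbers, this is a minimal sketch of how the odd reads in a demultiplexed file could be counted. The categories, the 20-G tail threshold, and the 50% N cutoff are arbitrary choices for illustration, and the sketch does not check whether a read is actually PhiX (that would need an alignment step).

```python
import gzip
import re
from collections import Counter
from itertools import islice

# A run of 20 or more Gs at the 3' end; the threshold is an assumption.
POLY_G_TAIL = re.compile(r"G{20,}$")


def classify_read(seq, n_len=35, n_frac=0.5):
    """Put one read sequence into a rough category."""
    if not seq:
        return "empty"
    if set(seq) == {"N"}:
        # Reads made of nothing but Ns, split by whether they are n_len long.
        return "all_N_35" if len(seq) == n_len else "all_N_other"
    if seq.count("N") / len(seq) >= n_frac:
        return "mostly_N"
    if POLY_G_TAIL.search(seq):
        return "poly_G_tail"
    return "other"


def tally_fastq(path):
    """Count reads per category in a plain or gzip-compressed FASTQ file."""
    path = str(path)
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        # Sequence lines are lines 2, 6, 10, ... of a 4-line-per-record FASTQ.
        for line in islice(handle, 1, None, 4):
            counts[classify_read(line.strip().upper())] += 1
    return counts
```

Running `tally_fastq` on the read 1 and read 2 files of the same sample and comparing the two counters side by side is one way to put numbers on the read 1 versus read 2 difference.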
Thanks.
-p