Old 03-25-2016, 04:21 PM   #6
Junior Member
Location: Dublin

Join Date: Mar 2009
Posts: 6

Thanks for the insights & advice - all very helpful.

We're doing a pilot study comparing sequence reads from a multiplexed PCR of 200 SNP assays with genotypes for the same SNPs generated on an alternative platform, so for now we're interested in exploring all aspects of the data. I shall bear in mind, though, that reads with mismatching index sequences may give false results.

We have 218M reads from two lanes of a HiSeq flowcell and 2,112 combinations of I5 and I7 adapters (22 I7 x 96 I5) to demultiplex. I'm not sure exactly how long the demultiplexing took with your demuxbyname utility, but it ran quickly enough to be practical, and it is very convenient to use from a script. One thing I did notice is that the demuxbyname processes kept running after all of the reads had been demultiplexed, so I had to interrupt my pipeline and kill the processes manually before I could proceed to genotype calling.
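As a stopgap for the lingering processes, a pipeline script could put a time limit on the demultiplexing step and kill it if it overruns. Below is a rough sketch in Python, assuming the step is launched as a child process. The function name, the demo command, and the time limit are made up for illustration, and this does not explain why the processes fail to exit. Killing a process that is still writing could truncate its output, so the limit should be comfortably longer than the real run time.

```python
import subprocess
import sys

def run_with_time_limit(command, limit_seconds):
    """Run a command and kill it if it is still alive after limit_seconds.

    Returns a tuple (return_code, timed_out).
    """
    process = subprocess.Popen(command)
    try:
        return process.wait(timeout=limit_seconds), False
    except subprocess.TimeoutExpired:
        # The child overran its limit: kill it and collect its exit status.
        process.kill()
        return process.wait(), True

# Hypothetical stand-in for a step that never exits on its own.
demo_command = [sys.executable, "-c", "import time; time.sleep(30)"]
```

In a real pipeline the command list would hold the demultiplexing call and its arguments instead of the demo command, and the script would move on to genotype calling once the function returns.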

Code used: demuxbyname.sh in=L001_R1_001.fastq.gz out=L1.%.fq.gz outu=L1.bad_index.fq.gz suffix names=bbm_indices ow=true

... where bbm_indices contains the list of 2,112 I5 x I7 combinations.
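For anyone wanting to build a names file like this, here is a minimal sketch in Python. The index sequences, the "+" separator, the I7-first order, and the one-name-per-line layout are all assumptions for illustration. The real names have to match whatever appears at the end of your own read headers.

```python
from itertools import product

def index_combinations(i7_indices, i5_indices, sep="+"):
    """Return every I7/I5 pairing as one string per combination."""
    return [f"{i7}{sep}{i5}" for i7, i5 in product(i7_indices, i5_indices)]

def write_names_file(path, names):
    """Write one combination per line (assumed layout of the names file)."""
    with open(path, "w") as handle:
        handle.write("\n".join(names) + "\n")

# Hypothetical placeholder sequences, not the real barcodes.
i7 = ["ATCACG", "CGATGT"]
i5 = ["TTAGGC", "TGACCA", "ACAGTG"]
names = index_combinations(i7, i5)
```

Calling write_names_file("bbm_indices", names) would then write the list to disk. With the full sets of 22 I7 and 96 I5 indices, the same function yields the 2,112 combinations mentioned above.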
spark