I have received new NextSeq reads from our core facility in a semi-demultiplexed state. The P5 and P7 indices are placed in the sequence ID line, and the reads are not sorted by index. Here is an example of four reads from one of the paired FASTQ files:
I would like to split the reads into separate FASTQ files based on the indices, but I cannot find a suitable tool for this. It also needs to be reasonably fast, as this sequencing run has 400 million reads.
All help is very appreciated.
Code:
@NS500551:36:H5VJNBGXX:1:11101:17033:1044 2:N:0:GGACTCCT+GCGATCTA
GGGAGGTCTATATAAGCAGAGCTGGTACCA............
+
AAAAA.FF<)<.<FFFFFFA<.FFFFFF.F.FFFFA..........
@NS500551:36:H5VJNBGXX:1:11101:2211:1044 2:N:0:TAAGGCGA+TCTACTCT
GGGAGGTCTATATAAGCAGAGCTATAACCTC.......
+
AAA<A.FFF)7.<FFF7.AFFAA)F<AA)FFFFAA.......
@NS500551:36:H5VJNBGXX:1:11101:24462:1044 2:N:0:TCCTGAGC+GCGATCTA
GGGAGGTCTATATAAGCAGAGCTGGTACCAC........
+
<AA.A.FA<.7.FFFF<)FFFFAFF<A<.<FF<FF.....
@NS500551:36:H5VJNBGXX:1:11101:16844:1044 2:N:0:AGGCAGAA+TCTACTCT
GGGAGGTCTATATAAGCAGAGCTATAACTTCG........
+
AAA<A.F.F<A<.FFFF<F)F.FAFFF<FFAFFFFFFFFF......
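To make the goal concrete, here is a rough Python sketch of the kind of split I mean. It is only an illustration, not an existing tool, and it is not tuned for speed. It assumes an uncompressed FASTQ file with exactly four lines per record and the index pair as the last colon-separated field of the header, as in the reads above. The function names, the `suffix` parameter, and the output file naming are all made up for this example.

```python
import os


def index_key(header):
    """Return the index pair from a FASTQ header line.

    Assumes the pair is the last colon-separated field, e.g.
    '... 2:N:0:GGACTCCT+GCGATCTA' gives 'GGACTCCT+GCGATCTA'.
    """
    return header.rstrip("\n").rsplit(":", 1)[-1]


def split_fastq_by_index(fastq_path, out_dir, suffix="R2"):
    """Write each 4-line record to <out_dir>/<P7>-<P5>_<suffix>.fastq.

    Hypothetical helper: returns a dict mapping index pair to read count.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # one open output file per index pair seen so far
    counts = {}
    try:
        with open(fastq_path) as fq:
            while True:
                header = fq.readline()
                if not header:
                    break  # end of file
                if not header.strip():
                    continue  # tolerate blank lines between records
                # Sequence, '+' separator, and quality lines follow the header.
                record = header + fq.readline() + fq.readline() + fq.readline()
                key = index_key(header)
                out = handles.get(key)
                if out is None:
                    # '+' is replaced only to keep file names shell-friendly.
                    name = f"{key.replace('+', '-')}_{suffix}.fastq"
                    out = handles[key] = open(os.path.join(out_dir, name), "w")
                out.write(record)
                counts[key] = counts.get(key, 0) + 1
    finally:
        for out in handles.values():
            out.close()
    return counts
```

Something like `split_fastq_by_index("reads_R2.fastq", "split", suffix="R2")` (file names are placeholders) would be run once per mate file, which relies on both mates carrying the same index pair in their headers. One output file stays open per index pair, so index read errors that create many distinct pairs could hit the open-file limit.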