Hi All,
I'm trying to convert bcl files to fastq and preserve the index sequences in the read identifier line.
I followed this guide http://seqanswers.com/forums/showthread.php?t=39153, and tips from GenoMax
got me to where I needed to be. However, like dsobral, I am also curious as to why 8 bp dual indexing (16 bp of I7-I5) ends up
as a 14 bp barcode in the output (see the index tag at the end of the identifier line below).
Please understand that we do not produce our own sequencing data, and the files I am working with have been obtained with
minimal information about the processes involved. I understand that in both cases the sequencing centers uploaded data to BaseSpace
directly following 'typical MiSeq runs'.
(vex)[ir210@beast Sample_lane1]$ ls
lane1_Undetermined_L001_R1_001.fastq.gz lane1_Undetermined_L001_R2_001.fastq.gz SampleSheet.csv
(vex)[ir210@beast Sample_lane1]$ zcat lane1_Undetermined_L001_R1_001.fastq.gz |head -n4
@MISEQ:30:000000000-AB55B:1:1101:15923:1332 1:N:0:GACCGATGATGCTG
AGGTCTCAGTGGCATGATCATACTTCATTATAGCCTCCAACTCCCTGGGTCAAGCAATCCTTCCACCTCAGCCTTCTAAGTAGCTGGGACTACAGGCGTGCACTACCAGACACTACCTGTCTCTTATACACATCTCCGAGCCCACGAGACG
+
BCCBCFFFFFFFGGGGGGGGGGHHHHGHHHHHHGHHHHHHHHHHHHHHGHHHHHHHHIIHHHHHHHHHHHHHHHHHHHHHHHGHGHGGHHHHHHHHGGGGGGHHHHHFHHGGHGHHHHHHHHHHHFHHHHHHHHHHHGGGGGGHGGFGGGD
What I don't understand is how the MiSeq produces 'lost reads' that have correctly formatted 8 bp indexes in the identifier line.
Here is a lost read generated automatically by the MiSeq. How did it determine the identity of the 8th position base if there are phasing issues?
(vex)[ir210@beast SLX-7061.000000000-AA0WP]$ zcat SLX-7061.000000000-AA0WP.s_1.r_1.lostreads.fq.gz|head -n4
@M01686:136:1:1101:15921:1413#CGACTCCT#TGTGTAGA
TCTCAGTTCCTCTATTTTTGTTCTATCCTGCCCTATTTCTAAGTCAGATCCTACATACAAATCATCCACCTATTGATTGCTCCCTACTGTCTCTTATACACATCTCCCTCCCCACGAGACGCCCTCCTCTCTCTTCTCCCGTCTTCTTCTTCTCCACCACACTCTCTTCCCTTCCCTCTTCTTCCTTCCTCCTCTTCCCCCCCCCCCCCCCTTCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCTCCCCCCTCCCCCTCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+
@BBCCGFGDE@CF,CCCCE+C;;C,;,;<,C6CE,;6<C#####################################################################################################################################################################################################################################################################
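To make the comparison concrete, here is a rough Python sketch that pulls the index out of both identifier styles shown above and reports its length. The function name `index_from_header` and the two regular expressions are my own assumptions, written only against the two header lines in this post; they are not part of bcl2fastq or any Illumina tool, and other header layouts may not match.

```python
import re

# Style 1 (the bcl2fastq output above): "@<read id> <read>:<filtered>:<control>:<index>"
# Assumption: the index field holds only A/C/G/T/N, optionally joined by "+" or "-".
SPACE_STYLE = re.compile(r"^@\S+ \d+:[YN]:\d+:(?P<index>[ACGTN+-]*)$")

# Style 2 (the lostreads file above): "@<read id>#<index1>#<index2>"
HASH_STYLE = re.compile(r"^@[^#\s]+#(?P<i1>[ACGTN]+)(?:#(?P<i2>[ACGTN]+))?")


def index_from_header(header):
    """Return the index sequence(s) found in a FASTQ identifier line as a list."""
    header = header.strip()
    match = SPACE_STYLE.match(header)
    if match:
        # Split on "+" or "-" in case the two indexes are written with a separator.
        return [part for part in re.split(r"[+-]", match.group("index")) if part]
    match = HASH_STYLE.match(header)
    if match:
        return [g for g in (match.group("i1"), match.group("i2")) if g]
    return []


if __name__ == "__main__":
    headers = [
        "@MISEQ:30:000000000-AB55B:1:1101:15923:1332 1:N:0:GACCGATGATGCTG",
        "@M01686:136:1:1101:15921:1413#CGACTCCT#TGTGTAGA",
    ]
    for header in headers:
        parts = index_from_header(header)
        # Print the read id, the index string(s), and the length of each.
        print(header.split()[0], parts, [len(p) for p in parts])
```

On the two headers quoted in this post it reports a single 14-base string for the bcl2fastq output and two separate 8-base indexes for the lostreads record, which is exactly the discrepancy I am asking about.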
Any information regarding the automated Illumina MiSeq demuxing path compared with manual bcl2fastq processing would be
gratefully received, especially if you can fill me in on how the 8th position base in the index is actually used.
Thanks!