Dear community,
I just obtained some Illumina sequencing data, but it is in a format that I am unfamiliar with. The files were labeled "probs.txt". Does anybody recognize this format, and can you suggest software to parse it or convert it to FASTQ? From the header info, it appears to be paired-end Illumina data with base calls, but I am not sure whether the base calls are followed by quality values or intensities. The numbers following the base calls come in sets of 4, and the order of the numbers corresponds to ACGT: if the first of the four numbers is the highest, the base is an A; if the second is the highest, the base is a C; and so on.
3 1 10 1097#0/1 ATCTA........CCTGGCCACC............. 37 -37 -40 -40 -40 -9 -40 9 -22 17 -25 -20 -0 -40 -34 0 -2 -2 -13 -6 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -13 6 -14 -10 -24 12 -21 -13 -39 -21 -19 17 -40 -4 1 -7 -15 -11 2 -3 -13 8 -16 -12 -11 8 -23 -12 12 -14 -18 -20 -13 6 -12 -10 -15 10 -16 -13 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5
3 1 10 1097#0/2 CAGACATCGCGATCGGGTTCGCGATCCGC.CCGAAG -18 16 -33 -21 39 -39 -40 -40 -39 -8 6 -12 34 -38 -37 -40 -30 25 -27 -33 31 -31 -40 -40 -28 -8 -20 8 -17 13 -19 -20 -40 -26 17 -18 -8 5 -16 -10 -40 -10 8 -13 21 -23 -40 -28 -26 -13 -21 12 -9 3 -19 -6 -40 -22 16 -17 -40 -4 -3 -3 -40 -17 16 -26 -30 -13 -16 11 -40 -18 -36 18 -4 -4 -5 -6 -40 -24 15 -16 -24 19 -24 -21 -27 -11 10 -16 21 -24 -32 -24 -24 -15 -19 13 -19 12 -19 -14 -9 6 -13 -14 -16 -8 3 -7 -18 0 -8 -2 -5 -5 -5 -5 -9 1 -4 -12 -14 3 -11 -6 -16 -11 7 -10 14 -14 -34 -22 11 -15 -14 -28 -14 -6 -1 -4
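To make the "sets of 4" observation checkable across a whole file, here is a minimal Python sketch. It rests on assumptions read off the two records above rather than on any format specification: fields are whitespace-separated, the first four fields identify the read, the fifth is the base-call string, and the rest are integer scores in A, C, G, T order, four per base. The function names `parse_probs_line` and `check_calls` are made up for illustration.

```python
BASES = "ACGT"  # assumed order of the four scores in each group


def parse_probs_line(line):
    """Split one record into (read_id, sequence, score_groups)."""
    fields = line.split()
    # Assumption: fields 1-4 identify the read, field 5 holds the base calls.
    read_id = "_".join(fields[:4])
    sequence = fields[4]
    scores = [int(value) for value in fields[5:]]
    if len(scores) != 4 * len(sequence):
        raise ValueError(
            f"expected {4 * len(sequence)} scores, found {len(scores)}"
        )
    # Group the flat score list into one 4-tuple per base position.
    groups = [tuple(scores[i:i + 4]) for i in range(0, len(scores), 4)]
    return read_id, sequence, groups


def check_calls(sequence, groups):
    """Return positions whose called base is not among the top-scoring bases."""
    mismatches = []
    for position, (base, group) in enumerate(zip(sequence, groups)):
        if base not in BASES:
            continue  # '.' appears to mean that no base was called
        top = max(group)
        best = {b for b, score in zip(BASES, group) if score == top}
        if base not in best:
            mismatches.append(position)
    return mismatches


if __name__ == "__main__":
    # First five bases of the first record above, with their 20 scores.
    record = ("3 1 10 1097#0/1 ATCTA 37 -37 -40 -40 -40 -9 -40 9 "
              "-22 17 -25 -20 -0 -40 -34 0 -2 -2 -13 -6")
    read_id, sequence, groups = parse_probs_line(record)
    print(read_id, sequence, check_calls(sequence, groups))
```

Ties are treated as consistent with the call, because the records above contain groups such as `-0 -40 -34 0` (called T) and `-2 -2 -13 -6` (called A) where two bases share the highest value. An empty mismatch list only confirms the ordering; it does not say whether the numbers are quality values or intensities.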