Hello all,
I know that a similar thread was started by Simcom and got a lot of replies, but I have a slightly different problem. We are sequencing libraries made from poly(A)-selected RNA to determine alternative polyadenylation. The way the libraries were constructed, Read 1 on the HiSeq starts with 6 Ns (which serve as unique molecular identifiers, or UMIs), followed by 14 Ts and then the transcriptomic sequence.

Five indexed libraries were pooled. After consultation with several sources, including Illumina tech support, it was decided that we would "generate matrix and normalize" to a different lane with good diversity, not spike in PhiX or other controls at a high concentration (since that would defeat the purpose of getting more reads from the HiSeq than from the GAIIx, and apparently offers no advantage), and load at 8 pM. The library has a mean size of 250 bp.

The initial results from this lane are back. There were ~800K clusters/mm2 and 220 million reads, but only 30 million passed the "Chastity filter", which is calculated from the first 20 bp or so. The graphics indicate that after 20 bp, mixed signals from all bases appear (after what is clearly a T run). Everyone seems to agree that the low PF rate is likely due to the T run, but nobody is sure what to do next with the data. Clearly it would be ideal to use more than just the 30 million reads that passed filter. I was wondering about the following:
1. Any general ideas about what to do with this data set post-run? I have seen other discussions about effectively analysing data with low-diversity initial bases using non-in-house algorithms.
2. If this is related to the T run, what is the likely reason for the poor quality even though the lane was normalized to a different (and successful) lane?
3. Are the image files from the HiSeq stored as TIFFs that could be analyzed with next-phred or some other alternative base caller?
4. Could "deferred cluster calling", as described in Felix Krueger's PLoS ONE paper and in the forum early last year, or some similar protocol, be helpful in this scenario?
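To make the Read 1 layout concrete, here is a minimal Python sketch of how I picture splitting each read into UMI, T run, and transcript sequence once usable base calls are available. The function names and the mismatch tolerance are my own assumptions for illustration, not part of any Illumina tool. Only the 6 nt UMI, the 14 Ts, and the read counts come from the run described above.

```python
UMI_LEN = 6        # 6 random bases at the start of Read 1 (from the library design)
POLY_T_LEN = 14    # 14 Ts following the UMI (from the library design)
MAX_MISMATCH = 2   # assumption: tolerate a few miscalled bases inside the T run


def split_read1(seq, max_mismatch=MAX_MISMATCH):
    """Return (umi, insert) if the read fits the UMI + T-run layout, else None."""
    prefix_len = UMI_LEN + POLY_T_LEN
    if len(seq) <= prefix_len:
        return None  # nothing left after the UMI and T run
    umi = seq[:UMI_LEN]
    t_run = seq[UMI_LEN:prefix_len]
    mismatches = sum(1 for base in t_run if base != "T")
    if mismatches > max_mismatch:
        return None  # T run too noisy to trust the layout
    return umi, seq[prefix_len:]


def pass_filter_fraction(pf_reads, total_reads):
    """Fraction of clusters that passed the chastity filter."""
    return pf_reads / total_reads


if __name__ == "__main__":
    read = "ACGTAC" + "T" * POLY_T_LEN + "GATTACAGATTACA"
    print(split_read1(read))
    # Counts reported for this lane: 30 million PF out of 220 million reads.
    print(f"PF fraction: {pass_filter_fraction(30e6, 220e6):.1%}")
```

With the numbers from this lane, the pass-filter fraction comes out at roughly 14%, which is why recovering the remaining reads matters so much here.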
I appreciate all help and apologize for any naive statements or assumptions above.