Hi,
I received the results of a run from our sequencing provider. We sequenced 11 bacterial samples (whole-genome extracts from a bacterium with ~55% GC) on a HiSeq 2000, with short paired-end reads of 100 nt each.
Since the DNA we provided is double-stranded whole-genome extract, a plot of the nucleotide called at each cycle (x-axis) should show four straight, horizontal lines, with A exactly over T and G exactly over C. That is not what we see (see the attached PDF, which has one page per read end and one plot per sample):
- There is some apparently random variation over the first 7-10 bases, which could possibly be explained by residual tags.
- Over the first 20-40 bases, G+C increases and A+T decreases by several percent. As far as I can tell, this cannot be explained by trailing adapters.
- Once the G+C content has stabilized, there is still (in some samples, though not all) a significant difference between the counts of A and T on the one hand, and between those of G and C on the other. The effect is stronger between T and A: in sample 6, read 2, for example, the difference reaches 2.5% when averaged over bases 41-60. This has to be a technical issue, either with the chemistry or with the base calling, because our DNA was double-stranded to start with and therefore necessarily has G=C and T=A.
- There is also a significant periodic variation with a period of 3 cycles. The differences are strong, reaching up to 1.3% in the C count between positions 1 and 2 of the period. I have seen such periodic variation in Illumina runs before, but this is much stronger than anything I have seen previously. This, too, must be a technical artifact and cannot come from the DNA itself.
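To check these numbers directly from the reads rather than from the provider's plots, something like the following minimal sketch could be used. It assumes plain 4-line FASTQ records (optionally gzip-compressed), counts only A/C/G/T at each cycle (N calls are left out of the denominator), and computes the two quantities discussed above: the mean difference between two bases over a window of cycles (e.g. T vs A over cycles 41-60) and the mean fraction of a base at each phase of a 3-cycle period. All function names are hypothetical, chosen for illustration only.

```python
import gzip
from collections import Counter
from typing import Dict, Iterable, Iterator, List

BASES = "ACGT"


def read_fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence of each record, assuming plain 4-line FASTQ
    (gzip-compressed if the file name ends in .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # line 2 of every 4-line record is the sequence
                yield line.strip().upper()


def per_cycle_composition(reads: Iterable[str]) -> List[Dict[str, float]]:
    """Fraction of A, C, G and T called at each cycle (index 0 = cycle 1).
    N and other symbols are excluded from the denominator."""
    counts: List[Counter] = []
    for seq in reads:
        for cycle, base in enumerate(seq):
            if cycle == len(counts):
                counts.append(Counter())
            if base in BASES:
                counts[cycle][base] += 1
    composition = []
    for c in counts:
        total = sum(c[b] for b in BASES)
        composition.append({b: (c[b] / total if total else 0.0) for b in BASES})
    return composition


def mean_difference(comp: List[Dict[str, float]], base_x: str, base_y: str,
                    first: int, last: int) -> float:
    """Mean of frac(base_x) - frac(base_y) over 1-based cycles first..last."""
    window = comp[first - 1:last]
    if not window:
        raise ValueError("empty cycle window")
    return sum(c[base_x] - c[base_y] for c in window) / len(window)


def phase_means(comp: List[Dict[str, float]], base: str, first: int,
                last: int, period: int = 3) -> List[float]:
    """Mean fraction of `base` at each phase of a `period`-cycle pattern over
    1-based cycles first..last; phase 0 holds cycles first, first+period, ..."""
    sums = [0.0] * period
    n = [0] * period
    for offset, c in enumerate(comp[first - 1:last]):
        sums[offset % period] += c[base]
        n[offset % period] += 1
    return [s / k if k else 0.0 for s, k in zip(sums, n)]
```

For instance, with a hypothetical file name, `comp = per_cycle_composition(read_fastq_sequences("sample6_R2.fastq.gz"))` followed by `mean_difference(comp, "T", "A", 41, 60)` would give the averaged T-A imbalance, and `phase_means(comp, "C", 41, 99)` the C fraction at each phase of the period. Reading the sequence line by position (every 4th line) rather than by looking for `@` avoids misparsing quality strings that happen to start with `@`.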
Any thoughts on this? I have already contacted our service provider, but I would like to hear the community's opinion on these points. Thanks for your help!