Hi All,
I've been doing this a while, but I have run into a new problem with a batch of RNA-Seq samples (TruSeq) that concerns me. We prepped a set of RNA samples for sequencing, and a few of them are at low concentration. The core facility I have used for years was unhappy with this, but I didn't see the problem, since there was plenty of material for pooling.
Nevertheless, they ran one lane for me (HiSeq), but apparently had to modify their dilution protocol. The results for the lane were as follows:
Sample    Reads        % of lane   % PF clusters   Mean Q score
Sample1   31,287,035   12.66       89.81           36.96
Sample2   70,103,756   28.36       89.49           36.96
Sample3    8,599,888    3.48       90.17           36.97
Sample4    3,309,119    1.34       89.80           37.01
Sample5   98,775,103   39.95       89.24           36.94
Sample6   28,415,646   11.49       90.10           36.94
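To put a number on how uneven this is, here is a short Python sketch that takes the read counts from the table and expresses each as a share of the reads assigned to these six samples, relative to the equal 1/6 share a perfectly balanced pool would give. The shares are computed over the six samples only, so they come out slightly higher than the % lane column (which sums to about 97%).

```python
# Per-sample read counts copied from the lane summary.
reads = {
    "Sample1": 31_287_035,
    "Sample2": 70_103_756,
    "Sample3": 8_599_888,
    "Sample4": 3_309_119,
    "Sample5": 98_775_103,
    "Sample6": 28_415_646,
}

total = sum(reads.values())
target = 1 / len(reads)  # equal share each library would get in a perfectly balanced pool

for name, n in reads.items():
    share = n / total
    # Fold difference from the equal share: 1.0 means perfectly balanced.
    print(f"{name}: {share:6.1%} of assigned reads, {share / target:5.2f}x the equal share")

# Spread between the most and least represented libraries.
print(f"max/min read ratio: {max(reads.values()) / min(reads.values()):.1f}")
```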
Now, clearly the problem is that the data are very unevenly distributed, BUT the number of reads correlates almost perfectly with each sample's starting concentration before pooling (R² = 0.99)! From what I can gather, the core did not make an intermediate dilution of each library; instead they added variable volumes of each to a pool and clustered from that, but I don't know what happened in between. What I don't understand is why they did this. Here are the sample concentrations:
Sample    ng/µl   nM
Sample1   1.6     6.3
Sample2   2.4     9.6
Sample3   0.9     3.5
Sample4   0.6     2.4
Sample5   2.7     10.8
Sample6   1.5     6.1
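For reference, the nM column follows from the ng/µl column via the usual dsDNA conversion, nM = (ng/µl × 10^6) / (660 × mean fragment length in bp). The post does not state the fragment size used; the sketch below assumes about 380 bp, which is a hypothetical value that roughly reproduces the nM column, so substitute the Bioanalyzer mean size for each library.

```python
AVG_BP_MASS = 660  # approximate g/mol per base pair of double-stranded DNA

def ng_per_ul_to_nM(conc_ng_ul: float, fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/ul) to molarity (nM)."""
    # ng/ul == mg/l; dividing by g/mol gives mmol/l scale, and the 1e6 factor lands in nM.
    return conc_ng_ul * 1e6 / (AVG_BP_MASS * fragment_bp)

# Hypothetical: not stated in the post; chosen because it roughly matches the nM column.
ASSUMED_FRAGMENT_BP = 380

ng_ul = {"Sample1": 1.6, "Sample2": 2.4, "Sample3": 0.9,
         "Sample4": 0.6, "Sample5": 2.7, "Sample6": 1.5}

for name, c in ng_ul.items():
    print(f"{name}: {c} ng/ul -> {ng_per_ul_to_nM(c, ASSUMED_FRAGMENT_BP):.1f} nM")
```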
We are almost certain that, given the correlation with the pre-pooling sample concentrations, the pooling was done incorrectly and that the variable volumes have perhaps thrown everything off balance. Does that sound reasonable?
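The correlation can be checked directly from the two tables. This is a minimal Pearson correlation in plain Python, pairing each library's nM value with its read count. The post doesn't say how its R² was obtained (linear or some other fit), so a plain linear Pearson fit on these numbers will not necessarily reproduce 0.99 exactly.

```python
from math import sqrt

# Library molarity (nM) and reads per sample, both copied from the post (Sample1..Sample6).
nM = [6.3, 9.6, 3.5, 2.4, 10.8, 6.1]
reads = [31_287_035, 70_103_756, 8_599_888, 3_309_119, 98_775_103, 28_415_646]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(nM, reads)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```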
According to all the Illumina documentation I have read, the best approach would be to normalize each sample to 2 nM and then combine equal volumes of each to make a 2 nM pool, which would then be the starting point for cluster generation. Does this sound reasonable? Am I missing something here?
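The normalize-then-pool step is a C1·V1 = C2·V2 dilution per library. The sketch below assumes a hypothetical 10 µl of each normalized library; the 2 nM target comes from the text, and the volume is only for illustration. All six libraries here are above 2 nM, so each can be diluted down rather than concentrated.

```python
TARGET_NM = 2.0       # normalization target discussed above
PER_SAMPLE_UL = 10.0  # hypothetical volume of each normalized library

libraries_nM = {"Sample1": 6.3, "Sample2": 9.6, "Sample3": 3.5,
                "Sample4": 2.4, "Sample5": 10.8, "Sample6": 6.1}

def dilution_plan(conc_nM, target_nM, final_ul):
    """Return (library ul, diluent ul) to bring one library to target_nM in final_ul.

    Uses C1*V1 = C2*V2; a stock below the target cannot be fixed by dilution.
    """
    if conc_nM < target_nM:
        raise ValueError(f"stock {conc_nM} nM is below target {target_nM} nM")
    lib_ul = target_nM * final_ul / conc_nM
    return lib_ul, final_ul - lib_ul

for name, c in libraries_nM.items():
    lib, buf = dilution_plan(c, TARGET_NM, PER_SAMPLE_UL)
    print(f"{name}: {lib:5.2f} ul library + {buf:5.2f} ul buffer")

# Equal volumes of equimolar libraries give a pool at the same molarity,
# with each library contributing an equal share of molecules.
pool_ul = PER_SAMPLE_UL * len(libraries_nM)
print(f"pool: {pool_ul:.0f} ul at {TARGET_NM} nM")
```

A larger per-sample volume keeps the smallest pipetting steps (about 1.9 µl of Sample5 at 10 µl here) more accurate.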
****I should mention that the samples were quantified with both PicoGreen and the Bioanalyzer HS DNA assay. These gave similar results (r = 0.8), so I doubt quantification is a major issue here.****