I have a technical question about gene expression profiling, and I'd like to consult the experts here to see how you manage this as the director of a sequencing center or core facility.
As you can imagine, a sequencing center or core facility deals with different sets of samples from different PIs, and PIs understandably want their sequencing data delivered as a complete set from a single run. For instance, suppose an RNA-Seq project has 10 RNA samples and the PI wants 2 lanes of 50 bp single-read (50SR) sequencing for a standard gene expression study. The pooling scheme is then 5 samples per lane, and if one lane produces about 150M reads, we target roughly 30M reads per sample.

As you know, read-count variation across samples in a pool is very common. In this 10-sample project, we might end up with 60M reads for sample 1 but only 15M reads for sample 2. My instinct is to generate another 15M reads for sample 2 in our next run, either from the same library (if it is still available) or from a freshly prepared library, so that the PI has 30M reads combined for sample 2 for data analysis. However, I have heard that this kind of incremental top-up strategy may NOT be appropriate for RNA expression profiling, and I'm not sure how you manage it at your facility. For gene expression profiling experiments such as RNA-Seq or small RNA-Seq, how do you handle this case: do you add 15M reads for sample 2, or do you simply re-run sample 2 with a target of 30M reads in the next run? Or do you perform some analysis (for instance, an R² correlation calculation) to confirm that reads from different runs can safely be combined?
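One way the R² check mentioned above is sometimes done is to compare per-gene counts for the same library across the two runs on a log scale and only merge if the agreement is high. The sketch below is a minimal, hypothetical illustration (the count vectors, the pseudocount of 1, and the R² > 0.95 cutoff are all assumptions for demonstration, not an established standard):

```python
import math

def log_r2(counts_a, counts_b, pseudo=1.0):
    """Pearson R^2 between log2-transformed per-gene counts from two runs
    of the same library. A high R^2 (e.g. > 0.95, an assumed cutoff)
    suggests the runs agree well enough to combine."""
    xs = [math.log2(c + pseudo) for c in counts_a]
    ys = [math.log2(c + pseudo) for c in counts_b]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return (cov * cov) / (vx * vy)

# Toy example: run 2 is roughly half the depth of run 1 (e.g. a 15M-read
# top-up against a 30M-read target), with some sampling noise added.
run1 = [120, 45, 300, 8, 76, 0, 910, 33, 5, 150]
run2 = [58, 20, 160, 5, 40, 1, 440, 18, 2, 70]
print(round(log_r2(run1, run2), 3))
```

Because depth differences mostly shift log counts by a constant, a faithful top-up of the same library should still give an R² close to 1; a new library prep with batch effects would show a visibly lower correlation.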
This is an important issue for me to think through. If the incremental approach really is inappropriate, it could significantly change our experimental setup: I would have to ask PIs to sequence deeper the first time to avoid any top-up later. Also, based on your experience and knowledge, as well as your observations of different sequencing centers and facilities, what are your comments and thoughts on this?
I would very much appreciate your thoughts on this.
Thanks
ChristmasSunflower