Hi,
I have a project in which we are reporting differentially expressed genes between infected samples and negative controls. After several successful test lanes, we submitted all of our remaining libraries to our core for RNA-seq last week, sequenced on a HiSeq 4000. In the results, the number of reads assigned to each sample is skewed in most lanes. For example:
Reads per sample:

Lane 1
sample 1    118,369,316
sample 2    127,860,068
sample 3     33,369,611
sample 4     74,631,001
sample 5        892,059
sample 6     52,788,867

Lane 2 (most lanes look like this)
sample 1    106,314,515
sample 2     36,354,411
sample 3     28,392,645
sample 4     74,239,452
sample 5     80,632,554
sample 6     83,282,609
In most cases, I have one or two samples per lane that have ~25 million reads, while another one or two will have ~100 million.
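To put a number on the skew, each sample's share of its lane can be compared with the even share of 1/6 you would expect from six equally pooled libraries. This is a short sketch using the read counts from the two lanes above; the dictionary layout and the `lane_summary` helper are just one illustrative way to hold and summarize them.

```python
from typing import Dict, Tuple

# Read counts copied from the two example lanes in the post.
LANES: Dict[str, Dict[str, int]] = {
    "Lane 1": {
        "sample 1": 118_369_316, "sample 2": 127_860_068, "sample 3": 33_369_611,
        "sample 4": 74_631_001, "sample 5": 892_059, "sample 6": 52_788_867,
    },
    "Lane 2": {
        "sample 1": 106_314_515, "sample 2": 36_354_411, "sample 3": 28_392_645,
        "sample 4": 74_239_452, "sample 5": 80_632_554, "sample 6": 83_282_609,
    },
}


def lane_summary(counts: Dict[str, int]) -> Tuple[int, Dict[str, float], float]:
    """Return (total reads, each sample's fraction of the lane, max/min read ratio)."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return total, shares, ratio


if __name__ == "__main__":
    for lane, counts in LANES.items():
        total, shares, ratio = lane_summary(counts)
        even = 1 / len(counts)  # share each sample would get if pooling were perfectly even
        print(f"{lane}: {total:,} reads, max/min = {ratio:.1f}")
        for name, share in shares.items():
            print(f"  {name}: {share:6.1%} of lane ({share / even:.2f}x an even share)")
```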
All indices are accounted for, there is no sign of significant adapter contamination or short fragments, and all samples have the same base-composition and GC-content profiles. We pooled using high-sensitivity Qubit readings and Bioanalyzer fragment sizes (the core discouraged me from running qPCR on the individual libraries, a mistake I'm not going to repeat), and three people cross-checked the concentration calculations and oversaw the actual pooling. Despite all that, the problem appears to be poor equimolar pooling. The imbalance shows up between each negative control and its matched infected sample, and the samples causing the most trouble are the low-concentration ones.
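The mass-to-molarity conversion behind equimolar pooling is easy to script, which at least removes hand-calculation error from the process. This is a minimal sketch that assumes double-stranded DNA libraries and the commonly used average of 660 g/mol per base pair. The function names and the library values in the demo are hypothetical. If qPCR-derived molarities are available, they can be passed straight to `equimolar_volumes` and the Qubit conversion skipped.

```python
from typing import Dict

# Commonly used average mass of one base pair of dsDNA, in g/mol (an approximation).
AVG_MASS_PER_BP = 660.0


def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a mass concentration (ng/ul) and mean fragment length (bp) to nM."""
    return conc_ng_per_ul * 1e6 / (AVG_MASS_PER_BP * mean_fragment_bp)


def equimolar_volumes(molarity_nM: Dict[str, float], fmol_each: float) -> Dict[str, float]:
    """Volume (ul) of each library that contributes `fmol_each` femtomoles.

    1 nM equals 1 fmol/ul, so volume = femtomoles wanted / concentration in nM.
    """
    volumes = {}
    for name, conc in molarity_nM.items():
        if conc <= 0:
            raise ValueError(f"{name}: concentration must be positive")
        volumes[name] = fmol_each / conc
    return volumes


if __name__ == "__main__":
    # Hypothetical libraries: (Qubit HS reading in ng/ul, Bioanalyzer mean size in bp).
    libraries = {"control_1": (2.0, 350), "infected_1": (0.5, 320)}
    molarity = {name: ng_per_ul_to_nM(c, bp) for name, (c, bp) in libraries.items()}
    volumes = equimolar_volumes(molarity, fmol_each=10.0)
    for name in libraries:
        print(f"{name}: {molarity[name]:.2f} nM -> {volumes[name]:.2f} ul")
```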
These are very hard-won samples: taken from small animals, with RNA extracted from few cells, and gathered over the course of a full year. For many of the samples we have no library left to sequence a second time, and my budget is depleted. If the project is to continue, my only option is to make use of these specific sequences. I'm just not sure what to do. If nothing can be done, then after a year of work and close to 100K in resources, the project dies right here.
So I'm writing to ask two things:
1) Are there post-sequencing options I can pursue to make this data useful? For example, can I randomly remove reads from the over-sequenced samples so that their counts are similar to those of the under-sequenced samples?
2) We proceeded very carefully, and this uneven pooling still occurred. Is there something I'm missing? Is there anything else I should be doing before pooling for sequencing?
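Randomly removing reads, as asked in question 1, is usually called subsampling or downsampling. This is a minimal Python sketch of one way to do it, assuming uncompressed FASTQ with four lines per record. Each read is kept independently with probability `fraction`, so choosing `fraction = target / total` gives approximately, not exactly, the target count. For Lane 2, thinning sample 1 toward sample 3 means a fraction of about 0.27. The function names are hypothetical. Established tools such as seqtk (`seqtk sample`) do the same job on real files.

```python
import io
import random
from itertools import islice
from typing import Iterable, Iterator, TextIO, Tuple

# One FASTQ record: header, sequence, "+" separator line, quality string.
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain (uncompressed) 4-line-per-record FASTQ stream."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) != 4 or not lines[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield (lines[0], lines[1], lines[2], lines[3])


def subsample(records: Iterable[Record], fraction: float, seed: int = 11) -> Iterator[Record]:
    """Keep each record independently with probability `fraction`.

    Exactly one random draw is made per record, so running this with the same
    seed over the R1 and R2 files of a pair keeps the same read pairs
    (assuming both files list the reads in the same order).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    for rec in records:
        if rng.random() < fraction:
            yield rec


def write_fastq(records: Iterable[Record], handle: TextIO) -> int:
    """Write records in FASTQ format and return how many were written."""
    n = 0
    for rec in records:
        handle.write("\n".join(rec) + "\n")
        n += 1
    return n


if __name__ == "__main__":
    # Toy demo on 1,000 synthetic reads (not real data).
    fake = io.StringIO("".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(1000)))
    # Fraction that would bring Lane 2's sample 1 down toward sample 3's depth.
    fraction = 28_392_645 / 106_314_515
    out = io.StringIO()
    kept = write_fastq(subsample(read_fastq(fake), fraction, seed=7), out)
    print(f"fraction={fraction:.3f}, kept {kept} of 1000 reads")
```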
And a third thing: if anyone can recommend a core that will do the pooling for me, please PM me. I'm not inclined to use our facility again.