Dear fellow Illumina users,
Our wet-lab people occasionally cross-contaminate multiplexed RNA-seq samples in the same lane, most likely at the adapter ligation stage. For example, in one lane we have 8 samples: 4 are liver and 4 are lung. Some highly expressed liver reads are found in the lung samples. We can sometimes catch this during project-specific analysis of the samples, via PCA of the gene expression profiles (e.g. as outliers), but from our perspective this is too far downstream in the process. We would like to catch this sort of thing much sooner.
We typically sequence RNA-seq projects for three different model organisms (mouse, rat, human) and a variety of tissues (liver, lung, brain, etc.).
What would be a good strategy for detection? Ideally, I would like to flag the suspect samples at the earliest possible stage. Perhaps take the five most highly expressed genes per sample in each lane and check for reads from these genes in the other samples in the same lane, or even across the entire flowcell? This gets tricky, though, if we have a mix of species on the flowcell.
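To make the top-genes idea concrete, here is a rough sketch of what I have in mind. It assumes we already have a per-sample table of gene-level read counts for one lane; the function names, the data layout, and the toy counts are all made up for illustration. For every ordered pair of samples it reports what fraction of the second sample's reads fall on the top genes of the first, so an unexpectedly high value for a lung sample against a liver sample's top genes would be a hint to look closer.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: sample name -> gene name -> read count
Counts = Dict[str, Dict[str, int]]


def top_genes(gene_counts: Dict[str, int], n: int = 5) -> List[str]:
    """Return the n genes with the highest counts (ties broken by name)."""
    ranked = sorted(gene_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:n]]


def marker_fractions(counts: Counts, n: int = 5) -> Dict[Tuple[str, str], float]:
    """For each ordered pair (source, other), compute the fraction of reads
    in `other` that map to the top-n genes of `source`."""
    totals = {sample: sum(c.values()) for sample, c in counts.items()}
    result: Dict[Tuple[str, str], float] = {}
    for source, gene_counts in counts.items():
        markers = top_genes(gene_counts, n)
        for other, other_counts in counts.items():
            if other == source or totals[other] == 0:
                continue  # skip self-comparison and empty samples
            hits = sum(other_counts.get(g, 0) for g in markers)
            result[(source, other)] = hits / totals[other]
    return result


# Toy counts, invented purely to show the call
lane = {
    "liver1": {"Alb": 900, "Sftpc": 0, "Actb": 100},
    "lung1": {"Alb": 50, "Sftpc": 850, "Actb": 100},
}
fractions = marker_fractions(lane, n=1)
```

What counts as "too high" would have to be calibrated against clean lanes of the same tissue mix, since some genes are expressed at a low level in several tissues anyway.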
Perhaps build a three-species transcriptome and detect highly expressed contaminant reads with a fast tool like RNA-Skim or Sailfish?
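For the combined-transcriptome idea, the downstream check could be as simple as summing the estimated reads per species for each sample. The sketch below assumes the transcript IDs in the combined reference were given a species prefix such as `mouse|...` when it was built, and that the quantifier's per-transcript read estimates have already been loaded into a dict; both the prefix convention and the function name are my own assumptions, not anything the tools provide.

```python
from collections import defaultdict
from typing import Dict


def species_fractions(est_reads: Dict[str, float], sep: str = "|") -> Dict[str, float]:
    """Sum estimated reads per species prefix and return each species' share.

    Assumes transcript IDs look like "<species><sep><transcript>", a naming
    convention chosen here for illustration only.
    """
    per_species: Dict[str, float] = defaultdict(float)
    for transcript_id, n_reads in est_reads.items():
        species = transcript_id.split(sep, 1)[0]
        per_species[species] += n_reads
    total = sum(per_species.values())
    if total == 0:
        return {}
    return {sp: n / total for sp, n in per_species.items()}


# Toy estimates, invented purely to show the call
sample = {"mouse|T1": 90.0, "mouse|T2": 5.0, "rat|T1": 5.0}
shares = species_fractions(sample)
```

Mouse and rat share a lot of sequence, so some off-species signal is expected even in a clean sample; the baseline would need to come from samples known to be uncontaminated.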
Just trying to brainstorm a fast and simple way here...
Thanks for any input/ideas