I've got an interesting problem and wondered if anyone else had any thoughts about how I can approach this.
I've got some Illumina data from a run which should have contained human sequence but appears to have been contaminated with some other sequence of unknown origin. We're pretty sure the samples haven't been mixed up since some of the affected lanes were barcoded and the barcodes are present. The problem now is to try to identify the source of the contamination.
The sequences we produced are very diverse, with little or no duplication of reads, so this isn't just primers or plasmid DNA.
So far I've ruled out:
Human
Mouse
Rat
Any other vertebrate species the lab concerned works on
E. coli
...and now I'm stuck!
If you had 30 million+ reads of unknown origin (or origins), how would you try to find where they'd come from?
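One starting point I can think of, as a rough sketch only: pull a small random subsample of the reads and write them out as FASTA, so they can be searched against a broad database (for example BLAST against NCBI nt) instead of aligning all 30 million. The function names below are my own, and the code assumes plain four-line FASTQ records, which may not match every file.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (read_id, sequence)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (read_id, sequence) pairs, assuming strict 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "").strip()
        next(it, "")  # '+' separator line
        next(it, "")  # quality line, not needed for a database search
        yield header[1:].split()[0], seq


def reservoir_sample(records: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Return k records chosen uniformly at random in a single pass.

    Uses reservoir sampling, so the whole file never has to sit in memory.
    """
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample


def to_fasta(records: Iterable[Record]) -> str:
    """Format records as FASTA text."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)
```

To use it, something like `to_fasta(reservoir_sample(read_fastq(open("reads.fastq")), 1000))` would give 1000 reads to search, where `reads.fastq` is a placeholder file name. A few hundred to a few thousand reads should be enough to see which organisms dominate the hits, if the contaminant is in the database at all.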