#1
Member
Location: Menlo Park
Join Date: Dec 2010
Posts: 68
After preliminary de novo clustering/assembly of transcriptomic data from a non-model organism, I've found what appears to be a pretty good indication of vector contamination (bitscore: 6318; e-value: 0.0; 3423 out of 3424 identities; accession AY817672).
This is how the library was prepared: I extracted high-quality total RNA from the organism and shipped it to the sequencing facility, which generated the library and ran one lane of a flow cell (2x76 bp) that generated ~5.1 Gb of total data (~34,000,000 paired-end reads).
Now, the overall frequency appears to be low: only ~600,000 bases (roughly 0.01% of the total). And it actually winds up working almost as an assembly "quality control metric" that lets us assess consistency between different assemblers. But as far as I'm concerned, this vector shouldn't be in our library, and it seems like something our sequencing service provider should be able to account for.
Has anybody else found this sort of thing in their Solexa/Illumina libraries? As far as I'm aware, cloning vectors are not part of the Illumina protocol, so it's unlikely to simply be an artifact of library prep. Am I wrong?
Thanks for the insight.
Last edited by gconcepcion; 02-05-2011 at 11:14 AM.
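As a rough sanity check on the figures in the post above, the run yield and the contaminant fraction can be recomputed from the read count and read length. This is only a sketch: it assumes "~34,000,000 paired-end reads" means 34 million read pairs at 2x76 bp, and the function names are hypothetical, made up for illustration.

```python
def total_bases(read_pairs: int, read_length: int) -> int:
    """Total sequenced bases for a paired-end run (two reads per pair)."""
    return read_pairs * 2 * read_length


def contaminant_fraction(contaminant_bases: int, run_bases: int) -> float:
    """Fraction of all sequenced bases attributed to the contaminant."""
    if run_bases <= 0:
        raise ValueError("run_bases must be positive")
    return contaminant_bases / run_bases


if __name__ == "__main__":
    # Assumption: ~34 million read pairs at 2x76 bp, as read from the post.
    run = total_bases(34_000_000, 76)
    frac = contaminant_fraction(600_000, run)
    print(f"run size: {run / 1e9:.2f} Gb")
    print(f"contaminant share of all bases: {frac * 100:.4f}%")
```

The same two functions can be reused for any other run by swapping in its read count, read length, and the number of bases that hit the suspect sequence.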
#2
Senior Member
Location: San Diego
Join Date: May 2008
Posts: 912
I do bacteria, and I've found stuff like that too.
The simplest answer is that it was in your sample. That's what the sequencing facility will tell you. You'd have to make cDNA and Sanger-sequence it to be sure, but in general, you should believe your data. If your data tells you you've got vector, you should believe that until you have empirical data (like a failed PCR reaction) that conflicts.
#3
Member
Location: Menlo Park
Join Date: Dec 2010
Posts: 68
Quote:
But what do I know!? I could be wrong!
#4
Simon Andrews
Location: Babraham Inst, Cambridge, UK
Join Date: May 2009
Posts: 871
We've seen contamination coming from some odd places. In one case we had heavy contamination with bacterial DNA in what should have been a eukaryotic sample. It turned out the contamination was in a preparation of streptavidin beads used for a ChIP.
Since you now have the sequence for your vector, you could always run a PCR on your original material, which should tell you if it was present before you sent your sample off for sequencing.
#5
Member
Location: Menlo Park
Join Date: Dec 2010
Posts: 68
Quote:
Cheers!
#6
Senior Member
Location: Purdue University, West Lafayette, Indiana
Join Date: Aug 2008
Posts: 2,317
Where in the vector does your sequence match? I ask because bases 1-10270 of the GenBank record you provide are the sequence of an SIV provirus. Is your non-model species a mammal? It may just contain some viral RNA.
Also, how deep was your 454 run? At the frequency you mention above, a typical 454 run (400 million bases) would give you about 7000 bases of sequence, only enough to go about 2x on a contig of the size you found in your Illumina data. So unless you took your suspect Illumina contig and blasted it against your full 454 data set (pre-assembly), you might just be missing sequence that is there.
That said, I would have to say your suspicions are reasonable. Here is the problem though: how good do you expect the contamination control of any facility to be? If the only contamination present in your sequence is that of the contig you describe, that would put you at less than 20 parts per million. Given that all second-generation sequencers have PCR as part of their workflow, what does it take to prevent residual amplicon levels from getting that high? Will using plug-seal pipette tips and keeping pre- and post-PCR areas separate be sufficient? Or do we need clean-room-level measures?
-- Phillip
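
The back-of-the-envelope estimate in this post can be written out as a small calculation. This is a sketch under stated assumptions, not anyone's actual pipeline: the function names are hypothetical, the 17.5 ppm value is an assumption chosen so that a 400-million-base run yields the "about 7000 bases" quoted above, and the 3424 bp contig length is assumed from the alignment length reported in the first post.

```python
def expected_contaminant_bases(run_bases: float, fraction_ppm: float) -> float:
    """Bases expected from the contaminant, given its share in parts per million."""
    return run_bases * fraction_ppm / 1_000_000


def expected_coverage(contaminant_bases: float, contig_length: int) -> float:
    """Average depth if those bases were spread evenly over the contig."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return contaminant_bases / contig_length


if __name__ == "__main__":
    run_454 = 400_000_000   # "typical 454 run" size quoted in the post
    fraction_ppm = 17.5     # assumption: consistent with "less than 20 ppm"
    contig_length = 3424    # assumption: alignment length from the first post

    bases = expected_contaminant_bases(run_454, fraction_ppm)
    depth = expected_coverage(bases, contig_length)
    print(f"expected contaminant bases: {bases:.0f}")
    print(f"expected depth on the contig: {depth:.1f}x")
```

Both results scale linearly with the assumed fraction, so a different estimate of the contaminant share changes the expected depth in direct proportion.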