SEQanswers

Old 02-05-2011, 11:12 AM   #1
gconcepcion
Member
 
Location: Menlo Park

Join Date: Dec 2010
Posts: 67
Vector contamination?

After preliminary de novo clustering/assembly of transcriptomic data from a non-model organism, I've found what appears to be a pretty good indication of vector contamination (bitscore: 6318; e-value: 0.0; 3423 out of 3424 identities; accession AY817672).

This is how the library was prepared: I extracted high-quality total RNA from the organism and shipped it to the sequencing facility, which generated the library and ran one lane of a flow cell (2x76 bp). That lane produced ~5.1 Gb of total data (~34,000,000 paired-end reads).
Now the overall frequency appears to be low, only ~600,000 bases (roughly 0.01% of the data). And it actually winds up working almost as an assembly "quality control metric" that lets us assess consistency between different assemblers. But as far as I'm concerned, this vector shouldn't be in our library, and it seems like something our sequencing service provider should be able to account for.
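
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the run size and the vector fraction from the figures quoted above. The variable and function names are made up for illustration, and the sketch assumes that "~34,000,000 paired end reads" means 34 million read pairs, each contributing two 76 bp reads. Under that assumption the fraction comes out close to, though slightly above, the 0.01% quoted.

```python
# Figures quoted in the post; all names here are illustrative, not from any tool.
READ_PAIRS = 34_000_000      # assumed to be read pairs, not individual reads
READ_LENGTH_BP = 76          # 2x76 bp run
VECTOR_BASES = 600_000       # approximate bases attributed to the vector


def total_bases(read_pairs: int, read_length: int) -> int:
    """Total sequenced bases for a paired-end run (two reads per pair)."""
    return read_pairs * 2 * read_length


run_bases = total_bases(READ_PAIRS, READ_LENGTH_BP)
vector_fraction = VECTOR_BASES / run_bases

print(f"total bases:     {run_bases:,}")
print(f"vector fraction: {vector_fraction:.2e}")
print(f"as a percentage: {vector_fraction * 100:.4f}%")
print(f"as ppm:          {vector_fraction * 1e6:.0f}")
```

The same fraction expressed in parts per million is handy when comparing against contamination levels quoted for other runs.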

Has anybody else found this sort of thing in their Solexa/Illumina libraries? As far as I'm aware, cloning vectors are not a part of the Illumina protocol so it's unlikely to simply be an artifact from library prep. Am I wrong?

Thanks for the insight.

Last edited by gconcepcion; 02-05-2011 at 11:14 AM.
Old 02-07-2011, 01:51 PM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

I do bacteria, and I've found stuff like that too.

The simplest answer is that it was in your sample. That's what the sequencing facility will tell you. You'd have to make cDNA and Sanger sequence it to be sure, but in general, you should believe your data. Your data tell you you've got vector, and you should believe that until you have empirical data (like a failed PCR reaction) that conflict with it.
Old 02-07-2011, 04:11 PM   #3
gconcepcion
Member
 
Location: Menlo Park

Join Date: Dec 2010
Posts: 67

Quote:
Originally Posted by swbarnes2
I do bacteria, and I've found stuff like that too.

The simplest answer is that it was in your sample. That's what the sequencing facility will tell you. You'd have to make cDNA and sanger to be sure, but in general, you should believe your data. Your data tells you you've got vector, you should believe that until you have empirical data (like a failed PCR reaction) that conflicts.
Thanks for the response. I didn't mention in the first post that the total RNA I sent to the facility was used to prepare two EST libraries, one for 454 pyrosequencing and one for Illumina/Solexa sequencing. The 454 data was assembled and no evidence of any vector was found whatsoever. This 'evidence' (or lack thereof) leads me to believe that the vector was not in the original sample. That, coupled with the fact that we work with eukaryotic protists and have never had that vector in our lab, makes me doubt that the sample is the source.

But what do I know!? I could be wrong!
Old 02-08-2011, 12:31 AM   #4
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870

We've seen contamination coming from some odd places. In one case we had heavy contamination with bacterial DNA in what should have been a eukaryotic sample. It turned out the contamination was in a preparation of streptavidin beads used for a ChIP.

Since you now have the sequence for your vector, you could always run a PCR on your original material, which should tell you whether it was present before you sent your sample off for sequencing.
Old 02-08-2011, 05:19 AM   #5
gconcepcion
Member
 
Location: Menlo Park

Join Date: Dec 2010
Posts: 67

Quote:
Originally Posted by simonandrews
We've seen contamination coming from some odd places. In one case we had heavy contamination with bacterial DNA in what should have been a eukaryotic sample. It turned out the contamination was in a preparation of streptavidin beads used for a ChIP.
Interesting. It didn't occur to me that there might be contamination from supposedly "clean" reagents/disposables used during extraction. At any rate, primers have been ordered and I'll be checking for contamination in the actual sample.

Cheers!
Old 02-08-2011, 06:14 AM   #6
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,297

Where in the vector does your sequence match? I ask because bases 1-10270 of the GenBank record you provided are the sequence of an SIV provirus. Is your non-model species a mammal? It may just contain some viral RNA.

Also, how deep was your 454 run? At the frequency you mention above, a typical 454 run (400 million bases) would give you about 7000 bases of sequence -- only enough to go about 2x on a contig of the size you found in your Illumina data. So unless you took your suspect Illumina contig and BLASTed it against your full 454 data set (pre-assembly), you might just be missing sequence that is there.
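
The depth estimate above can be reproduced with a short sketch. Everything named here is hypothetical. The contig length is assumed to be the 3,424 bp alignment length from the first post, and the 17.5 ppm frequency is back-calculated from the "about 7000 bases" figure, not stated anywhere in the thread. A second frequency, taken directly from 600,000 bases out of roughly 5.17 Gb, is included to show how sensitive the expected coverage is to the input.

```python
def expected_bases(fraction: float, run_bases: int) -> float:
    """Bases expected from a contaminant present at `fraction` of a run."""
    return fraction * run_bases


def fold_coverage(bases: float, contig_length: int) -> float:
    """Average depth those bases would give over a contig of this length."""
    return bases / contig_length


RUN_454_BASES = 400_000_000   # "typical 454 run" figure from the post
CONTIG_LENGTH = 3_424         # assumption: alignment length from the first post

# Candidate contaminant frequencies (both are assumptions, see text above).
FRACTIONS = {
    "implied by ~7000 bases": 17.5e-6,
    "600,000 of ~5.17 Gb": 600_000 / 5_168_000_000,
}

results = {}
for label, frac in FRACTIONS.items():
    bases = expected_bases(frac, RUN_454_BASES)
    depth = fold_coverage(bases, CONTIG_LENGTH)
    results[label] = (bases, depth)
    print(f"{label}: {frac * 1e6:.1f} ppm -> {bases:,.0f} bases, {depth:.1f}x")
```

Either way, a few-fold to low-teens coverage may or may not survive assembly, which is why searching the raw 454 reads directly is the more reliable check.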

That said, I would have to say your suspicions are reasonable. Here is the problem though: how good do you expect the contamination control of any facility to be? If the only contamination present in your sequence is that of the contig you describe, that would put you at less than 20 parts per million. Given that all second-generation sequencers have PCR as part of their workflow, what does it take to prevent residual amplicon levels from getting that high? Will using plug-seal pipette tips and keeping pre- and post-PCR areas separate be sufficient? Or do we need clean-room-level measures?

--
Phillip
All times are GMT -8.