SEQanswers

Old 02-28-2015, 08:58 AM   #1
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
QC approaches for catching multiplexed libraries cross contamination (via adapters)?

Dear fellow Illumina users,

Our wet-lab people occasionally cross-contaminate multiplexed RNA-seq samples in the same lane, most likely at the adapter ligation stage. For example, in the same lane we have 8 samples: 4 are liver and 4 are lung. Some highly expressed liver reads are found in the lung samples. We can sometimes catch this during project-specific analysis of the samples via gene expression profiles using PCA (e.g. the outliers), but this is too far downstream in the process from our perspective. We would like to catch this sort of thing much sooner.

We typically sequence RNA-seq projects for three different model organisms (mouse, rat, human) and a variety of tissues (liver, lung, brain, etc.).

What would be a strategy for detection? Ideally, I would like to flag the suspect samples at the earliest stage possible. Perhaps taking the top five highest expressing genes per sample in each lane and checking for reads of these genes in the other samples in the same lane? or even across the entire flowcell? This gets tricky though if we have a mix of species on the flowcell.
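The "top five genes per sample" idea could be sketched roughly as below. This is only a naive illustration, not a tested QC method: the per-sample gene counts, the `tissue` mapping, the function names, and the `min_fraction` threshold are all assumptions, and genes that are genuinely expressed in both tissues would need to be excluded before trusting any flag.

```python
def top_genes(counts, n=5):
    """Return the n genes with the highest read counts in one sample."""
    return sorted(counts, key=counts.get, reverse=True)[:n]


def flag_suspects(lane_counts, tissue, n=5, min_fraction=1e-4):
    """Flag (donor, recipient, gene, fraction) tuples where a top gene of the
    donor sample appears in a sample of a different tissue at or above
    min_fraction of that sample's reads.

    lane_counts: {sample: {gene: read_count}} for one lane (hypothetical input)
    tissue:      {sample: tissue_name}
    """
    flags = []
    for donor, donor_counts in lane_counts.items():
        for gene in top_genes(donor_counts, n):
            for recipient, rec_counts in lane_counts.items():
                if tissue[recipient] == tissue[donor]:
                    continue  # same tissue: cannot tell expression from contamination
                total = sum(rec_counts.values())
                frac = rec_counts.get(gene, 0) / total if total else 0.0
                if frac >= min_fraction:
                    flags.append((donor, recipient, gene, frac))
    return flags


# Toy example with made-up counts
lane = {
    "liver1": {"Alb": 9000, "Apoa1": 900, "Actb": 100},
    "lung1": {"Sftpc": 9000, "Alb": 50, "Actb": 950},
}
tissues = {"liver1": "liver", "lung1": "lung"}
for donor, recipient, gene, frac in flag_suspects(lane, tissues, n=1):
    print(f"{gene} (top in {donor}) is {frac:.2%} of reads in {recipient}")
```

The counts could come from any fast quantifier; the threshold would have to be tuned against lanes known to be clean.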

Perhaps make a three species transcriptome and detect highly expressed contaminant reads with a fast tool like RNA-skim/Sailfish ?

Just trying to brainstorm here a fast and simple way...

Thanks for any input/ideas
Old 02-28-2015, 09:44 AM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

I have been doing a lot of work on cross-contamination-detection lately (paper will be submitted soon!) For detection, I've been using BBSplit and Seal (which is much faster).

For example:
seal.sh ref=mouse.fa,human.fa,rat.fa in=reads.fq refstats=refstats.txt ambig=toss

Refstats will then have 3 lines, each showing the percentage and number of reads that best matched each organism. Seal uses more memory than BBSplit, so if you use Seal it's best to run it on transcriptomes rather than whole genomes. The answers the two tools yield are very similar.

Last edited by Brian Bushnell; 02-28-2015 at 09:47 AM.
Old 03-01-2015, 04:11 AM   #3
thomasblomquist
Member
 
Location: Ohio

Join Date: Jul 2012
Posts: 68

http://m.nar.oxfordjournals.org/cont...kr771.full.pdf

I had always found this article a good starting point for understanding barcode cross contamination issues.

Personal pearls: run a cleanup on your specimen after the adapter ligation steps to remove the adapters from the reaction and avoid their false addition in downstream steps. Using two barcodes per specimen replicate helps tremendously to catch cross-contamination of barcodes (see article). Manufacture your indexing barcodes separated in space and time. Oligo manufacturers are not the cleanest in their prep methods (IDT has a nice service for this). Let only certain techs do the barcoding steps and the final cleanup and mixing of the final libraries at the end before sequencing.

Best regards,

Tom
Old 03-02-2015, 01:26 PM   #4
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181

Brian, thanks a lot for offering to try your tool! I will take a shot at it.

The example you give would help me to figure out species contamination and that would be very useful in my scenario.

What would you do about contamination of samples coming from the same species but different tissues (as in the case of highly expressed liver genes found in lung samples)?

Last edited by NGSfan; 03-02-2015 at 02:20 PM.
Old 03-02-2015, 01:51 PM   #5
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Right now we are unable to confidently quantify those, in the generic case. If you use dual indexes, use barcodes that are as different as possible, throw away chimeric pairs in which the reads map to different organisms, and avoid as much as possible multiplexing the same organism together, you will reduce the problem... but if the same organism is present in multiple copies, you need to quantify your cross-contamination rates to establish a noise floor. For example, with 20 ppm cross-contamination, if you have 1,000,000-fold total coverage of some gene across all of your libraries, then all of your measurements for that gene in any library will have additional error proportional to 20-fold coverage.
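The noise-floor arithmetic in that example can be written out as a small sketch. The function name is made up for illustration, and it assumes contamination is spread evenly, which real runs may not obey.

```python
def contamination_noise_floor(total_coverage, rate_ppm):
    """Expected contaminant coverage of a gene in any one library, given the
    gene's total coverage across all libraries and a cross-contamination
    rate in parts per million (assumes uniform contamination)."""
    return total_coverage * rate_ppm / 1_000_000


# 20 ppm cross-contamination with 1,000,000-fold total coverage of a gene
floor = contamination_noise_floor(1_000_000, 20)
print(f"noise floor: {floor:g}-fold coverage")
```

Any measurement of that gene near or below this value cannot be told apart from contamination.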

I have a program designed to detect and eliminate this kind of cross-contamination, but it's geared toward assembly, not quantification, so I don't think it would be appropriate.
Old 03-02-2015, 02:17 PM   #6
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181

Quote:
Originally Posted by thomasblomquist View Post
http://m.nar.oxfordjournals.org/cont...kr771.full.pdf

I had always found this article a good starting point for understanding barcode cross contamination issues.

Personal pearls: run a cleanup on your specimen after the adapter ligation steps to remove the adapters from the reaction and avoid their false addition in downstream steps. Using two barcodes per specimen replicate helps tremendously to catch cross-contamination of barcodes (see article). Manufacture your indexing barcodes separated in space and time. Oligo manufacturers are not the cleanest in their prep methods (IDT has a nice service for this). Let only certain techs do the barcoding steps and the final cleanup and mixing of the final libraries at the end before sequencing.

Best regards,

Tom
Thanks a lot, Tom, for pointing me to this article - that indeed explains a lot.

I think these are all good suggestions, and I will relay them to the wet-lab team. In practice though, I suspect it will be hard to convince the wet-lab colleagues to agree to implement the dual barcode approach. Anything that deviates from the standard Illumina protocol tends to get some pushback, perhaps because they have optimized the process with the standard kit in mind.

For the foreseeable future, I will have to come up with a way to detect this contamination (both species and tissue). I will take a shot at this triple-species RNA-skim/Sailfish approach and see if it pans out. I'll post here on my experience and/or results. I will also try Brian's tool since I can at least detect species contamination with it.

Best,
NGSfan
Old 03-02-2015, 02:34 PM   #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,076

If the wet-lab people want to stick with standard Illumina protocols, then Illumina has standard dual-barcode kits. Would those not work?
Old 03-03-2015, 12:28 AM   #8
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181

Quote:
Originally Posted by GenoMax View Post
If the wet lab people want to stick with standard illumina protocols then illumina has dual barcode standard kits. Would those not work?

We are running single end reads 50 bp + 7bp index. I had thought that dual indexes were only possible with paired end reads?

Maybe I'm wrong, I just had a look at this Illumina manual:

http://support.illumina.com/content/...15032071_b.pdf

It looks like it is possible with single end reads as well, no? Page 7 has an example of "Dual-Indexed Single-Read Sequencing".

Hmmm, OK... I will bring this up
Old 03-03-2015, 02:51 AM   #9
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324

Are you sure it is a wet-lab issue? You could try with only perfect matches to the barcode, if it is a synthesis or sequencing problem then you will see contamination only between certain barcodes and are more likely to get them in the 1-mismatch bin.
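A rough way to split reads into a perfect-match bin and a 1-mismatch bin, so the two can be compared per barcode pair, might look like this. The 7 bp barcode sequences and sample names are invented for the example, and it assumes the index read and all barcodes have the same length.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, barcodes, max_mismatches=1):
    """Return (sample, mismatches) for the unique best-matching barcode,
    or (None, None) if nothing matches or the best match is ambiguous."""
    hits = sorted(
        (hamming(observed, bc), sample)
        for sample, bc in barcodes.items()
        if hamming(observed, bc) <= max_mismatches
    )
    if not hits:
        return (None, None)
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return (None, None)  # tie between two barcodes: do not guess
    return (hits[0][1], hits[0][0])


# Hypothetical 7 bp indexes and index reads
barcodes = {"liver1": "ACGTACG", "lung1": "TGCATGC"}
index_reads = ["ACGTACG", "ACGTACC", "TGCATGC", "GGGGGGG"]

bins = Counter(assign_barcode(read, barcodes) for read in index_reads)
for (sample, mismatches), n in bins.items():
    print(sample, mismatches, n)
```

Running the contamination check separately on the 0-mismatch and 1-mismatch bins would show whether the foreign reads concentrate in the 1-mismatch bin.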
Old 03-03-2015, 03:00 AM   #10
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,076

Quote:
Originally Posted by NGSfan View Post
We are running single end reads 50 bp + 7bp index. I had thought that dual indexes were only possible with paired end reads?

It looks like it is possible with single end reads as well, no?

Hmmm, OK... I will bring this up
Dual indexes are certainly possible with SE reads. We run 50x8x8 runs.
Old 03-03-2015, 04:10 AM   #11
thomasblomquist
Member
 
Location: Ohio

Join Date: Jul 2012
Posts: 68

Just as a follow-up to the above posts: it is certainly easy enough to dual index with a single-end read (it just needs to be long enough to reach the other end and read the barcode). The downside of a single-end read with this approach is that the Phred score tends to be lower toward the end of the read, and thus the barcode portion may have poor quality scores.

Even with our dual indexing, we still see cross-contamination between barcodes at about 1:1000,000 reads. But it's a far cry from the 1:300 or 1:1000 for single index.

We also moved to 10 base indexes for both the forward and reverse barcode. We did a simple mutation algorithm to arrive at 24 unique forward and 24 unique reverses that have at least 3 mutations or single base insertion/deletion differences away from each other. This drastically reduced cross-over contamination and informatics confusion.
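The exact algorithm is not described above, so the following is only one possible way to get a set with those properties: greedy random sampling, keeping a candidate only if its Levenshtein (edit) distance to every barcode already chosen is at least 3. The function names, seed, and retry limit are assumptions, and a real design would also filter on GC content, homopolymers, and so on.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def pick_barcodes(n=24, length=10, min_dist=3, seed=0, max_tries=100_000):
    """Greedily collect n random barcodes that are pairwise at least
    min_dist edits apart. Raises if max_tries candidates are not enough."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(max_tries):
        if len(chosen) == n:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if all(edit_distance(cand, c) >= min_dist for c in chosen):
            chosen.append(cand)
    if len(chosen) < n:
        raise RuntimeError("could not find enough barcodes; relax the constraints")
    return chosen


forward = pick_barcodes(seed=1)
reverse = pick_barcodes(seed=2)
print(len(forward), len(reverse), len(forward) * len(reverse))
```

Two independent sets of 24 give the 24 x 24 combinations mentioned in the next paragraph.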

One large advantage to dual indexing is that we can (when cross-over contamination is less of an issue) perform massive multiplexing (24 x 24 barcodes = 576 specimens). We use this approach when screening for biomarkers amongst 30-50 loci using a single Hiseq lane. It's very cheap.

All the best,

-Tom

Last edited by thomasblomquist; 03-03-2015 at 04:16 AM.
Old 03-03-2015, 07:43 AM   #12
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181

GenoMax, thanks a lot for bringing up the ease of using dual barcodes. I will bring this up with the wet lab. I think in the end, this is the real solution. Getting them to adopt dual barcodes will take some political will.

I will still try detecting mix-ups with the multiple-species alignment approach, just because I know it will be a while before I see changes upstream.

Just to give you a picture, we are not talking about 1-4 contaminating reads, but more like 70-1000 reads in some cases.
Old 03-03-2015, 09:27 AM   #13
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Hi Tom,

Are you able to share the barcodes you are using? And also, would you mind clarifying whether "1:1000,000" is "1:1,000,000" or "1:100,000", and also specifically what instrument/mode you are running on?

Thanks!
-Brian
Old 03-04-2015, 04:09 AM   #14
thomasblomquist
Member
 
Location: Ohio

Join Date: Jul 2012
Posts: 68

Quote:
Originally Posted by Brian Bushnell View Post
Hi Tom,

Are you able to share the barcodes you are using? And also, would you mind clarifying whether "1:1000,000" is "1:1,000,000" or "1:100,000", and also specifically what instrument/mode you are running on?

Thanks!
-Brian
1 in a million cross-overs. The incidence of one index crossing over and mispairing with another index/barcode is ~1:1000 (if done optimally). The error rate is multiplicative for dual errors, i.e. 1:1,000,000 (1,000 x 1,000).
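Spelled out as arithmetic, and assuming the two index cross-over events are independent (the function name and the 200 million read lane size are just for illustration):

```python
def dual_index_odds(per_index_odds):
    """Odds (1 in N) that both indexes of a read cross over, assuming the
    two events are independent and equally likely."""
    return per_index_odds * per_index_odds


single = 1_000                  # ~1:1000 per index, from the post
dual = dual_index_odds(single)  # 1,000 x 1,000
print(f"single index: 1 in {single:,}")
print(f"dual index:   1 in {dual:,}")

# Expected mis-assigned reads in a hypothetical lane of 200 million reads
lane_reads = 200_000_000
print(lane_reads // single, "reads vs", lane_reads // dual, "reads")
```

If the two events are correlated (for example, both caused by the same pipetting slip), the real rate would be worse than this product.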

Message me with your email address and I will send you the 10 nucleotide barcode sequences for the 24 forward and 24 reverse.

We use Miseq, Hiseq and even Ion Torrent. Remarkably, we get pretty similar results for our targeted sequencing (amplicon based) approach.

Read more about it here ( http://journals.plos.org/plosone/art...l.pone.0079120 ). In this publication, we had dual-index 4-nucleotide barcodes. We have since moved to a dual-index 10-nucleotide barcode design. The design has also been modified a bit so that the P5/P7 (Illumina) or Ion Torrent tails are added as the last step, which keeps it modular with respect to the platform we sequence on.


-Tom
All times are GMT -8.