**Brian Bushnell** · 02-28-2015, 10:44 AM

I have been doing a lot of work on cross-contamination detection lately (paper will be submitted soon!). For detection, I've been using BBSplit and Seal (which is much faster).

For example:
seal.sh ref=mouse.fa,human.fa,rat.fa in=reads.fq refstats=refstats.txt ambig=toss

refstats.txt will then have 3 lines, one per reference, each showing the percentage and number of reads that best matched that organism. Seal uses more memory than BBSplit, so if you use Seal it's best to run it on transcriptomes rather than whole genomes. The answers the two tools yield are very similar.

**thomasblomquist** · 03-01-2015, 05:11 AM

http://m.nar.oxfordjournals.org/content/early/2011/10/21/nar.gkr771.full.pdf

I had always found this article a good starting point for understanding barcode cross contamination issues.

Personal pearls: run a cleanup on your specimen after the adapter ligation steps to remove leftover adapters from the reaction and prevent false addition in downstream steps. Using two barcodes per specimen replicate helps tremendously to catch cross-contamination of barcodes (see article). Manufacture your indexing barcodes separated in space and time. Manufacturers of oligos are not the cleanest in their prep methods (IDT has a nice service for this). Let only certain techs do the barcoding steps, the final cleanup, and the mixing of final libraries before sequencing.

Best regards,

Tom

**NGSfan** · 03-02-2015, 02:26 PM

Brian, thanks a lot for offering to try your tool! I will take a shot at it.

The example you give would help me to figure out species contamination and that would be very useful in my scenario.

What would you do about contamination between samples from the same species but different tissues (as in the case of highly expressed liver genes found in lung samples)?

**Brian Bushnell** · 03-02-2015, 02:51 PM

Right now we are unable to confidently quantify those in the general case. If you use dual indexes, choose barcodes that are as different as possible, throw away chimeric pairs in which the reads map to different organisms, and avoid multiplexing the same organism together as much as possible, you will reduce the problem. But if the same organism is present in multiple copies, you need to quantify your cross-contamination rates to establish a noise floor. For example, with 20 ppm cross-contamination, if you have 1,000,000-fold total coverage of some gene across all of your libraries, then every measurement of that gene in any library will carry additional error on the order of 20-fold coverage.
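The noise-floor arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation; the function name `noise_floor` and its parameters are made up here and are not part of any tool mentioned in this thread.

```python
def noise_floor(total_coverage, contamination_ppm):
    """Coverage that cross-contamination is expected to add to any one library.

    total_coverage: summed fold-coverage of a gene across all libraries in the run
    contamination_ppm: measured cross-contamination rate, in parts per million
    """
    return total_coverage * contamination_ppm / 1_000_000


# The example from the post: 20 ppm and 1,000,000-fold total coverage.
floor = noise_floor(total_coverage=1_000_000, contamination_ppm=20)
print(f"expected contaminating coverage: {floor:g}-fold")
```

Any per-library measurement for that gene that is not well above this value cannot be separated from carry-over.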

I have a program designed to detect and eliminate this kind of cross-contamination, but it's geared toward assembly, not quantification, so I don't think it would be appropriate.

**NGSfan** · 03-02-2015, 03:17 PM

Thanks a lot, Tom, for pointing me to this article. That indeed explains a lot.

I think these are all good suggestions, and I will relay them to the wetlab team. In practice though, I suspect it will be hard to convince the wet lab colleagues to agree to implement the dual barcode approach. Anything that deviates from the standard Illumina protocol tends to get some push back, perhaps because they have optimized the process with the standard kit in mind.

For the foreseeable future, I will have to come up with a way to detect this contamination (both species and tissue). I will take a shot at this triple-species RNAskim/Sailfish approach and see if it pans out. I'll post here on my experience and/or results. I will also try Brian's tool, since I can at least detect species contamination with it.

Best,
NGSfan

**GenoMax** · 03-02-2015, 03:34 PM

If the wet lab people want to stick with standard Illumina protocols, then Illumina has standard dual-barcode kits. Would those not work?

**NGSfan** · 03-03-2015, 01:28 AM

We are running single-end reads, 50 bp + 7 bp index. I had thought that dual indexes were only possible with paired-end reads?

Maybe I'm wrong, I just had a look at this Illumina manual:

http://support.illumina.com/content/...15032071_b.pdf

It looks like it is possible with single end reads as well, no? Page 7 has an example of "Dual-Indexed Single-Read Sequencing".

Hmmm, OK... I will bring this up

**Chipper** · 03-03-2015, 03:51 AM

Are you sure it is a wet-lab issue? You could try demultiplexing with only perfect matches to the barcode. If it is a synthesis or sequencing problem, then you will see contamination only between certain barcodes, and the contaminating reads are more likely to fall in the 1-mismatch bin.
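One way to try this check is to split index reads into a perfect-match bin and a 1-mismatch bin before looking at contamination in each. The Python below is a minimal sketch under stated assumptions: the sample names and 7 bp barcodes are invented for illustration, and `classify_index` and `tally_bins` are hypothetical helpers, not functions from any demultiplexer.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def classify_index(index_read, barcodes):
    """Return (sample, bin); bin is 'perfect', 'one_mismatch' or 'unassigned'."""
    for sample, barcode in barcodes.items():
        if index_read == barcode:
            return sample, "perfect"
    hits = [
        sample
        for sample, barcode in barcodes.items()
        if len(barcode) == len(index_read) and hamming(index_read, barcode) == 1
    ]
    if len(hits) == 1:  # ambiguous 1-mismatch reads are left unassigned
        return hits[0], "one_mismatch"
    return None, "unassigned"


def tally_bins(index_reads, barcodes):
    """Count index reads per (sample, bin)."""
    return Counter(classify_index(read, barcodes) for read in index_reads)


# Invented barcodes and index reads, for illustration only.
barcodes = {"lung": "ACGTACG", "liver": "TGCATGC"}
index_reads = ["ACGTACG", "ACGTACC", "TGCATGC", "GGGGGGG"]
counts = tally_bins(index_reads, barcodes)
for (sample, bin_name), n in sorted(counts.items(), key=str):
    print(sample, bin_name, n)
```

Running the downstream contamination check separately on the reads from each bin would show whether the contaminating reads are concentrated in the 1-mismatch bin.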

**GenoMax** · 03-03-2015, 04:00 AM

Originally posted by NGSfan:

We are running single end reads 50 bp + 7bp index. I had thought that dual indexes were only possible with paired end reads?

It looks like it is possible with single end reads as well, no?

Hmmm, OK... I will bring this up

Dual indexes are certainly possible with SE reads. We run 50x8x8 runs.

**thomasblomquist** · 03-03-2015, 05:10 AM

Just as a follow-up to the above posts: it is certainly easy enough to dual index with a single-end read (the read just needs to be long enough to reach the other end and capture the barcode). The downside of a single-end read with this approach is that the Phred score tends to be lower toward the end of the read, so the barcode portion may have poor quality scores.

Even with our dual indexing, we still see cross-contamination between barcodes at about 1:1000,000 reads. But it's a far cry from the 1:300 or 1:1000 for single index.

We also moved to 10-base indexes for both the forward and reverse barcodes. We used a simple mutation algorithm to arrive at 24 unique forward and 24 unique reverse barcodes that are each at least 3 substitutions or single-base insertions/deletions away from every other barcode. This drastically reduced cross-over contamination and informatics confusion.
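For readers who want to build a similar set, here is a rough Python sketch of one way to pick barcodes that are at least 3 edits apart. It is not the algorithm Tom's group used (the post does not describe it in detail); it swaps in a plain greedy search over random candidates with Levenshtein distance, and `pick_barcodes`, its parameters, and the seed are all assumptions made for illustration.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def pick_barcodes(n, length=10, min_dist=3, seed=0, max_tries=100_000):
    """Greedily collect n random barcodes that are pairwise >= min_dist apart."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(max_tries):
        if len(chosen) == n:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if all(edit_distance(candidate, bc) >= min_dist for bc in chosen):
            chosen.append(candidate)
    if len(chosen) < n:
        raise RuntimeError("could not find enough barcodes; relax the constraints")
    return chosen


forward = pick_barcodes(24, seed=1)
reverse = pick_barcodes(24, seed=2)
print(len(forward), len(reverse))
```

A production design would also screen candidates for homopolymer runs, GC content, and similarity to the adapter sequences, none of which this sketch attempts.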

One large advantage of dual indexing is that we can (when cross-over contamination is less of an issue) perform massive multiplexing (24 x 24 barcodes = 576 specimens). We use this approach when screening for biomarkers among 30-50 loci using a single HiSeq lane. It's very cheap.

All the best,

-Tom

**NGSfan** · 03-03-2015, 08:43 AM

GenoMax, thanks a lot for bringing up the ease of using dual barcodes. I will bring this up with the wet lab. I think in the end, this is the real solution. Getting them to adopt dual barcodes will take some political will.

I will still try detecting mix ups with the multiple-species alignment approach, just because I know it will be a while before I see changes upstream.

Just to give you a picture, we are not talking about 1-4 contaminating reads, but more like 70-1000 reads in some cases.

**Brian Bushnell** · 03-03-2015, 10:27 AM

Hi Tom,

Are you able to share the barcodes you are using? And also, would you mind clarifying whether "1:1000,000" is "1:1,000,000" or "1:100,000", and also specifically what instrument/mode you are running on?

Thanks!
-Brian

**thomasblomquist** · 03-04-2015, 05:09 AM

1 in a million cross-overs. The incidence of one of the indexes crossing over and mispairing with another index/barcode is ~1:1000 (if done optimally). The error rates multiply when both indexes have to be wrong, i.e. 1:1,000,000 (1,000 x 1,000).
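As a small worked example of that multiplication, the Python below compares the expected number of misassigned reads under single and dual indexing. It assumes the two index errors are independent, as the post implies; the read count of 10 million and the helper name `expected_misassigned` are invented for illustration.

```python
from fractions import Fraction


def expected_misassigned(reads, per_index_rate, n_indexes):
    """Expected misassigned reads when all n_indexes must cross over together.

    Assumes each index crosses over independently at per_index_rate.
    """
    return reads * per_index_rate ** n_indexes


per_index_rate = Fraction(1, 1000)   # ~1:1000 per index, from the post
reads = 10_000_000                   # hypothetical number of reads in a run

single = expected_misassigned(reads, per_index_rate, n_indexes=1)
dual = expected_misassigned(reads, per_index_rate, n_indexes=2)
print(f"single index: ~{single} reads; dual index: ~{dual} reads")
```

`Fraction` is used only to keep the arithmetic exact; ordinary floats would serve equally well for an estimate.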

Message me with your email address and I will send you the 10 nucleotide barcode sequences for the 24 forward and 24 reverse.

We use MiSeq, HiSeq and even Ion Torrent. Remarkably, we get pretty similar results with our targeted (amplicon-based) sequencing approach.

Read more about it here ( http://journals.plos.org/plosone/art...l.pone.0079120 ). In this publication, we used dual-index 4-nucleotide barcodes. We have since moved to a dual-index 10-nucleotide barcode design. The design has also been modified a bit so that the P5/P7 (Illumina) or Ion Torrent tails are added as the last step, which keeps the choice of sequencing platform modular.

-Tom

QC approaches for catching multiplexed libraries cross contamination (via adapters)?