  • QC approaches for catching cross-contamination of multiplexed libraries (via adapters)?

    Dear fellow Illumina users,

    Our wetlab people occasionally cross-contaminate multiplexed RNA-seq samples in the same lane, most likely at the adapter ligation stage. For example, say a lane has 8 samples: 4 liver and 4 lung. Some highly expressed liver reads are then found in the lung samples. We can sometimes catch this during project-specific analysis of the samples via gene expression profiles (e.g., outliers in a PCA), but from our perspective that is too far downstream in the process. We would like to catch this sort of thing much sooner.

    We typically sequence RNA-seq projects for three different model organisms (mouse, rat, human) and a variety of tissues (liver, lung, brain, etc.).

    What would be a strategy for detection? Ideally, I would like to flag suspect samples at the earliest possible stage. Perhaps take the top five most highly expressed genes per sample in each lane and check for reads from those genes in the other samples in the same lane, or even across the entire flowcell? This gets tricky, though, if we have a mix of species on the flowcell.
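A minimal sketch of the top-N idea above, in Python. It assumes a per-lane table of read counts per gene per sample already exists and that each sample's tissue is known; the function names, the fold and floor thresholds, and the use of same-tissue samples as a baseline are illustrative assumptions rather than an established method. Comparing against same-tissue peers keeps housekeeping genes that are high everywhere from being flagged.

```python
import statistics

def to_cpm(lane):
    """Convert raw read counts to counts per million (CPM) within each sample."""
    cpm = {}
    for sample, counts in lane.items():
        total = sum(counts.values())
        cpm[sample] = {g: 1e6 * c / total for g, c in counts.items()} if total else {}
    return cpm

def top_genes(sample_cpm, n):
    """The n genes with the highest CPM in one sample."""
    return sorted(sample_cpm, key=sample_cpm.get, reverse=True)[:n]

def flag_suspects(lane, tissue, n_top=5, fold=5.0, floor_cpm=10.0):
    """Flag (source, target, gene, cpm) when a top gene of `source` shows up in a
    different-tissue `target` well above its level in the target's same-tissue
    peers. All thresholds are hypothetical and would need tuning."""
    cpm = to_cpm(lane)
    flags = []
    for src in lane:
        for gene in top_genes(cpm[src], n_top):
            for tgt in lane:
                if tissue[tgt] == tissue[src]:
                    continue  # only look for bleed-through into other tissues
                # Baseline: the same gene in the other samples of the target's tissue.
                peers = [cpm[p].get(gene, 0.0) for p in lane
                         if p != tgt and tissue[p] == tissue[tgt]]
                baseline = statistics.median(peers) if peers else 0.0
                value = cpm[tgt].get(gene, 0.0)
                if value > floor_cpm and value > fold * baseline:
                    flags.append((src, tgt, gene, round(value, 1)))
    return flags
```

The same function can be run per lane or on a whole flowcell, as long as samples from different species are grouped separately, since gene IDs will not line up across species.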

    Perhaps build a combined three-species transcriptome and detect highly expressed contaminant reads with a fast tool like RNA-Skim or Sailfish?
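One way to set up the combined-transcriptome idea is to tag every FASTA header with its species before concatenating, so per-transcript estimates can later be summed back to species. This Python sketch rests on assumptions: the file paths and the `species|` prefix convention are hypothetical, and it does not parse any particular tool's output, so you would first extract (transcript name, estimated reads) pairs from the Sailfish or RNA-Skim results. The tag goes at the very start of the header because many tools keep only the first whitespace-delimited token as the transcript name.

```python
from collections import defaultdict

def prefix_headers(species, fasta_lines):
    """Yield FASTA lines, rewriting each '>name ...' header as '>species|name ...'."""
    for line in fasta_lines:
        if not line.endswith("\n"):
            line += "\n"  # guard against a file that lacks a final newline
        if line.startswith(">"):
            yield f">{species}|{line[1:]}"
        else:
            yield line

def build_combined_fasta(species_to_path, out_path):
    """Concatenate per-species transcriptome FASTA files into one tagged file."""
    with open(out_path, "w") as out:
        for species, path in species_to_path.items():
            with open(path) as fh:
                out.writelines(prefix_headers(species, fh))

def reads_per_species(transcript_reads):
    """Sum estimated reads per species from (tagged_transcript_name, reads) pairs."""
    totals = defaultdict(float)
    for name, reads in transcript_reads:
        totals[name.split("|", 1)[0]] += reads
    return dict(totals)
```

For example, `build_combined_fasta({"mouse": "mouse_tx.fa", "rat": "rat_tx.fa", "human": "human_tx.fa"}, "combined_tx.fa")` (hypothetical file names). Reads from conserved genes may be split between orthologs, so small cross-species fractions partly reflect homology rather than contamination.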

    Just trying to brainstorm a fast and simple approach here...

    Thanks for any input/ideas

  • #2
    I have been doing a lot of work on cross-contamination detection lately (paper will be submitted soon!). For detection, I've been using BBSplit and Seal (which is much faster).

    For example:
    seal.sh ref=mouse.fa,human.fa,rat.fa in=reads.fq refstats=refstats.txt ambig=toss

    Refstats will then have 3 lines, each showing the percentage and number of reads that best matched each organism. Seal uses more memory than BBSplit, so if you use Seal it's best to run it on transcriptomes rather than whole genomes. The two tools yield very similar answers.
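Once each sample has per-organism read counts (for example, taken from its Seal refstats output when run with `ambig=toss`), flagging can be automated. This sketch does not parse refstats, since its exact column layout is not given here; it assumes the counts are already loaded into a dict. The function names and the 1,000 ppm cutoff are hypothetical.

```python
def off_target_ppm(species_reads, expected):
    """Reads assigned to any organism other than `expected`, in parts per million."""
    total = sum(species_reads.values())
    if total == 0:
        return 0.0
    off_target = total - species_reads.get(expected, 0)
    return 1e6 * off_target / total

def flag_species_contamination(per_sample, expected_species, max_ppm=1000.0):
    """Return {sample: off-target ppm} for samples above the (hypothetical) cutoff."""
    flagged = {}
    for sample, reads in per_sample.items():
        ppm = off_target_ppm(reads, expected_species[sample])
        if ppm > max_ppm:
            flagged[sample] = round(ppm, 1)
    return flagged
```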
    Last edited by Brian Bushnell; 02-28-2015, 10:47 AM.



    • #3


      http://m.nar.oxfordjournals.org/cont...kr771.full.pdf

      I have always found this article a good starting point for understanding barcode cross-contamination issues.

      Personal pearls: Run a cleanup on your specimen after the adapter ligation steps to remove leftover adapters from the reaction and prevent their false addition in downstream steps. Using two barcodes per specimen replicate helps tremendously to catch barcode cross-contamination (see article). Manufacture your indexing barcodes separated in space and time. Oligo manufacturers are not the cleanest in their prep methods (IDT has a nice service for this). Let only certain techs do the barcoding steps and the final cleanup and mixing of the final libraries before sequencing.

      Best regards,

      Tom



      • #4
        Brian, thanks a lot for suggesting your tool! I will take a shot at it.

        The example you give would help me detect species contamination, which would be very useful in my scenario.

        What would you do about contamination between samples from the same species but different tissues (as in the case of highly expressed liver genes found in lung samples)?
        Last edited by NGSfan; 03-02-2015, 03:20 PM.



        • #5
          Right now we are unable to confidently quantify that in the general case. If you use dual indexes, choose barcodes that are as different as possible, throw away chimeric pairs in which the reads map to different organisms, and avoid multiplexing the same organism together as much as possible, you will reduce the problem. But if the same organism is present in multiple libraries, you need to quantify your cross-contamination rate to establish a noise floor. For example, with 20 ppm cross-contamination, if some gene has 1,000,000-fold total coverage across all of your libraries, then every measurement of that gene in any library will carry additional error on the order of 20-fold coverage.
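The noise-floor arithmetic above, written out as a small Python helper: the expected coverage that bleeds into any one library is roughly the cross-contamination rate times the gene's total coverage across all pooled libraries. The `margin` factor used to decide whether an observation clearly exceeds that floor is a hypothetical choice, not something from this thread.

```python
def contamination_floor(total_coverage, rate_ppm):
    """Approximate coverage of a gene that bleeds into one library from the pool.

    total_coverage: the gene's coverage summed over all pooled libraries.
    rate_ppm: measured cross-contamination rate, in parts per million.
    """
    return total_coverage * rate_ppm / 1e6

def exceeds_floor(observed, total_coverage, rate_ppm, margin=3.0):
    """True if a library's coverage is clearly above the expected bleed-through."""
    return observed > margin * contamination_floor(total_coverage, rate_ppm)
```

With the numbers from the post, `contamination_floor(1_000_000, 20)` gives 20.0, so a lung library showing 15x of a liver-specific gene could not be distinguished from contamination, while 500x would stand out.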

          I have a program designed to detect and eliminate this kind of cross-contamination, but it's geared toward assembly, not quantification, so I don't think it would be appropriate.



          • #6
            Originally posted by thomasblomquist
            http://m.nar.oxfordjournals.org/cont...kr771.full.pdf

            I have always found this article a good starting point for understanding barcode cross-contamination issues.

            Personal pearls: Run a cleanup on your specimen after the adapter ligation steps to remove leftover adapters from the reaction and prevent their false addition in downstream steps. Using two barcodes per specimen replicate helps tremendously to catch barcode cross-contamination (see article). Manufacture your indexing barcodes separated in space and time. Oligo manufacturers are not the cleanest in their prep methods (IDT has a nice service for this). Let only certain techs do the barcoding steps and the final cleanup and mixing of the final libraries before sequencing.

            Best regards,

            Tom

            Thanks a lot, Tom, for pointing me to this article; it indeed explains a lot.

            I think these are all good suggestions, and I will relay them to the wetlab team. In practice though, I suspect it will be hard to convince the wet lab colleagues to agree to implement the dual barcode approach. Anything that deviates from the standard Illumina protocol tends to get some push back, perhaps because they have optimized the process with the standard kit in mind.

            For the foreseeable future, I will have to come up with a way to detect this contamination (both species and tissue). I will take a shot at this three-species RNA-Skim/Sailfish approach and see if it pans out. I'll post my experience and/or results here. I will also try Brian's tool, since it can at least detect species contamination.

            Best,
            NGSfan



            • #7
              If the wet lab people want to stick with standard Illumina protocols, then Illumina has standard dual-barcode kits. Would those not work?



              • #8
                Originally posted by GenoMax
                If the wet lab people want to stick with standard Illumina protocols, then Illumina has standard dual-barcode kits. Would those not work?

                We are running single-end 50 bp reads + a 7 bp index. I had thought that dual indexes were only possible with paired-end reads?

                Maybe I'm wrong, I just had a look at this Illumina manual:

                http://support.illumina.com/content/...15032071_b.pdf

                It looks like it is possible with single end reads as well, no? Page 7 has an example of "Dual-Indexed Single-Read Sequencing".

                Hmmm, OK... I will bring this up



                • #9
                  Are you sure it is a wet-lab issue? You could try demultiplexing with only perfect matches to the barcode. If it is a synthesis or sequencing problem, you will see contamination only between certain barcodes, and those contaminating reads are more likely to land in the 1-mismatch bin.
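A sketch of the perfect-match versus 1-mismatch split suggested above, assuming each read's index sequence is available as a string and the expected barcodes are known. The helper names are hypothetical, and an index within one mismatch of two barcodes is left unassigned, which is one conservative choice among several. Running a contamination check separately on each bin would show whether the contaminating reads concentrate in the 1-mismatch bin.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def assign_index(observed, barcodes, max_mismatch=1):
    """Return (barcode, mismatches) for the unique closest barcode, else None."""
    hits = [(bc, hamming(observed, bc)) for bc in barcodes
            if len(bc) == len(observed)]
    hits = [(bc, d) for bc, d in hits if d <= max_mismatch]
    if not hits:
        return None
    best = min(d for _, d in hits)
    best_hits = [h for h in hits if h[1] == best]
    return best_hits[0] if len(best_hits) == 1 else None  # tie: ambiguous

def bin_by_match_quality(observed_indexes, barcodes):
    """Count reads per (barcode, 'perfect' | 'N-mismatch'); the rest are undetermined."""
    counts = Counter()
    for idx in observed_indexes:
        hit = assign_index(idx, barcodes)
        if hit is None:
            counts[(None, "undetermined")] += 1
        else:
            bc, d = hit
            counts[(bc, "perfect" if d == 0 else f"{d}-mismatch")] += 1
    return counts
```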



                  • #10
                    Originally posted by NGSfan
                    We are running single-end 50 bp reads + a 7 bp index. I had thought that dual indexes were only possible with paired-end reads?

                    It looks like it is possible with single end reads as well, no?

                    Hmmm, OK... I will bring this up

                    Dual indexes are certainly possible with SE reads. We run 50x8x8 runs (a 50-cycle read plus two 8-cycle index reads).



                    • #11
                      Just as a follow-up to the above posts: it is certainly easy enough to dual index with single-end reads (the read just needs to be long enough to reach the other end and read the barcode). The downside of single-end reads with this approach is that Phred scores tend to be lower toward the end of the read, so the barcode portion may be of poor quality.
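When the barcode sits at the far end of a single-end read, a simple safeguard is to check its base qualities before demultiplexing. This sketch assumes Phred+33 quality encoding and, hypothetically, that the barcode occupies the last `bc_len` bases of the read; in a real amplicon design its position depends on insert length and adapter layout, and the Q20 cutoff is only an example.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in qual]

def trailing_barcode(seq, qual, bc_len=10, min_q=20):
    """Return the last bc_len bases if all have Phred >= min_q; otherwise None."""
    if len(seq) != len(qual) or len(seq) < bc_len:
        return None
    if min(phred_scores(qual[-bc_len:])) < min_q:
        return None  # low-confidence barcode: leave the read unassigned
    return seq[-bc_len:]
```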

                      Even with our dual indexing, we still see cross-contamination between barcodes at about 1:1000,000 reads. But it's a far cry from the 1:300 or 1:1000 for single index.

                      We also moved to 10-base indexes for both the forward and reverse barcodes. We used a simple mutation algorithm to arrive at 24 unique forward and 24 unique reverse barcodes, each at least 3 substitutions or single-base insertions/deletions away from the others. This drastically reduced cross-over contamination and informatics confusion.
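The mutation algorithm itself is not described in detail, so the Python below swaps in a different, simpler technique: greedy random sampling that keeps a candidate 10-mer only if its edit (Levenshtein) distance to every barcode already chosen is at least 3, which covers both substitutions and single-base insertions/deletions. The homopolymer filter, the seed, and the function names are illustrative assumptions.

```python
import random

def levenshtein(a, b):
    """Edit distance counting substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def has_homopolymer(seq, run=3):
    """True if seq contains `run` or more identical bases in a row."""
    return any(seq[i:i + run] == seq[i] * run for i in range(len(seq) - run + 1))

def design_barcodes(n=24, length=10, min_dist=3, seed=1, max_tries=100_000):
    """Greedily collect n random barcodes that are pairwise >= min_dist edits apart."""
    rng = random.Random(seed)
    chosen = []
    tries = 0
    while len(chosen) < n and tries < max_tries:
        tries += 1
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if has_homopolymer(cand):
            continue  # assumption: skip runs such as 'GGG'
        if all(levenshtein(cand, bc) >= min_dist for bc in chosen):
            chosen.append(cand)
    if len(chosen) < n:
        raise RuntimeError("not enough barcodes found; relax the constraints")
    return chosen
```

A forward and a reverse set could be drawn with different seeds, e.g. `design_barcodes(seed=1)` and `design_barcodes(seed=2)`; the two sets are read separately, so they do not need to be distant from each other.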

                      One large advantage of dual indexing is that (when cross-over contamination is less of an issue) we can perform massive multiplexing: 24 x 24 barcodes = 576 specimens. We use this approach when screening for biomarkers among 30-50 loci on a single HiSeq lane. It's very cheap.
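The 24 x 24 scheme can be sketched as a lookup from (forward, reverse) barcode pairs to samples. One consequence worth spelling out: when all 576 combinations are assigned, a single-index cross-over turns one valid pair into another valid pair and cannot be detected, which fits the caveat that full multiplexing is reserved for when cross-over is less of an issue. Leaving some combinations unassigned lets swapped pairs show up as unexpected. The sample names and helpers below are hypothetical.

```python
from collections import Counter
from itertools import product

def plate_map(forward, reverse):
    """Assign a sample name to every (forward, reverse) barcode combination."""
    return {(f, r): f"S{i + 1:03d}"
            for i, (f, r) in enumerate(product(forward, reverse))}

def demux_pairs(read_pairs, sample_sheet):
    """Count reads per sample; pairs absent from the sheet are tallied as unexpected."""
    per_sample, unexpected = Counter(), Counter()
    for pair in read_pairs:
        if pair in sample_sheet:
            per_sample[sample_sheet[pair]] += 1
        else:
            unexpected[pair] += 1  # possible index cross-over
    return per_sample, unexpected
```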

                      All the best,

                      -Tom
                      Last edited by thomasblomquist; 03-03-2015, 05:16 AM.



                      • #12
                        GenoMax, thanks a lot for bringing up the ease of using dual barcodes. I will bring this up with the wetlab. I think in the end, this is the real solution. Getting them to adopt dual barcodes will take some political will.

                        I will still try detecting mix ups with the multiple-species alignment approach, just because I know it will be a while before I see changes upstream.

                        Just to give you a picture: we are not talking about 1-4 contaminating reads, but more like 70-1000 reads in some cases.



                        • #13
                          Hi Tom,

                          Are you able to share the barcodes you are using? And also, would you mind clarifying whether "1:1000,000" is "1:1,000,000" or "1:100,000", and also specifically what instrument/mode you are running on?

                          Thanks!
                          -Brian



                          • #14
                            Originally posted by Brian Bushnell
                            Hi Tom,

                            Are you able to share the barcodes you are using? And also, would you mind clarifying whether "1:1000,000" is "1:1,000,000" or "1:100,000", and also specifically what instrument/mode you are running on?

                            Thanks!
                            -Brian

                            1 in a million cross-overs. The incidence of one index crossing over and mispairing with another index/barcode is ~1:1000 (if done optimally). Because both indexes must be wrong at once, the error rates multiply, i.e. 1:1,000,000 (1,000 x 1,000).
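The multiplicative argument written out. It assumes the two index errors occur independently, and that a mispair on one index alone produces a combination not assigned to any sample (otherwise a single swap is enough to misassign a read):

```latex
P(\text{misassigned}) \approx P(\text{forward index mispair}) \times P(\text{reverse index mispair})
  = 10^{-3} \times 10^{-3} = 10^{-6}
```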

                            Message me with your email address and I will send you the 10-nucleotide barcode sequences for the 24 forward and 24 reverse barcodes.

                            We use MiSeq, HiSeq, and even Ion Torrent. Remarkably, we get pretty similar results for our targeted (amplicon-based) sequencing approach.

                            Read more about it here (http://journals.plos.org/plosone/art...l.pone.0079120). In that publication, we used dual-index 4-nucleotide barcodes; we have since moved to a dual-index 10-nucleotide barcode design. The design has also been modified so that the P5/P7 (Illumina) or Ion Torrent tails are added as the last step, which keeps it modular with respect to the sequencing platform.


                            -Tom
