SEQanswers

Old 06-25-2015, 08:20 AM   #1
arash82
Member
 
Location: Sweden

Join Date: Oct 2014
Posts: 11
Adapters and multiplexing

I am trying to trim adapters and there is one thing that doesn't quite make sense to me. I've simplified the description here to focus on the issue. Thanks in advance for any help on the matter.


I've had two samples sequenced on an Illumina HiSeq 2000, 50 PE, in the same lane.

Sample 1 has indexes (barcodes)
Code:
TAAGGCGA and TAGATCGC
Sample 2 has indexes
Code:
GCTACGCT and TAGATCGC
Note that both samples have the same index 2.


When I try to trim for the adapters in mate 1 of sample 1, I sometimes find indexes from sample 2. See example below.

Code:
Sample 1, mate 1

@HISEQ2:697:H2NFYBCXX:1:1101:11477:57300 1:N:0:TAAGGCGATAGATCGC
TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA
+
D@DDBCEHHIIIIHIIIIIIIIIHIDHE?HHHIEEGHHHEHCHFEFG@CHH

As far as I am concerned, this should not happen.

I've tried to read up on the theory of how this works as well as I can, but cannot find a good explanation for this phenomenon. Is there a rational explanation for this? What am I missing?

Thanks,
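
A minimal sketch of how this check could be scripted, assuming the adapter tail that follows the index in read 1 is exactly the ATCTCGTATGCCGTCTTCTGCTTG seen in the read above, and that the header ends with the two 8-base indexes concatenated. The function names are illustrative only.

```python
# Adapter tail that follows the index in read 1, copied from the read in this thread.
ADAPTER_TAIL = "ATCTCGTATGCCGTCTTCTGCTTG"
INDEX_LEN = 8  # assumed index length, matching the 8-base barcodes above


def embedded_index(read, tail=ADAPTER_TAIL, index_len=INDEX_LEN):
    """Return the bases immediately before the adapter tail, or None."""
    pos = read.find(tail)
    if pos < index_len:
        return None  # tail absent, or too little sequence before it
    return read[pos - index_len:pos]


def header_index1(header, index_len=INDEX_LEN):
    """Return index 1 from a header ending in ':<index1><index2>'."""
    return header.rsplit(":", 1)[1][:index_len]


header = "@HISEQ2:697:H2NFYBCXX:1:1101:11477:57300 1:N:0:TAAGGCGATAGATCGC"
read = "TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA"

found = embedded_index(read)
expected = header_index1(header)
# For this read the embedded index differs from the one in the header.
print(found, expected, found == expected)
```

This only does exact matching, so reads with sequencing errors in the adapter tail are skipped.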
Old 06-25-2015, 09:59 AM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

That's quite interesting! You're right, it should not happen.

But there are a few possibilities. One is chimeric molecules that have an adapter sequence internally in addition to the adapters on the ends. Perhaps that's more common with Nextera...

Another is that the index and read cycles occur at different times, and not necessarily on the same physical cluster (due to cluster regeneration). IIRC the HiSeq reads both barcodes from the same cluster but I can't remember if it's from the initial cluster or regenerated cluster. Anyway, it's possible that there are two clusters very close together, and one gets assigned the other's index... or something like that. We've been trying to determine exactly what causes cross-contamination (multiplexed samples getting assigned the wrong barcode) for a year without anything absolutely conclusive, but this is a neat piece of evidence.

I'm going to guess chimerism, in this case. How often does this happen, relative to "correct" adapters? Also, with normal adapters, the position of the adapter in read 1 is the same as the position of the adapter in read 2. Are you seeing that in this case? Would you mind posting this read's mate?
Old 06-25-2015, 10:02 AM   #3
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

You need to check with the sequencing centre, in case there was some mixup when demultiplexing the reads.

How many of the sample1 reads which read far enough into the adapter sequences have the GCTACGCT sequence, and what barcode do you see in the sample 2 reads which have read into the adapter sequences?
Old 06-26-2015, 02:36 AM   #4
arash82
Member
 
Location: Sweden

Join Date: Oct 2014
Posts: 11

Thanks for the reply.

I haven't done the wet-lab part, but I am a bit doubtful regarding chimeric molecules. When the adapters are attached, the molecules are in different tubes, so I don't see how index 1 would end up in tube 2, which is essential for having two indexes in the same molecule.

I don't know if the fact that both samples have the same index 2 has anything to do with it.

I was wondering the same thing about the second mates. I've extracted them for a few of the reads and included them at the bottom of the post. They don't seem to have any adapter sequences, so the second option sounds more reasonable to me.

Also, just doing a line count I get the following results:

Code:
GCTACGCTATCTCGTATGCCGTCTTCTGCTTG    161
TAAGGCGAATCTCGTATGCCGTCTTCTGCTTG    725
That is more than 20% of the count for the expected index (161 vs. 725), which is rather concerning. It makes me wonder how many of the mates actually correspond to each other.
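
A rough sketch of how such a tally and the resulting fraction could be computed, assuming exact matches to the adapter tail seen in these reads. The helper names are made up for illustration, and the counts are the ones quoted above.

```python
from collections import Counter

ADAPTER_TAIL = "ATCTCGTATGCCGTCTTCTGCTTG"  # tail seen in the reads in this thread


def tally_indexes(reads, tail=ADAPTER_TAIL, index_len=8):
    """Count the index-length k-mers found directly before the adapter tail."""
    counts = Counter()
    for read in reads:
        pos = read.find(tail)
        if pos >= index_len:
            counts[read[pos - index_len:pos]] += 1
    return counts


def foreign_fraction(counts, expected_index):
    """Fraction of tallied reads whose embedded index is not the expected one."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts[expected_index]) / total


# Counts reported in the post for sample 1 (expected index TAAGGCGA).
reported = Counter({"GCTACGCT": 161, "TAAGGCGA": 725})
print(round(foreign_fraction(reported, "TAAGGCGA"), 3))  # share of all tallied reads
print(round(161 / 725, 3))                               # ratio to the expected index
```

Depending on the denominator, the unexpected index accounts for about 18% of all adapter-containing reads, or about 22% relative to the expected index.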


Here are the first four hits

Pair 1
Code:
@HISEQ2:697:H2NFYBCXX:1:1101:11477:57300 1:N:0:TAAGGCGATAGATCGC
TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA
+
D@DDBCEHHIIIIHIIIIIIIIIHIDHE?HHHIEEGHHHEHCHFEFG@CHH

@HISEQ2:697:H2NFYBCXX:1:1101:11477:57300 2:N:0:TAAGGCGATAGATCGC
AGTGAGAGCAGAGATTACAGGACATTGCGAGCAGATTGCGTAGGGACTCTC
+
B@0B@GEHGHEE?@G@EEGH@<CGCC?1@/</?FC?1110<D<011<11C@
Pair 2
Code:
@HISEQ2:697:H2NFYBCXX:1:1101:9774:77082 1:N:0:TAAGGCGATAGATCGC
TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA
+
DDDDDIIIIIIIIIIIIIIIIIIIIII<<FHIIIIIIIIIIIHIIIIIIII

@HISEQ2:697:H2NFYBCXX:1:1101:9774:77082 2:N:0:TAAGGCGATAGATCGC
AAAGGAAAAGAGCAACTGCTGTGTTGTCCCCACACACACCTGCTCACCTCT
+
###################################################
Pair 3
Code:
@HISEQ2:697:H2NFYBCXX:1:1104:9306:37068 1:N:0:TAAGGCGATAGATCGC
TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA
+
DDDBDI?EFCHIIIIIIIIHIIIHCEHCCFFHHGHHHIHGHECC<CHHHIH

@HISEQ2:697:H2NFYBCXX:1:1104:9306:37068 2:N:0:TAAGGCGATAGATCGC
AGAATGCACTATGCTTAAGCTCTGACGATTCTTCCGTGCAGCAAGGAGGTC
+
0<00<<1<@1D1<D<<11<1D<@1<10<01<<D110D101111<<1<1<01
Pair 4
Code:
@HISEQ2:697:H2NFYBCXX:1:1105:6252:85097 1:N:0:TAAGGCGATAGATCGC
TCTCCGAGCCCACCGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTG
+
BBDD@HHH<EHHHIHIIHHIHEHIGHE=0DCFHIHIEHCECHEE<CHCE?G

@HISEQ2:697:H2NFYBCXX:1:1105:6252:85097 2:N:0:TAAGGCGATAGATCGC
GACTTAAACTACTGAAGGAAAACCTATACCAGCTGCCCAATCTCTGTTACA
+
00000111<<111111111<1110<11<<11<1111<111<<1111<<11<
Old 06-26-2015, 02:53 AM   #5
arash82
Member
 
Location: Sweden

Join Date: Oct 2014
Posts: 11

Quote:
Originally Posted by mastal View Post
You need to check with the sequencing centre, in case there was some mixup when demultiplexing the reads.

How many of the sample1 reads which read far enough into the adapter sequences have the GCTACGCT sequence, and what barcode do you see in the sample 2 reads which have read into the adapter sequences?
Obviously, I have more than two samples and I feel it is going to be a nightmare investigating all the barcodes in all the samples =S

Just comparing the first mate of these two samples and two barcodes, I get the following results:

Sample 1, TAAGGCGA TAGATCGC
Code:
GCTACGCTATCTCGTATGCCGTCTTCTGCTTG    161
TAAGGCGAATCTCGTATGCCGTCTTCTGCTTG    725

Sample 2, GCTACGCT TAGATCGC
Code:
GCTACGCTATCTCGTATGCCGTCTTCTGCTTG     10
TAAGGCGAATCTCGTATGCCGTCTTCTGCTTG    513
Old 06-26-2015, 03:30 AM   #6
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

How many samples were run in the same lane, and did all the samples have the same barcode for read2?

How many reads do you have for samples 1 and 2, what percentage of the total reads do the counts you showed represent?

Reagents can always become contaminated, but it seems a bit strange that read1 and read2 would show a different sequence than the runs where they read the barcodes.
Old 06-26-2015, 06:38 AM   #7
arash82
Member
 
Location: Sweden

Join Date: Oct 2014
Posts: 11

Quote:
Originally Posted by mastal View Post
How many samples were run in the same lane, and did all the samples have the same barcode for read2?
I have a total of 17 samples, with 12 different index 1 and 8 different index 2 sequences. The samples were all pooled and run across 4 lanes.

Quote:
Originally Posted by mastal View Post
How many reads do you have for samples 1 and 2, what percentage of the total reads do the counts you showed represent?
The numbers I gave are just line counts. To get a real representation, I need to use cutadapt or something similar that allows for mismatches, truncations, etc. However, I don't believe that this is the issue at the moment...
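
To illustrate what "allowing for mismatches and truncations" means, here is a simplified sketch. It is not cutadapt's algorithm: it handles substitutions only, and the function names and the 10% error rate are assumptions chosen for the example.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_adapter(read, adapter, max_error_rate=0.1, min_overlap=5):
    """Return the leftmost read position where the adapter, or a prefix of it
    cut off by the end of the read, matches within the allowed mismatch rate.
    Returns -1 if there is no such position. Substitutions only; insertions
    and deletions are not modelled."""
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        allowed = int(max_error_rate * overlap)
        if hamming(read[start:start + overlap], adapter[:overlap]) <= allowed:
            return start
    return -1


adapter = "ATCTCGTATGCCGTCTTCTGCTTG"  # tail seen in the reads in this thread
full = "TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA"
truncated = "G" * 16 + adapter[:10]  # synthetic read ending inside the adapter

print(find_adapter(full, adapter))       # adapter fully contained in the read
print(find_adapter(truncated, adapter))  # adapter cut off by the read end
```

Line counts with exact string matching miss both cases that involve errors, so they give a lower bound on how often the adapter appears.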


Quote:
Originally Posted by mastal View Post
Reagents can always become contaminated, but it seems a bit strange that read1 and read2 would show a different sequence than the runs where they read the barcodes.
Currently running through all of the samples. So far it appears the contamination is consistent between samples that share one of the barcodes...
Old 06-26-2015, 07:10 AM   #8
arash82
Member
 
Location: Sweden

Join Date: Oct 2014
Posts: 11

Processing all the data, it seems the contamination only exists among samples that share one of the two barcodes.

For instance, samples 1 and 2 both have the same index 2, but different index 1. Both of these index 1 sequences are found in sample 1 and 13. This is true for every sample I have tested. Likewise, when the same index 2 is shared between three samples, you will find the three different index 1 sequences in all three samples.

I am starting to fear this could be quite an issue. Do people usually multiplex their samples in a similar way? What is your experience?
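
The pattern described above can be written down as a small sketch that, given a sample sheet, lists which foreign index 1 sequences would be expected in each sample if cross-talk only occurs between samples sharing index 2. The function name and the third sample are hypothetical; only samples 1 and 2 come from this thread.

```python
from collections import defaultdict


def predicted_cross_talk(sample_sheet):
    """Map each sample to the index 1 sequences of the other samples
    that share its index 2."""
    by_index2 = defaultdict(list)
    for sample, (index1, index2) in sample_sheet.items():
        by_index2[index2].append((sample, index1))

    predicted = {}
    for members in by_index2.values():
        for sample, _ in members:
            predicted[sample] = sorted(
                other_i1 for other, other_i1 in members if other != sample
            )
    return predicted


sample_sheet = {
    "sample_1": ("TAAGGCGA", "TAGATCGC"),
    "sample_2": ("GCTACGCT", "TAGATCGC"),
    "sample_x": ("CGTACTAG", "CTCTCTAT"),  # hypothetical sample with a unique index 2
}

for sample, foreign in predicted_cross_talk(sample_sheet).items():
    print(sample, foreign)
```

Comparing this prediction with the indexes actually found in the adapter read-through of each sample would show whether the observation holds across all 17 samples.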
Old 06-26-2015, 07:20 AM   #9
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

There is nothing wrong with the way the samples have been multiplexed. With 12 barcodes for the first index and 8 barcodes for the second index you could multiplex up to 96 different samples.
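
The 96 comes from pairing every index 1 with every index 2. A short sketch, using placeholder labels rather than real barcode sequences:

```python
from itertools import product

# Placeholder labels (not real barcode names): 12 index 1 and 8 index 2 choices.
index1_set = [f"i7_{n:02d}" for n in range(1, 13)]
index2_set = [f"i5_{n:02d}" for n in range(1, 9)]

# Every (index 1, index 2) pair identifies one sample.
combinations = list(product(index1_set, index2_set))
print(len(combinations))

# In a full design, each index 2 is shared by this many samples.
sharing = sum(1 for _, i5 in combinations if i5 == index2_set[0])
print(sharing)
```

In such a combinatorial design, samples necessarily share one of their two indexes with other samples, which is the situation discussed in this thread.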
Old 06-26-2015, 10:27 AM   #10
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

The adapter always appears in the same position in read 1, and does not appear at all in read 2. This is not simple misassignment or an error in demultiplexing. It looks like you had free unligated adapters from sample 1 floating around that attached themselves to fully-ligated reads from sample 2. But, I don't think Nextera works that way, so I don't really know what's going on. Maybe Illumina would have an idea, if you contacted them.
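
One way such a check could be scripted is sketched below, using two of the pairs posted earlier in the thread. The read 1 probe is the adapter tail seen in those reads. The read 2 probe is an assumption (the Nextera transposase end sequence) and should be replaced with whatever adapter the library prep actually used. Exact matching only.

```python
R1_PROBE = "ATCTCGTATGCCGTCTTCTGCTTG"  # adapter tail seen in read 1 in this thread
R2_PROBE = "CTGTCTCTTATACACATCT"       # assumed read 2 probe (Nextera transposase end)


def adapter_positions(pairs, r1_probe=R1_PROBE, r2_probe=R2_PROBE):
    """Return a list of (read 1 position, read 2 position) per pair; -1 means not found."""
    return [(r1.find(r1_probe), r2.find(r2_probe)) for r1, r2 in pairs]


pairs = [
    # Pair 1 from the thread
    ("TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA",
     "AGTGAGAGCAGAGATTACAGGACATTGCGAGCAGATTGCGTAGGGACTCTC"),
    # Pair 4 from the thread
    ("TCTCCGAGCCCACCGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTG",
     "GACTTAAACTACTGAAGGAAAACCTATACCAGCTGCCCAATCTCTGTTACA"),
]

for r1_pos, r2_pos in adapter_positions(pairs):
    print(r1_pos, r2_pos)
```

For these two pairs the probe is found in read 1 and not in read 2. The read 1 position in pair 4 is shifted by one base relative to pair 1 because that read carries an extra C ahead of the index.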
Old 06-26-2015, 11:20 AM   #11
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015

@arash82: Probably time to go back to the experimental people with your observations. No logical informatics based explanation seems to apply here. These libraries may need to be re-made from scratch, provided starting material is of good quality.
Old 06-26-2015, 02:06 PM   #12
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 516

Brian, I was thinking the same thing. The Nextera mate pair protocol actually uses tagmentation followed by end repair and ligation of TruSeq adapters.

When were the samples pooled in the protocol?
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
Old 07-27-2015, 07:52 AM   #13
arash82
Member
 
Location: Sweden

Join Date: Oct 2014
Posts: 11

@SNPsaurus: All the libraries are made individually. Then the samples are pooled and sequenced.

@GenoMax: The core facility cannot figure it out either. It is very unlikely we would do this from scratch. There is no RNA left, and getting new samples would take too much time and resources. We do have indexed libraries though, so pooling in a different way is possible.

I have a feeling Illumina is very defensive in their communication. Their last input on the matter is "material handling error during some point of library construction". I was thinking this could happen if you use the same pipette tips when adding the adapters/indexes.

I doubt it though, as the core facility is doing this routinely. Also, that would create contamination in one direction only (i.e. tip into sample 1 and then sample 2 would create contamination only in sample 2), but I see contamination consistently in all, and only, samples that share an index.

Sh*t happens and sometimes you cannot explain it. What is puzzling though is the fact that others have seen similar things. Something is fishy and I need to find out what.