SEQanswers

Old 03-27-2012, 10:03 AM   #1
uattsoancrik
Junior Member
 
Location: Belgium

Join Date: Mar 2012
Posts: 3
[help] what's wrong with multiplexed sequencing

Hi, there.
We have run into puzzling trouble while analyzing a multiplexed sequencing run on the Illumina HiSeq platform. We pooled 7 multiplexed libraries, and when we ran CASAVA demultiplexing we found "contamination" in the results.
I mean that reads expected to belong to, e.g., library #1 are spread everywhere, with nonsensical relationships.
It's like a big mosaic of untidy correspondences. We tried changing the demultiplexing parameters, without any improvement.
What do you think? Where is the key? Is it in the bioinformatics, or worse, is it possible that some biological contamination of the reagents occurred? That the index-seq primers were accidentally mixed?
That seems a remote possibility, but if not that, what else?

Please, help...

Last edited by uattsoancrik; 03-28-2012 at 01:50 AM.
Old 03-27-2012, 10:42 AM   #2
amitm
Member
 
Location: Manchester, UK

Join Date: Feb 2011
Posts: 52
Default

How did you know that reads belonging to library #1 are spread out everywhere? Did you map the reads, and does #1 have reads that are divergent enough to be detected in the other six libraries?

Which demultiplexing parameters did you change? Apart from allowing one mismatch in the index sequence (default 0), I don't remember any other parameter that would alter the demultiplexing output.

The key is the index sequences you documented during library prep. It seems that there has been cross-contamination. If you are sure that's not the case, can you paste a couple of sequence headers from each of the libraries obtained after demultiplexing? That would help in finding any other cause, if present.
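For anyone wanting to look at the headers in bulk rather than pasting a few, here is a minimal sketch that tallies the index sequence found in each FASTQ header. It assumes the CASAVA 1.8-style header layout, where the part after the space ends in the index (e.g. `1:N:0:ATCACG`). The function names and the example records are made up for illustration. Whether the header carries the observed index read or the sample-sheet index may depend on the pipeline and on which output file you look at, so check that before drawing conclusions.

```python
from collections import Counter


def index_from_header(header):
    """Return the index sequence from a CASAVA 1.8-style FASTQ header line."""
    # Assumed layout: "@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index"
    comment = header.strip().split(" ", 1)[1]
    return comment.split(":")[3]


def tally_indexes(fastq_lines):
    """Count index sequences over the header lines (every 4th line) of a FASTQ stream."""
    counts = Counter()
    for line_number, line in enumerate(fastq_lines):
        if line_number % 4 == 0:  # header line of each 4-line record
            counts[index_from_header(line)] += 1
    return counts


# Hypothetical records, invented for illustration only.
example = [
    "@HWI-ST000:1:FC0000:1:1101:1000:2000 1:N:0:ATCACG", "ACGT", "+", "IIII",
    "@HWI-ST000:1:FC0000:1:1101:1001:2001 1:N:0:ATCACG", "ACGT", "+", "IIII",
    "@HWI-ST000:1:FC0000:1:1101:1002:2002 1:N:0:CGATGT", "ACGT", "+", "IIII",
]
print(tally_indexes(example).most_common())
```

Comparing such a tally against the indexes recorded during library prep is one quick way to spot an index that was never supposed to be on the lane.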
Old 03-27-2012, 11:07 AM   #3
twaddlac
Member
 
Location: Pittsburgh, PA

Join Date: Feb 2011
Posts: 49
Default

We experienced this in my lab. We traced it back to the library prep - we mapped the barcodes to their physical location on the 96-well plate to see if there was any cross contamination and there sure was. I would try something like that before indulging too much in the forensics of bioinformatics. It will at least give you an idea of how to assess your data once you find out the source of contamination.

I hope that helps!
Old 03-27-2012, 11:43 AM   #4
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Quote:
Originally Posted by amitm View Post
Which demultiplexing parameters did you change? Apart from allowing one mismatch in the index sequence (default 0), I don't remember any other parameter that would alter the demultiplexing output.
You can use indexes shorter than 6 bases. The --use-bases-mask parameter is used for this, together with changing the indexes in your sample sheet.

Not that I think it will make any difference in this case. But I have used the above trick when we have had really poor results for some of the index positions.
Old 03-27-2012, 11:57 AM   #5
amitm
Member
 
Location: Manchester, UK

Join Date: Feb 2011
Posts: 52
Default

Hi westerman,
you are right. The --use-bases-mask parameter allows fewer than six bases to be used as the index, and some reads can be salvaged if index base quality has been the problem.
Thanks for the correction.
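Before shortening the index, it is worth checking that the truncated indexes still tell the samples apart. The sketch below is only an illustration of that check: the function name and the index set are hypothetical, and it assumes the bases are dropped from the end of the index read.

```python
def truncated_indexes_unique(indexes, length):
    """Return True if the first `length` bases still distinguish every index."""
    prefixes = [index[:length] for index in indexes]
    return len(set(prefixes)) == len(prefixes)


# Hypothetical index set, for illustration only (not taken from this thread).
indexes = ["ATCACG", "CGATGT", "TTAGGC", "TGACCA"]

for length in range(6, 0, -1):
    print(length, truncated_indexes_unique(indexes, length))
```

If two samples end up with the same shortened index, their reads can no longer be separated, so the sample sheet and the mask have to be shortened only as far as the set stays unique.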
Old 03-27-2012, 01:10 PM   #6
epistatic
Senior Member
 
Location: Dronning Maud Land

Join Date: Mar 2009
Posts: 129
Default

I also just had an issue with the index read on two lanes; it looks like some tiles were lost to the bottom middle swath problem. It didn't happen in all cycles, but I had many more unassigned reads in these two lanes than I should have.

With CASAVA 1.8.2, the number of mismatches allowed for the index read is 0 by default. Does anyone think that setting --mismatches 1 would allow less conservative index sorting?
Old 03-27-2012, 01:15 PM   #7
mnkyboy
Member
 
Location: Seattle, WA

Join Date: Mar 2009
Posts: 87
Default

I have played with changing --mismatches to 1 and have not seen a huge gain. Also, two of the indexes are not compatible if you do this. If you use GACGAC and CACGAT with --mismatches 1 it will fail: each is one mismatch away from GACGAT, so a read of GACGAT could belong to either.

I learned this the hard way.
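The incompatibility can be checked ahead of time from the Hamming distance between indexes. Two equal-length indexes can both match the same read with up to m mismatches each exactly when they differ at no more than 2m positions. The sketch below applies that rule; the function names are made up for illustration and this is not how CASAVA itself performs the check.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("indexes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def colliding_pairs(indexes, mismatches):
    """Return index pairs that some read could match within `mismatches` of both."""
    # A shared read exists exactly when the pair differs at <= 2 * mismatches positions.
    return [
        (a, b)
        for a, b in combinations(indexes, 2)
        if hamming(a, b) <= 2 * mismatches
    ]


pair = ["GACGAC", "CACGAT"]
print(hamming(*pair))            # the two indexes differ at 2 positions
print(colliding_pairs(pair, 1))  # so they collide when 1 mismatch is allowed
print(colliding_pairs(pair, 0))  # and are safe when 0 mismatches are allowed
```

Running the same check over the full set of indexes on a lane shows up front whether a given mismatch setting is usable.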
Old 03-27-2012, 01:29 PM   #8
epistatic
Senior Member
 
Location: Dronning Maud Land

Join Date: Mar 2009
Posts: 129
Default

Great point, I need to see what indexes they have and likely won't waste my time. The BMS issue is annoying enough without it hitting the index read.

I don't have the indexes you mention in the 24-index set, and I think the Illumina indexes all tolerate at least one mismatch.

Last edited by epistatic; 03-27-2012 at 02:12 PM.
Old 03-28-2012, 01:59 AM   #9
uattsoancrik
Junior Member
 
Location: Belgium

Join Date: Mar 2012
Posts: 3
Default

In addition, I was thinking that our Index Sequencing Primers were from two different sources: 6 out of 7 were from package A and the other one from package B. Do you think this could interfere with the results? In what way? And why did Illumina release two different versions? Does anybody know what the difference is?
Old 04-04-2012, 10:10 PM   #10
phoss
Member
 
Location: Beltsville, MD

Join Date: Aug 2011
Posts: 12
Default

Hi uattsoancrik,
We had a similar problem to your original post; saw such results with some multiplexed samples on the GAIIx.
Just curious: the contaminated samples you mentioned originally... are the indexes for these respective samples similar?
Best,

Last edited by phoss; 04-04-2012 at 10:13 PM.
Old 04-05-2012, 07:20 AM   #11
uattsoancrik
Junior Member
 
Location: Belgium

Join Date: Mar 2012
Posts: 3
Default

Hi phoss,
as I said before, our indexes came from Illumina Index Sequencing Primers package A (6 out of 7) and B (1 out of 7). So they are designed to be used together, as you would expect from a commercial kit.

I'm curious, too: how did you handle your problem, and did you work out what was wrong in your case? Please let me know. It could be a useful discussion. Thx.
Old 04-05-2012, 08:28 AM   #12
phoss
Member
 
Location: Beltsville, MD

Join Date: Aug 2011
Posts: 12
Default

Hi uattsoancrik,
What I meant to ask in my prior post was whether you see specific DNA patterns in the indexes which map to the 'contaminated' samples.

In response to your question, we had the following scenario: indexes with specific DNA composition patterns yielded a high percentage of low-quality reads, while indexes with a different DNA composition yielded a high percentage of high-quality reads. What is interesting, though, is that if these indexes were used on other lanes, those samples also ended up having low-quality reads.
We debugged the above scenario and arrived at a number of possible causes, such as expired reagents/kit, too much or too little sample, the sequencing of the index itself, or even oil on the slide (which I don't think the HiSeq has; only the GAII, if I'm not mistaken).

Hopefully this was helpful nonetheless.
Best,

Last edited by phoss; 04-05-2012 at 08:30 AM.
Old 04-05-2012, 01:33 PM   #13
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

Quote:
Originally Posted by phoss View Post
Hi uattsoancrik,
What I meant to ask in my prior post was whether you see specific DNA patterns in the indexes which map to the 'contaminated' samples.

In response to your question, we had the following scenario: indexes with specific DNA composition patterns yielded a high percentage of low-quality reads, while indexes with a different DNA composition yielded a high percentage of high-quality reads. What is interesting, though, is that if these indexes were used on other lanes, those samples also ended up having low-quality reads.
We debugged the above scenario and arrived at a number of possible causes, such as expired reagents/kit, too much or too little sample, the sequencing of the index itself, or even oil on the slide (which I don't think the HiSeq has; only the GAII, if I'm not mistaken).

Hopefully this was helpful nonetheless.
Best,
Phoss,

Could you please expand on what you mean by "Indexes with specific DNA composition patterns"? We have been tearing our hair out recently because suddenly the quality of our index reads is horrible, leading to massive loss of sequence data because we can't determine the index sequence. This is happening on both our HiSeq2k and GAIIx. We have considered, and tentatively ruled out, cluster density and degree of barcode diversity as the source of the problem. Any findings you could share would be greatly appreciated.
Old 04-05-2012, 01:41 PM   #14
seb567
Senior Member
 
Location: Québec, Canada

Join Date: Jul 2008
Posts: 260
Default

Quote:
Originally Posted by uattsoancrik View Post
Hi, there.
We have run into puzzling trouble while analyzing a multiplexed sequencing run on the Illumina HiSeq platform. We pooled 7 multiplexed libraries, and when we ran CASAVA demultiplexing we found "contamination" in the results.
I mean that reads expected to belong to, e.g., library #1 are spread everywhere, with nonsensical relationships.
It's like a big mosaic of untidy correspondences. We tried changing the demultiplexing parameters, without any improvement.
What do you think? Where is the key? Is it in the bioinformatics, or worse, is it possible that some biological contamination of the reagents occurred? That the index-seq primers were accidentally mixed?
That seems a remote possibility, but if not that, what else?

Please, help...
Greetings,

It is very likely that one or more errors occurred during library preparation.

The demultiplexer that ships with CASAVA 1.8.2 is very stringent, precisely to avoid contamination like that you described above.

CASAVA 1.8.2 allows 0 mismatches in each index by default. This can be changed to 1 -- 1 is the maximum number of mismatches in any index for bar-coded data.


At our institution, we developed our own demultiplexer, called FastDemultiplexer, that allows more mismatches, which in turn increases yields and decreases the number of unclassified clusters.

I hope you sort out this data confusion, although the information you provided indicates an error in library preparation.


Sébastien Boisvert
^^
Old 04-05-2012, 01:45 PM   #15
seb567
Senior Member
 
Location: Québec, Canada

Join Date: Jul 2008
Posts: 260
Default

Quote:
Originally Posted by epistatic View Post
I also just had an issue with the index read on two lanes; it looks like some tiles were lost to the bottom middle swath problem. It didn't happen in all cycles, but I had many more unassigned reads in these two lanes than I should have.

With CASAVA 1.8.2, the number of mismatches allowed for the index read is 0 by default. Does anyone think that setting --mismatches 1 would allow less conservative index sorting?
Hi,

Yes, --mismatches 1 is less conservative as it allows 1 mismatch in the first index and 0 in the second, if any.

With CASAVA 1.8.2, the correct invocation is --mismatches 1,1

Using --mismatches 1 is equivalent to using --mismatches 1,0.

See these scripts here.

The default is --mismatches 0,0 which is the same as --mismatches 0


Sébastien Boisvert
Old 04-05-2012, 01:52 PM   #16
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

Quote:
Originally Posted by seb567 View Post
Greetings,

It is very likely that one or more errors occurred during library preparation.

The demultiplexer that ships with CASAVA 1.8.2 is very stringent, precisely to avoid contamination like that you described above.

CASAVA 1.8.2 allows 0 mismatches in each index by default. This can be changed to 1 -- 1 is the maximum number of mismatches in any index for bar-coded data.


At our institution, we developed our own demultiplexer, called FastDemultiplexer, that allows more mismatches, which in turn increases yields and decreases the number of unclassified clusters.

I hope you sort out this data confusion, although the information you provided indicates an error in library preparation.


Sébastien Boisvert
^^
Sébastien,

Technically the CASAVA pipeline will allow more mismatches (just pass a higher number with the --mismatches parameter) but it will fail if there is a collision between barcodes caused by allowing multiple mismatches per index. For example:

TruSeq Index #18 == GTCCGC
TruSeq Index #19 == GTGAAA

Now suppose you encounter an index read which is GTCCAA. How do you resolve the conflict of this being #18 with GC->AA at the end OR #19 with GA->CC in the middle? Unless you very carefully choose the mixture of barcodes, you are likely to encounter these types of collisions when allowing more than one error per index read.
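The ambiguity in this example can be reproduced with a few lines of code. This is a minimal sketch of mismatch-tolerant assignment, not CASAVA's actual logic: the function name is hypothetical, and it simply lists every barcode that lies within the allowed number of mismatches of the index read.

```python
def assign_read(index_read, barcodes, max_mismatches):
    """Return the names of all barcodes within `max_mismatches` of the index read."""
    hits = []
    for name, barcode in barcodes.items():
        # Hamming distance between the observed index read and this barcode.
        distance = sum(x != y for x, y in zip(index_read, barcode))
        if distance <= max_mismatches:
            hits.append(name)
    return hits


barcodes = {"#18": "GTCCGC", "#19": "GTGAAA"}

for allowed in (0, 1, 2):
    print(allowed, assign_read("GTCCAA", barcodes, allowed))
```

With 0 or 1 mismatches allowed the read GTCCAA matches neither index; with 2 allowed it matches both, and there is no principled way to choose between them.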
Old 04-07-2012, 06:07 PM   #17
phoss
Member
 
Location: Beltsville, MD

Join Date: Aug 2011
Posts: 12
Default

Quote:
Could you please expand on what you mean by "Indexes with specific DNA composition patterns"? We have been tearing our hair out recently because suddenly the quality of our index reads is horrible, leading to massive loss of sequence data because we can't determine the index sequence. This is happening on both our HiSeq2k and GAIIx. We have considered, and tentatively ruled out, cluster density and degree of barcode diversity as the source of the problem. Any findings you could share would be greatly appreciated.
Hi kmcarr,
What we noticed during some of our multiplexed runs (single-end) was that the sequencing of the actual read was great: high quality scores, several million reads, etc. The indexes, on the other hand, were saturated with N's. Some indexes had only the occasional N, but as you know, CASAVA has a flag to handle such situations.
Nonetheless, we ran CASAVA demultiplexing on this dataset. The resulting CASAVA build had very few reads in it, simply because the indexes could not be matched properly (too many Ns). What we saw is that the samples that failed had indexes with a high T and/or G content. Whether it was an index-sequencing error or whether these bases played a role in the failed samples is tough to say.
Have you looked at the thumbnails and found any anomalies?
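To put numbers on "saturated with N's" and "high T and/or G content", something like the following could be run over the observed index reads. It is a rough sketch under assumptions: the function name and the example reads are invented for illustration, and the index reads are assumed to have been extracted already as plain strings.

```python
def index_quality_summary(index_reads):
    """Summarise N calls and T/G content over a list of index read strings."""
    total = len(index_reads)
    reads_with_n = sum(1 for read in index_reads if "N" in read)
    # Base composition is computed over called (non-N) bases only.
    called = [base for read in index_reads for base in read if base != "N"]
    tg_bases = sum(1 for base in called if base in "TG")
    return {
        "reads": total,
        "fraction_with_N": reads_with_n / total if total else 0.0,
        "TG_fraction_of_called_bases": tg_bases / len(called) if called else 0.0,
    }


# Hypothetical index reads, for illustration only.
reads = ["ATCACG", "NTCACG", "NNNNNN", "TTGGTG"]
print(index_quality_summary(reads))
```

Computing the same summary per lane, or per expected index, would show whether the failures track particular indexes or particular lanes.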