SEQanswers

Old 03-16-2009, 04:15 PM   #1
greigite
Senior Member
 
Location: Cambridge, MA

Join Date: Mar 2009
Posts: 141
Question recommendations on multiplex?

Hi,
I'm looking for recommendations on/your experiences with Illumina multiplexing. I plan to sequence 11 isolates for assembly against an existing reference genome, and would like to minimize the run cost by multiplexing them. Illumina offers a 12-sample multiplexing kit, but I've also looked at the paper by Cronn et al using 3 bp tags. The 3 bp tags used there don't seem as appealing to me because a single error can cause misidentification. Can anyone here advise on what they've found to be the best multiplexing method? Go with the kit or use custom 4-6 nt barcodes?

I've seen the thread on the Google Solexa group (http://groups.google.com/group/solex...1ba2313d65332c), but it dates from last fall, before the kit came out.
Old 03-28-2009, 11:28 PM   #2
treebeard
Junior Member
 
Location: Oregon

Join Date: Feb 2008
Posts: 5
Default Recommendations on multiplex - update

Hi Greigite -

Our lab group has been using the same, simple 3' barcode method outlined in the Cronn et al. paper, only with slightly longer barcodes (3 barcode nucleotides + 1 terminal "T" = 4 bp), and the paired-end adapter sequences described in Bentley et al. (Nature, 2008).

We designed 20 of the 64 possible combinations, and we routinely use them for 6-8 plex reactions. We've even run multiplexes as high as 16-plex four different times. I don't have the updated numbers, but at last check we've run over 200 separate samples at an average multiplex level of 7X. Here are the exact results from a recent multiplex:

sample a = 16-plex; 85% reads correctly tagged
sample b = 16-plex; 87% reads correctly tagged
sample c = 14-plex; 86% reads correctly tagged

We do tend to lose about 10-15% of the reads (for a variety of reasons) using the home-made kit. We have not tested the Illumina barcoding kit, but I imagine their product is sufficiently polished that this is less of an issue.

The method definitely works for single-end reads (successful results with 36 to 60 bp reads) and paired-end reads (36 x 36).
Old 03-30-2009, 08:26 AM   #3
lparsons
Member
 
Location: NJ

Join Date: Nov 2008
Posts: 28
Default

Does anyone have any advice on the bioinformatics side of dealing with barcoded data? Specifically, we are using the Illumina Pipeline 1.3.2. The obvious simple solution is to separate the reads by grepping for the barcode at the start of the read and feeding each set separately into ELAND/MAQ/etc. Does anyone have any other suggestions or tips for modifying the pipeline to handle this? Thanks.
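
For illustration, the "separate the reads by barcode at the start of the read" idea could be sketched in Python roughly as follows. This is only a minimal sketch with exact matching, not part of the Illumina Pipeline; the function name, sample names, and toy reads are all made up for the example, and it assumes every barcode has the same length.

```python
from collections import defaultdict

def split_by_barcode(reads, barcodes):
    """Group (name, seq, qual) reads by the barcode found at the start of seq.

    `barcodes` maps barcode sequence -> sample name (hypothetical names).
    The barcode is trimmed from the read and from its quality string.
    Reads without an exact barcode match go to "unassigned", untrimmed.
    """
    length = len(next(iter(barcodes)))  # assumes all barcodes share one length
    bins = defaultdict(list)
    for name, seq, qual in reads:
        sample = barcodes.get(seq[:length])
        if sample is None:
            bins["unassigned"].append((name, seq, qual))
        else:
            bins[sample].append((name, seq[length:], qual[length:]))
    return dict(bins)

# Toy example with made-up reads and 4 nt barcodes.
barcodes = {"ACGT": "isolate_1", "TGCA": "isolate_2"}
reads = [
    ("r1", "ACGTGGGTTTAA", "IIIIIIIIIIII"),
    ("r2", "TGCACCCAAATT", "IIIIIIIIIIII"),
    ("r3", "NNCACCCAAATT", "!!IIIIIIIIII"),
]
bins = split_by_barcode(reads, barcodes)
for sample, members in sorted(bins.items()):
    print(sample, [name for name, _, _ in members])
```

Each bin could then be written to its own file and passed to the aligner separately. Trimming the barcode before alignment matters because the tag is not part of the genome.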
Old 04-06-2009, 04:06 PM   #4
greigite
Senior Member
 
Location: Cambridge, MA

Join Date: Mar 2009
Posts: 141
Default

Thanks, Treebeard. I am thinking of using the 6 bp barcodes (actually 5 bp plus the overhanging T) described in the Craig et al paper to get a better check on per-base error rates. How do you purify your oligos (desalt, HPLC)? Do you see any tag-specific underrepresentation?

Old 04-07-2009, 11:43 AM   #5
wraithnot
Member
 
Location: SF bay area

Join Date: Apr 2009
Posts: 12
Default Multiplexing strategy

This is right up my alley so this seems like a good opportunity to make my first post on this board.

An elegant way to design error correcting bar codes can be found in Hamady et al. Nature Methods v5 p235-237 2008. But this was developed for 454 pyrosequencing and uses 8 bp tags that were a bit too long for our Solexa/Illumina application.

I opted for a simpler scheme that identifies the sample in the first part of the barcode and then adds a "checksum" base. The checksum lets you detect any single-base substitution in the barcode and throw out the entire sequence rather than assigning it to the wrong sample.

I assigned A = 0, C = 1, G = 2, T = 3 and used base 4 arithmetic. For example, the sample ID 94 in base 10 becomes 01132 in base 4 (0x256 + 1x64 + 1x16 + 3x4 + 2x1 = 94), which converts to ACCTG. The sum of these 5 digits is 0+1+1+3+2 = 7 (in base 10), and the remainder when divided by 4 is 3. This remainder is converted to a base (3 = T) and added to the end of the 5 bp portion. Thus sample ID 94 is converted to ACCTGT, and any single substitution in any of these 6 bases will leave a checksum base that doesn't match the rest of the barcode.

These barcode sequences can be generated by a simple python script. The barcodes can be decoded and verified using an equally simple script and then the sequences with valid barcodes can be separated by which sample they belong to.
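
For anyone who wants to try it, here is a minimal sketch of what such a pair of scripts might look like. This is a reconstruction from the description above, not the original script; the function names `encode` and `decode` and the `id_length` parameter are made up for the example.

```python
BASES = "ACGT"  # A = 0, C = 1, G = 2, T = 3, as in the post

def encode(sample_id, id_length=5):
    """Return the base-4 barcode for sample_id plus one checksum base."""
    if not 0 <= sample_id < 4 ** id_length:
        raise ValueError("sample_id does not fit in id_length bases")
    digits = []
    for _ in range(id_length):
        sample_id, digit = divmod(sample_id, 4)
        digits.append(digit)
    digits.reverse()  # most significant digit first
    checksum = sum(digits) % 4
    return "".join(BASES[d] for d in digits) + BASES[checksum]

def decode(barcode):
    """Return the sample id, or None if a base or the checksum is invalid."""
    if len(barcode) < 2 or any(b not in BASES for b in barcode):
        return None
    *id_digits, checksum = [BASES.index(b) for b in barcode]
    if sum(id_digits) % 4 != checksum:
        return None
    sample_id = 0
    for d in id_digits:
        sample_id = sample_id * 4 + d
    return sample_id

print(encode(94))        # ACCTGT, the worked example above
print(decode("ACCTGT"))  # 94
print(decode("ACCTGA"))  # None: checksum base does not match
```

Reads whose barcode decodes to None (including any containing an N) would simply be discarded rather than assigned to a sample.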

This 6 bp scheme can be used with up to 4^5 = 1024 samples. We have used it to great success for multiplexing 96 samples and we are planning to test it with 384 multiplexed samples. The barcode with checksum base can be shortened if you don’t need to multiplex as many samples. For example, counting the checksum base in the length, a 3 bp barcode could multiplex up to 4^2 = 16 samples, a 4 bp barcode up to 4^3 = 64 samples and a 5 bp barcode up to 4^4 = 256 samples.
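
The capacities quoted above can be checked by brute force. The sketch below (helper names are made up for the example) enumerates every checksum barcode of a given total length and reports how many there are, along with the smallest number of differing positions between any two of them.

```python
from itertools import combinations, product

BASES = "ACGT"

def checksum_barcodes(total_length):
    """All barcodes whose last base is the sum of the other digits mod 4."""
    codes = []
    for digits in product(range(4), repeat=total_length - 1):
        codes.append("".join(BASES[d] for d in digits) + BASES[sum(digits) % 4])
    return codes

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

for total_length in (3, 4, 5, 6):
    codes = checksum_barcodes(total_length)
    min_dist = min(hamming(a, b) for a, b in combinations(codes, 2))
    print(total_length, len(codes), min_dist)
```

The minimum distance comes out as 2 for every length, which is why one substitution is always detected but cannot be corrected, and why two substitutions in the same barcode can slip through. Insertions and deletions are a separate problem that this check does not address.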

Old 04-08-2009, 06:04 AM   #6
mhc
Junior Member
 
Location: Boston

Join Date: Jun 2008
Posts: 2
Default

Great posts. Wraithnot and treebeard, what's been your experience with read representation between the different barcodes? Is the difference in reads per barcode within 2-fold? 5-fold?
Old 04-08-2009, 12:32 PM   #7
wraithnot
Member
 
Location: SF bay area

Join Date: Apr 2009
Posts: 12
Default

Quote:
Originally Posted by mhc View Post
Great posts. Wraithnot and treebeard, what's been your experience with read representation between the different barcodes? Is the difference in reads per barcode within 2-fold? 5-fold?
Our experiments use 96 separate barcodes. If you aren't careful about normalizing the concentrations of each sample when you combine them, you can get wildly different amounts of each barcoded sample. For a well-behaved experiment, the most represented barcode was less than two-fold above the median, and the 10th-percentile barcode was about two-fold below the median. The very bottom of the distribution had a discontinuity: the least represented barcode was 20-fold below the median. This might be due to an oligo quality issue for that particular barcode.
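
If you want to compute the same kind of summary for your own run, something like the following would do it. This is only a sketch: the function name and the read counts are invented for the example, and the "inclusive" percentile method is an arbitrary choice.

```python
from statistics import median, quantiles

def representation_summary(counts):
    """Summarise per-barcode read counts relative to the median count."""
    values = sorted(counts.values())
    med = median(values)
    # First of nine decile cut points = 10th percentile (needs >= 2 barcodes).
    p10 = quantiles(values, n=10, method="inclusive")[0]
    return {
        "max_over_median": values[-1] / med,
        "median_over_p10": med / p10,
        "median_over_min": med / values[0],
    }

# Made-up read counts per barcode, for illustration only.
counts = {"bc01": 1000, "bc02": 2000, "bc03": 2000,
          "bc04": 2000, "bc05": 3000, "bc06": 100}
summary = representation_summary(counts)
for key, value in summary.items():
    print(key, round(value, 2))
```

A large gap between median_over_p10 and median_over_min is the kind of discontinuity described above, where one barcode drops out far below the rest.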
Old 04-09-2009, 04:38 PM   #8
greigite
Senior Member
 
Location: Cambridge, MA

Join Date: Mar 2009
Posts: 141
Default

It's a bit of a side issue to this discussion, but I've gotten some different advice on necessary purification levels for home-made adapters. One core I talked to uses HPLC for the adapter with the phosphorothioate bond to ensure a higher frequency of ligation. Another person uses desalt for all adapters, and the Quail et al Sanger paper uses HPLC for all theirs. Just FYI. I have no experience yet myself.
Old 04-18-2009, 07:31 PM   #9
Wade Davis
Junior Member
 
Location: Columbia, MO

Join Date: Mar 2009
Posts: 2
Default

Quote:
Originally Posted by wraithnot View Post
Our experiments use 96 separate barcodes. If you aren't careful about normalizing the concentrations of each sample when you combine them, you can get wildly different amounts of each barcoded sample. For a well-behaved experiment, the most represented barcode was less than two-fold above the median, and the 10th-percentile barcode was about two-fold below the median. The very bottom of the distribution had a discontinuity: the least represented barcode was 20-fold below the median. This might be due to an oligo quality issue for that particular barcode.

Wraithnot, I am very interested in your barcoding results. I would like to talk to you offline about this if possible. I've sent you a private message with my contact information.

Thanks,
Wade
Old 05-06-2009, 09:19 AM   #10
csquared
Member
 
Location: Huntsville, AL

Join Date: May 2008
Posts: 67
Default

This thread is a few weeks old and I have been away from the board for a while but I wanted to reply and make an offer for any interested labs to try out our indexing methods. In our hands, we are seeing at least 98% efficiency in the parsing of the final reads to the correct barcodes so it is working really well. We also have some software that goes with the adaptors to correctly parse the final data. We are routinely doing a 12-plex but the designs are finished for 96-plex. We just haven't had a project that needs that high a multiplex.

The method has all the usual features such as error correction (SECDED, i.e. single-error correction with double-error detection, to be specific). Reply or PM me if interested. It would be great to compare results with a few different methods/designs.
Old 06-12-2009, 07:04 PM   #11
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

Hi,
There's a demultiplexing tool, novobarcode, in the Novoalign package that's free to use. It takes a file of index tags and the distance (bp differences) between tags, and then demuxes the reads into individual files. The current program expects the index tag to be part of the read rather than in the header as in the Illumina format, which should suit your custom tags. A new version is coming that supports the Illumina format.
Alignment of index tags and reads uses base qualities and allows some differences, depending on the distance setting and a quality threshold.
Colin
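
To illustrate the general idea of tolerating a few differences between a read's tag and the expected index tags, here is a minimal sketch. It is not novobarcode's algorithm or interface: it ignores base qualities entirely, and the function names, the `max_mismatches` parameter, and the tags are made up for the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def assign(tag, barcodes, max_mismatches=1):
    """Return the single barcode within max_mismatches of tag, else None."""
    hits = [b for b in barcodes if hamming(tag, b) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None

barcodes = ["AAAAAA", "CCCGGG", "GGGTTT"]  # made-up, well-separated tags
print(assign("AAAAAT", barcodes))  # AAAAAA (one mismatch)
print(assign("AACGGG", barcodes))  # None (two or more mismatches to every tag)
```

Correcting one mismatch unambiguously this way only works if every pair of tags differs in at least 3 positions, which is why the distance between tags matters when choosing the setting.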
Old 06-23-2009, 10:17 AM   #12
PUGenomeLab
Junior Member
 
Location: West Lafayette, IN

Join Date: Jun 2009
Posts: 2
Default Illumina tags

Hello,
I'm new to this site, but I wanted to get feedback on a couple of ideas that our lab is tossing around.
1) Illumina's multiplexing adapters contain ID tags on the end of the fragment. Does this lead to any significant loss of the tag or misreading of the tag during sequencing?
2) We wondered if anyone has ever moved the tag so that it would essentially be the first 3 or 4 base pairs of the read? This would cut down on the read length, but the tags would not be lost, and it could bypass any additional cost that sequencing centers charge for multiplexing. The decoding of the tags would be done by our computational people, thus reducing the cost.

I'd appreciate feedback or suggestions. Thanks.
Old 07-07-2009, 09:16 AM   #13
Exeplex
Member
 
Location: Exeter, UK

Join Date: Jun 2009
Posts: 10
Default Multiplexing with custom adapters

Quote:
Originally Posted by PUGenomeLab View Post
Hello,
I'm new to this site, but I wanted to get feedback on a couple of ideas that our lab is tossing around.
1) Illumina's multiplexing adapters contain ID tags on the end of the fragment. Does this lead to any significant loss of the tag or misreading of the tag during sequencing?
2) We wondered if anyone has ever moved the tag so that it would essentially be the first 3 or 4 base pairs of the read? This would cut down on the read length, but the tags would not be lost, and it could bypass any additional cost that sequencing centers charge for multiplexing. The decoding of the tags would be done by our computational people, thus reducing the cost.

I'd appreciate feedback or suggestions. Thanks.
Hi,

Re. the second part of this post, various groups (e.g. Cronn et al., NAR 2008; Harismendy & Frazer, Biotechniques 2009) have used custom adapters incorporating the barcode so that it becomes part of the read at the 5' end. In a bid to get my own head around the various stages, I've adapted the excellent figure posted by greigite (in the Tech Summary: Illumina's Solexa Sequencing Technology thread) to include some details of the approach used by Cronn et al., as well as a generic strategy to follow through the PCR and sequencing steps (any errors, please let me know!) - hope this is useful.

Also, check out the other posts in this thread for great ideas re. barcode design and bioinformatic processing of sequences.

Exeplex
Attached Files
File Type: pdf Multiplexing on the Illumina GA with custom adapters.pdf (39.3 KB, 945 views)
Old 10-02-2009, 03:32 PM   #14
ShiveringFire
Junior Member
 
Location: Athens, GA

Join Date: Oct 2009
Posts: 7
Default Library-free addition of index sequences in circularized MIP captured sequences?

I am trying to come up with a library-free way to introduce index (barcode) sequences into circularized captured genome fragments (Molecular Inverted Probes – MIP) that are about 250 nt long. The MIP designs are described in Porreca et al. 2007 and in Turner et al 2009:
http://www.nature.com/nmeth/journal/...nmeth1110.html
http://www.nature.com/doifinder/10.1038/nmeth.f.248

The authors ran their 16 samples in separate lanes and therefore did not need to index. The 100-mer oligo MIP design has a 30 nt common linker sequence, so that one can do inverse PCR using Illumina paired-end primers directed against this common linker and then load the product directly onto the flowcell for cluster generation.

I like the approach a lot, but couldn’t yet find a way to append index sequences without the standard library prep using adapters described in Cronn et al. or Craig et al. mentioned in this thread.

Any thoughts?
Old 10-05-2009, 03:35 PM   #15
Exeplex
Member
 
Location: Exeter, UK

Join Date: Jun 2009
Posts: 10
Default Library-free addition of index sequences in circularized MIP captured sequences?

I haven't tried this, but in principle you should be able to carry out direct indexed sequencing from MIP-derived amplicons by incorporating a barcode into the middle of the reverse primer (between the linker-specific sequence and the flowcell oligo annealing sequence). You can then mimic the Illumina multiplex/indexing strategy by carrying out a 2nd short sequence read from the reverse primer to read the index - see the attached PDF for details (you might need to lengthen the common linker sequence slightly so as to make a longer primer for the index read).
The example shown is based on the strategy of Turner et al, who used single-end sequencing. The indexing strategy should work equally well for paired-end reads (carrying out the index read after the first sequencing reaction, before 'flipping' the product on the PE module and performing the reverse read of the captured sequence, as in the Illumina multiplex protocol), but you would just need to bear in mind that the flow cell oligos C & D on the PE flow cell are slightly longer, so you would need to adjust the adapter/primer tails accordingly.
Attached Files
File Type: pdf Barcoding and direct sequencing of MIPs.pdf (157.1 KB, 343 views)
Old 10-15-2009, 10:47 AM   #16
ShiveringFire
Junior Member
 
Location: Athens, GA

Join Date: Oct 2009
Posts: 7
Default

Quote:
Originally Posted by Exeplex View Post
The indexing strategy should work equally well for paired-end reads (carrying out the index read after the first sequencing reaction, before 'flipping' the product on the PE module and performing the reverse read of the captured sequence, as in the Illumina multiplex protocol), but you would just need to bear in mind that the flow cell oligos C & D on the PE flow cell are slightly longer, so you would need to adjust the adapter/primer tails accordingly.
Exeplex,
I have been staring at your excellent illustration and it led me to generate more questions on Paired-End multiplexing.
During inverse PCR, does the introduction of index sequences into the forward/reverse primers require modification of the PE capture sequencing primers? That is, if one multiplexes 48 samples, does one have to use 48 different capture sequencing primers? If that is the case, the only solution I could come up with is to shorten the PE sequencing primers to a 20-mer and a 19-mer so that a single set of primers can be used (see my illustration attached). However, I am not sure how flexible the Illumina chemistry is for this.
Attached Files
File Type: pdf IlluminaPEmultiplexTurner.pdf (13.8 KB, 366 views)
Old 10-23-2009, 06:34 PM   #17
silin284
Member
 
Location: ny

Join Date: Jul 2009
Posts: 23
Default

Quote:
Originally Posted by ShiveringFire View Post
I am trying to come up with a library-free way to introduce index (barcode) sequences into circularized captured genome fragments (Molecular Inverted Probes – MIP) that are about 250 nt long. The MIP designs are described in Porreca et al. 2007 and in Turner et al 2009:
http://www.nature.com/nmeth/journal/...nmeth1110.html
http://www.nature.com/doifinder/10.1038/nmeth.f.248

The authors ran their 16 samples in separate lanes and therefore did not need to index. The 100-mer oligo MIP design has a 30 nt common linker sequence, so that one can do inverse PCR using Illumina paired-end primers directed against this common linker and then load the product directly onto the flowcell for cluster generation.

I like the approach a lot, but couldn’t yet find a way to append index sequences without the standard library prep using adapters described in Cronn et al. or Craig et al. mentioned in this thread.

Any thoughts?
Great idea. Some problems might come from the 30 nt common linker.
1. It could introduce bias toward DNA with a similar sequence (or a particular hairpin structure).
2. It also puts a limit on the size of the DNA sample.
Old 10-26-2009, 10:31 AM   #18
Exeplex
Member
 
Location: Exeter, UK

Join Date: Jun 2009
Posts: 10
Default Library-free addition of index sequences in circularized MIP captured sequences?

Hi ShiveringFire,

In my diagram the index is introduced by the reverse primer in the inverse PCR step (called the Capture_Slxa_Rev_Amp primer in the Turner paper), so you would need a unique reverse primer for each index sequence you wanted to use. The forward PCR primer and the sequencing primer are common in all reactions - you need to use a sequencing primer that anneals to the common linker so that this sequence does not end up in your reads, as it would if you used the standard Illumina primer. However, I don't think there's anything special about the chemistry of the sequencing primer, so you can just make a custom one to anneal wherever you need it (as Turner did).

The post above also makes some interesting points, particularly about the insert size - I'm not aware that the MIP strategy has been shown to work on inserts above 120bp, and it's interesting that Jay Shendure's most recent paper made use of arrays rather than MIPs for whole exome sequencing. One other potential problem you might want to consider at an early stage - Agilent no longer seem to make programmable microarrays for the length of oligos you would need, and I'm not aware of an alternative supplier but would welcome any info on this as I need one for a different project!
Old 10-27-2009, 10:57 AM   #19
ShiveringFire
Junior Member
 
Location: Athens, GA

Join Date: Oct 2009
Posts: 7
Default

Thank you all.
Quote:
Originally Posted by Exeplex View Post
Agilent no longer seem to make programmable microarrays for the length of oligos you would need, and I'm not aware of an alternative supplier but would welcome any info on this as I need one for a different project!
We got our pool of 12,000 oligos (all 100-mers) synthesized by Combimatrix: http://www.combimatrix.com/ Our MIPs will target inserts of less than 190 bp. We are expecting 75 nt paired-end reads from both ends.
Old 10-27-2009, 03:53 PM   #20
Corey
Junior Member
 
Location: Vancouver, British Columbia

Join Date: Oct 2009
Posts: 5
Default

Hi all,
Corey from University of Toronto here. Have followed the great posts here for several months and just wanted to introduce myself. Regarding this multiplex thread, we have recently published a simple 2 step protocol that works great for counting applications. http://genome.cshlp.org/content/19/10/1836.long