SEQanswers





11-17-2011, 11:23 AM   #1
Simcom
Junior Member
 
Location: East Coast US

Join Date: Jun 2010
Posts: 8
Sequencing a low-diversity library on the HiSeq

I am preparing a custom multiplexed library that will fall into the "low diversity" category: the first 5 nucleotides of read 1 will be identical among all clusters. There is a well-known, well-documented problem with cluster identification for low-diversity libraries (outlined here: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3030592/ ).
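To make "low diversity" concrete, the sketch below tallies the fraction of reads showing A, C, G or T at each of the first few cycles. The reads and the function names (`per_cycle_composition`, `dominant_fraction`) are hypothetical, made up for illustration; in this toy set a single base accounts for every read at each of the first five cycles, which is the situation described above.

```python
from collections import Counter


def per_cycle_composition(reads, n_cycles):
    """For each of the first n_cycles, return the fraction of reads showing A, C, G and T.

    Bases other than ACGT (e.g. N) count toward the total but are not reported.
    """
    composition = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads if len(read) > cycle)
        total = sum(counts.values()) or 1  # avoid dividing by zero if no read is long enough
        composition.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return composition


def dominant_fraction(cycle_composition):
    """Fraction of reads carrying the most common base at one cycle (1.0 = no diversity)."""
    return max(cycle_composition.values())


# Hypothetical reads: the first five bases (ACGTA) are shared, the rest are diverse.
reads = ["ACGTAGGCTTAC", "ACGTATTGACCA", "ACGTACAGTGGT", "ACGTAAGTCACG"]

for cycle, comp in enumerate(per_cycle_composition(reads, 8), start=1):
    print(f"cycle {cycle}: most common base in {dominant_fraction(comp):.0%} of reads")
```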

The above paper and many of the comments on these forums refer specifically to the GAII, and suggest that spiking the library with 40-50% phiX control resolves the cluster calling issue.

Now that the GAIIx has been all but phased out, I need to run my low-diversity library on the HiSeq. The problem is that I don't know of anyone who has successfully run a low-diversity library on a HiSeq, and my core informed me today that they have tried several times to run low-diversity libraries on the HiSeq but got awful results, even after spiking with 50% phiX.

My question is: has anyone had success running a low-diversity library on the HiSeq? If so, how did you get it to work? Because my study does not require a massive number of reads, I am considering spiking my sample with up to 90-95% gDNA, which should drastically increase the diversity and, I hope, resolve the cluster identification problems. Does anyone have experience running low-diversity libraries on the HiSeq who could give me some advice?
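As a rough way to reason about spike-in levels, the sketch below treats the base composition at one cycle of a mixed lane as a weighted average of the sample and the spike-in. It assumes a perfectly balanced spike-in (25% of each base at every cycle); real gDNA or phiX libraries are only approximately balanced, and the model says nothing about how the instrument's cluster-finding software actually responds. The function name `mixed_composition` is hypothetical.

```python
def mixed_composition(sample_comp, spike_fraction, spike_comp=None):
    """Base fractions at one cycle for a lane of (1 - spike_fraction) sample + spike_fraction spike-in."""
    if not 0.0 <= spike_fraction <= 1.0:
        raise ValueError("spike_fraction must be between 0 and 1")
    if spike_comp is None:
        # Assumption: an ideally balanced spike-in (25% A, C, G and T at this cycle).
        spike_comp = {base: 0.25 for base in "ACGT"}
    return {
        base: (1.0 - spike_fraction) * sample_comp.get(base, 0.0) + spike_fraction * spike_comp[base]
        for base in "ACGT"
    }


# One of the five shared cycles: every sample cluster shows the same base (say A).
shared_cycle = {"A": 1.0}

for spike in (0.0, 0.5, 0.9, 0.95):
    comp = mixed_composition(shared_cycle, spike)
    print(f"{spike:.0%} spike-in -> {comp['A']:.1%} of clusters show A at this cycle")
```

Under this simple model, a 50% spike still leaves 62.5% of clusters showing the same base at each shared cycle, while a 90% spike brings that down to about 32.5%.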

Thanks so much!

11-17-2011, 11:48 AM   #2
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 505

Is there a reason not to use a custom sequencing primer?

11-17-2011, 12:19 PM   #3
Simcom
Junior Member
 
Location: East Coast US

Join Date: Jun 2010
Posts: 8

I am sequencing viral DNA/gDNA integration junctions, so I am using those 5 nucleotides of viral DNA as a kind of 'verification' that a read really contains the junction between viral DNA and gDNA, i.e. that the sequencing primer has not mis-hybridized to a similar (non-viral) sequence elsewhere in the genome. We are using a custom sequencing primer, but we prefer that it not hybridize up to the very edge of the viral DNA, for the reason described above.
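The "verification" idea can be sketched as a simple read filter: keep reads whose first bases match the expected viral-end sequence, trim that tag so the remaining gDNA flank can be mapped, and set aside reads lacking the tag as candidates for primer mis-hybridization. The 5-nt tag `TGCAT`, the reads and the helper name are hypothetical placeholders (the actual viral sequence is not given in the thread), and requiring an exact match is a simplifying assumption; a real pipeline might tolerate a sequencing error.

```python
def split_by_junction_tag(reads, tag):
    """Split reads into (gDNA flanks of tagged reads, untagged reads)."""
    flanks, untagged = [], []
    for read in reads:
        if read.startswith(tag):
            flanks.append(read[len(tag):])  # drop the viral tag, keep the genomic flank
        else:
            untagged.append(read)  # possible mis-primed read: no viral verification sequence
    return flanks, untagged


VIRAL_TAG = "TGCAT"  # hypothetical placeholder for the last 5 nt of viral DNA
reads = ["TGCATGGACCTTAG", "TGCATCAGTTAGCA", "AGCTTGGACCTAAG"]

flanks, untagged = split_by_junction_tag(reads, VIRAL_TAG)
print(flanks)    # ['GGACCTTAG', 'CAGTTAGCA']
print(untagged)  # ['AGCTTGGACCTAAG']
```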

11-17-2011, 12:29 PM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,014

Can your core not run your samples by specifying a different lane (which is expected to have "normal" DNA) as the "control" lane for that run?

11-17-2011, 12:43 PM   #5
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 505

Simcom: Judicious primer design coupled with the appropriate annealing temperature will virtually assure that the primer does not hybridize inappropriately.

GenoMax: designating a normal-complexity sample as the control lane does not solve the problem (sadly). While it allows the signal thresholds to be set appropriately, it doesn't address the problem of cluster resolution in the low-complexity lanes.

11-17-2011, 12:50 PM   #6
Simcom
Junior Member
 
Location: East Coast US

Join Date: Jun 2010
Posts: 8

Quote:
Originally Posted by GenoMax
Can your core not run your samples by specifying a different lane (which is expected to have "normal" DNA) as the "control" lane for that run?

I think you are misunderstanding the problem. Because the first 5 nt of read 1 are going to be identical among all clusters, and the machine uses these 5 nt to call clusters, it has a hard time identifying and distinguishing between different clusters (especially closely overlapping ones). The gDNA is there to add sequence diversity so that clusters can be called, which is why it needs to be in the same lane as the sample.

11-17-2011, 12:53 PM   #7
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 505

There's an alternative approach, assuming that you have not yet constructed the libraries. Design them so the junction is at the opposite end of the insert, and perform paired-end sequencing. Cluster calling is based only on the first five cycles of read one, so you'll avoid the low-complexity issue.

11-17-2011, 12:59 PM   #8
Simcom
Junior Member
 
Location: East Coast US

Join Date: Jun 2010
Posts: 8

Quote:
Originally Posted by HESmith
Simcom: Judicious primer design coupled with the appropriate annealing temperature will virtually assure that the primer does not hybridize inappropriately.

I agree with you that hybridization is unlikely, but if it does happen it will be indistinguishable from an actual integration. Among other things, we are interested in mapping low-abundance integrations, so if we aren't able to get the verification sequence on every read, we will likely need to go in and verify a subset of integrations manually, which may not be possible if an integration is present in only one cell for example.

11-17-2011, 01:04 PM   #9
Simcom
Junior Member
 
Location: East Coast US

Join Date: Jun 2010
Posts: 8

Quote:
Originally Posted by HESmith
There's an alternative approach, assuming that you have not yet constructed the libraries. Design them so the junction is at the opposite end of the insert, and perform paired-end sequencing. Cluster calling is based only on the first five cycles of read one, so you'll avoid the low-complexity issue.

Yep, exactly. Sadly my boss insisted on having a library that we can do single read OR paired end on (a money saving move potentially), so I had to design the junction on the first read side. And the samples are just about finished being prepped :/

11-17-2011, 01:05 PM   #10
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 505

If, as you say, you can tolerate discarding 90-95% of the reads, then spiking in a gDNA library at that level will definitely solve your problem. After all, adapter dimers are often present at 5-10% in many libraries (the same % as your desired samples), and sequencing them is not a problem!
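The cost of that dilution is easy to estimate: if part of the lane goes to gDNA, the number of reads needed to recover a given number of sample reads scales as 1 / (1 - spike fraction). A minimal sketch, using whole-number percentages so the ceiling division stays exact; `total_reads_needed` is a hypothetical helper and ignores reads lost to filtering.

```python
def total_reads_needed(sample_reads, spike_percent):
    """Reads to sequence so that the non-spike share of the lane yields `sample_reads` sample reads.

    spike_percent is a whole-number percentage of the lane given to the spike-in (0-99).
    """
    if not 0 <= spike_percent < 100:
        raise ValueError("spike_percent must be in the range 0-99")
    sample_percent = 100 - spike_percent
    # Ceiling division with integers: -(-a // b) rounds up without floating-point error.
    return -(-sample_reads * 100 // sample_percent)


for spike in (50, 90, 95):
    print(spike, total_reads_needed(1_000_000, spike))
# 50 2000000
# 90 10000000
# 95 20000000
```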

11-17-2011, 01:11 PM   #11
Simcom
Junior Member
 
Location: East Coast US

Join Date: Jun 2010
Posts: 8

Quote:
Originally Posted by HESmith
If, as you say, you can tolerate discarding 90-95% of the reads, then spiking in a gDNA library at that level will definitely solve your problem. After all, adapter dimers are often present at 5-10% in many libraries (the same % as your desired samples), and sequencing them is not a problem!

Thanks, this gives me confidence. Just to be sure though: do the adapter-adapter ligation reads come back in the data, or does the machine throw them out and not include them in sequencing results? If you actually get the adapter-adapter reads from the machine, I should be golden.

11-17-2011, 01:30 PM   #12
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

We got it to work on our HiScanSQ -- which uses the same chemistry as the HiSeq, but only scans the top of the flowcell. Not an identical situation, but we had some SMART cDNAs that we sheared and ligated TruSeq adapters on. So about 1/2 of them had the same 50 nt of SMART primer at the beginning. We mixed them 1:1 with a genomic DNA library. Cluster registration went fine.
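For comparison, the share of clusters that start with the same sequence in a mixed lane is a weighted sum over the libraries in it. Taking the "about 1/2" above at face value, the sketch below uses exact fractions to compare that 1:1 SMART cDNA/gDNA lane with a 1:1 phiX spike of a fully low-diversity library and with the proposed 90% gDNA spike. This is arithmetic only: the shared prefix here was 50 nt rather than 5 nt, and the HiScanSQ images only the top surface, so the cases are not strictly comparable. The helper name is hypothetical.

```python
from fractions import Fraction


def shared_start_fraction(components):
    """components: (share of lane, share of that library starting with the common sequence) pairs."""
    if sum(lane for lane, _ in components) != 1:
        raise ValueError("lane shares must sum to 1")
    return sum(lane * shared for lane, shared in components)


half = Fraction(1, 2)
lanes = {
    # SMART cDNA (about half carry the primer) mixed 1:1 with a gDNA library
    "SMART cDNA + gDNA, 1:1": [(half, half), (half, Fraction(0))],
    # fully low-diversity sample mixed 1:1 with phiX
    "sample + phiX, 1:1": [(half, Fraction(1)), (half, Fraction(0))],
    # fully low-diversity sample with a 90% gDNA spike
    "sample + 90% gDNA": [(Fraction(1, 10), Fraction(1)), (Fraction(9, 10), Fraction(0))],
}
for name, components in lanes.items():
    print(f"{name}: {shared_start_fraction(components)} of clusters share the start")
```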

--
Phillip

11-17-2011, 01:32 PM   #13
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 505

Yes, adapter reads are present.

Just remember that you'll also need to include the standard sequencing primer for the gDNA library (or construct the gDNA library with custom adapters to match your custom primer).

11-17-2011, 01:52 PM   #14
Simcom
Junior Member
 
Location: East Coast US

Join Date: Jun 2010
Posts: 8

Quote:
Originally Posted by HESmith
Yes, adapter reads are present.

Just remember that you'll also need to include the standard sequencing primer for the gDNA library (or construct the gDNA library with custom adapters to match your custom primer).

Yep, I planned on including both primers. Thank you so much for your help, I really appreciate it.

11-17-2011, 02:00 PM   #15
Simcom
Junior Member
 
Location: East Coast US

Join Date: Jun 2010
Posts: 8

Quote:
Originally Posted by pmiguel
We got it to work on our HiScanSQ -- which uses the same chemistry as the HiSeq, but only scans the top of the flowcell. Not an identical situation, but we had some SMART cDNAs that we sheared and ligated TruSeq adapters on. So about 1/2 of them had the same 50 nt of SMART primer at the beginning. We mixed them 1:1 with a genomic DNA library. Cluster registration went fine.

--
Phillip

OK, that is good to hear. I'm not sure why my core was having trouble spiking 1:1 gDNA.

11-17-2011, 04:04 PM   #16
BIG_SNP
Member
 
Location: CA

Join Date: Jul 2009
Posts: 14

We have had great success using the NuGen library prep. Their adapters have in-line barcodes, which add diversity to the first cycles and allow sequences to pass filter. After passing filter, the HiSeq can sequence no- or low-diversity samples without any problems.
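Whether an in-line barcode set actually supplies that early-cycle diversity can be checked cycle by cycle. The sketch below assumes the four-channel color mapping Illumina describes for HiSeq in its low-plexity pooling guidance (A and C read with the red laser, G and T with the green laser) and flags cycles where the barcodes lack a base in one of the two channels. The barcode sequences are hypothetical; NuGen's actual adapters are not given in the thread.

```python
RED_CHANNEL = set("AC")    # assumption: HiSeq four-channel mapping, A/C on the red laser
GREEN_CHANNEL = set("GT")  # assumption: G/T on the green laser


def unbalanced_cycles(barcodes):
    """0-based cycles at which the barcode set has no red-channel or no green-channel base."""
    length = min(len(b) for b in barcodes)
    flagged = []
    for cycle in range(length):
        bases = {b[cycle] for b in barcodes}
        if not (bases & RED_CHANNEL and bases & GREEN_CHANNEL):
            flagged.append(cycle)
    return flagged


print(unbalanced_cycles(["ACGTAC", "GTACGT"]))  # []
print(unbalanced_cycles(["AACCGT", "CCAAGT"]))  # [0, 1, 2, 3, 4, 5]
```

This only checks which bases are present, so it implicitly assumes the barcoded samples are pooled in roughly equal amounts; a heavily skewed pool could still be unbalanced.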

11-18-2011, 02:32 AM   #17
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 622

Just as another thought: if you can afford to spike in 90-95% gDNA, couldn't you also find an external sequencing facility that still runs GAIIx instruments and use the methods that work well on those?

11-18-2011, 04:59 AM   #18
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

I have seen bad batches of phiX with fairly high adapter dimer levels (a few percent, I think). Maybe you should make your own genomic DNA library to make sure your "diluent" is of high quality.
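One quick QC check on a "diluent" library is the fraction of read-1 sequences that begin directly with adapter sequence, since an adapter dimer has no insert. A minimal sketch, assuming TruSeq-style adapters whose read-through sequence starts with `AGATCGGAAGAGC` (the sequence commonly used for adapter trimming); the reads, the 10-base probe length and the function name are illustrative choices, not a standard.

```python
ADAPTER_START = "AGATCGGAAGAGC"  # assumption: TruSeq-style adapter read-through sequence


def adapter_dimer_fraction(reads, adapter=ADAPTER_START, probe_length=10):
    """Fraction of reads that begin with the adapter, i.e. likely adapter dimers with no insert."""
    if not reads:
        return 0.0
    probe = adapter[:probe_length]
    dimers = sum(1 for read in reads if read.startswith(probe))
    return dimers / len(reads)


reads = [
    "GATTACAGGCTTACGA",
    "AGATCGGAAGAGCACA",  # adapter dimer: the read starts straight into adapter sequence
    "CCGTATGGACTTAGCA",
    "TTGACCAGTAGGCATC",
]
print(f"{adapter_dimer_fraction(reads):.0%} adapter dimers")  # 25% adapter dimers
```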

Also you could obtain your "diluent" by sub-contracting a sequencing job. Send an email out to a prospective department (maybe one with a high level of plant or fungal sciences being done) and offer a one time only discount genome sequence. Our diluent was a sorghum genomic DNA library.

--
Phillip

11-18-2011, 06:55 AM   #19
HMorrison
Senior Member
 
Location: Massachusetts

Join Date: May 2009
Posts: 116

Quote:
Originally Posted by HESmith
There's an alternative approach, assuming that you have not yet constructed the libraries. Design them so the junction is at the opposite end of the insert, and perform paired-end sequencing. Cluster calling is based only on the first five cycles of read one, so you'll avoid the low-complexity issue.

I have a sample of 96-plex low-diversity amplicon libraries running now, and clusters were found just fine, but the low diversity is causing a tremendous discrepancy between the blue and green box-and-whisker plots (raw clusters vs. clusters passing filter). I hope those data are recoverable at the end. Nothing in my primer design, barcoding, or indexing scheme can change the fact that it's "low complexity". The first four bases were completely random, followed by eight different in-line barcodes.

This is PE sequencing.

Yet I know labs are making this work.
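The read layout described above can be simulated to see where the composition collapses back to a single base: four random bases, then one of eight in-line barcodes, then a constant amplicon sequence. The barcodes, the amplicon sequence and the 90% threshold are all hypothetical; the sketch only locates the cycle at which one base starts to dominate and says nothing about pass-filter rates.

```python
import random
from collections import Counter


def first_low_diversity_cycle(reads, threshold=0.9):
    """0-based index of the first cycle at which a single base exceeds `threshold` of reads, else None."""
    for cycle in range(min(len(read) for read in reads)):
        counts = Counter(read[cycle] for read in reads)
        if max(counts.values()) / len(reads) > threshold:
            return cycle
    return None


# Hypothetical design: 4 random bases + one of 8 in-line barcodes + a constant amplicon start.
BARCODES = ["ACGTACGT", "CGTACGTA", "GTACGTAC", "TACGTACG",
            "AACCGGTT", "ACCGGTTA", "CCGGTTAA", "CGGTTAAC"]
AMPLICON = "TTGACCAGTCAGGA"

rng = random.Random(0)  # fixed seed so the simulation is reproducible
reads = [
    "".join(rng.choice("ACGT") for _ in range(4)) + BARCODES[i % len(BARCODES)] + AMPLICON
    for i in range(1000)
]
print(first_low_diversity_cycle(reads))  # 12: the first amplicon base, i.e. cycle 13
```

With the barcodes pooled equally, no barcode cycle here has more than 37.5% of any one base; the single-base cycles only begin once the shared amplicon sequence starts.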

11-18-2011, 07:24 AM   #20
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 505

If the first four bases are random, then subsequent low complexity should not adversely affect cluster calling or data quality. Excessive cluster density is a possible culprit: what are your raw and PF values?

All times are GMT -8.

