SEQanswers

Old 11-18-2011, 10:05 AM   #21
csquared
Member
 
Location: Huntsville, AL

Join Date: May 2008
Posts: 67
Default

We do similar things to what you are describing all the time. A 20-30% PhiX spike (or any other library) should do the trick. PhiX is easy since it can be easily removed without an index and you can monitor the percent alignment as the run is going.

Our most common condition is a HiC or 5C library where we need to get through some T3 and T7 sequences that are common to all of the samples. We have used both ChIPseq libraries as well as PhiX spikes with very good results. The use of the ChIPseq libraries just allows those reads to be used for something useful where PhiX is just data thrown away.

Add in a spike and lower cluster density on the HiSeq to the 500k to 600k range and you should be fine. If you want to avoid the spike altogether, lowering clusters to about 200k also works but with more variable results.
__________________
HudsonAlpha Institute for Biotechnology
http://www.hudsonalpha.org/gsl
Old 11-18-2011, 02:04 PM   #22
greigite
Senior Member
 
Location: Cambridge, MA

Join Date: Mar 2009
Posts: 141
Default

For what it's worth, we sequenced two lanes on a HiSeq (v2 flow cell) containing eleven 7 bp inline barcodes with a 5% phiX spike-in. Despite the low diversity visible in the FastQC plots for the first 7 bp, our cluster density and percentage of clusters passing filter were comparable to other lanes on the same run that did not have any low-complexity issues.
Old 11-20-2011, 03:47 PM   #23
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by HESmith View Post
If the first four bases are random, then subsequent low complexity should not adversely affect cluster calling or data quality. Excessive cluster density is a possible culprit: what are your raw and PF values?
Semantically "random" does not mean "high complexity". See.

Point being we presume "random" here means all four bases were a random mix of ACGT and therefore an even mixture of all 256 possible sequences. But we would need to know what the method of generating these was to know.

--
Phillip
Old 11-21-2011, 09:36 AM   #24
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 505
Default

random (adj.) - lacking a definite plan, purpose, or pattern (emphasis added).

The poster stated that the "first four bases were completely random". I assumed (not "presumed") he meant what he wrote :-).

Semantically, "high complexity" does not mean base composition diversity, which is the relevant issue for cluster calling. A library that consists solely of AAAAA, CCCCC, GGGGG, or TTTTT starts (in roughly equal amounts) would suffice, yet (almost) no one would argue that this constitutes high complexity.
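
To make the distinction concrete, here is a minimal sketch (not anything the instrument software actually runs) that tabulates per-cycle base composition for two made-up libraries: the four-homopolymer library described above, and a library where a single prefix was picked once and used on every read. The function name, the example prefix GATCA, and the read counts are all assumptions for illustration.

```python
from collections import Counter

def per_cycle_base_fractions(reads):
    """Return, for each cycle (position), the fraction of reads showing each base."""
    length = min(len(r) for r in reads)
    fractions = []
    for i in range(length):
        counts = Counter(r[i] for r in reads)
        fractions.append({b: counts.get(b, 0) / len(reads) for b in "ACGT"})
    return fractions

# Hypothetical library: only four distinct 5-mers, in equal amounts.
homopolymer_starts = ["AAAAA", "CCCCC", "GGGGG", "TTTTT"] * 250

# Hypothetical library: one 5-mer chosen once and used for every read.
single_prefix = ["GATCA"] * 1000

balanced = per_cycle_base_fractions(homopolymer_starts)
skewed = per_cycle_base_fractions(single_prefix)

# Four distinct sequences, yet every cycle shows 25% of each base.
print(len(set(homopolymer_starts)), balanced[0])
# One distinct sequence, and every cycle shows 100% of a single base.
print(len(set(single_prefix)), skewed[0])
```

The first library has almost no sequence complexity but balanced base composition at every cycle; the second has neither.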

Apologies if this message comes across as cranky, Phillip. I was just trying my best to help the poster, and don't see how your comments contribute to the solution.

-Harold
Old 11-21-2011, 11:46 AM   #25
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Hi Harold,
Despite the simplicity of what we are discussing here, I think there are ambiguities. I agree your interpretation is likely the correct one. But randomly choosing 5 bases once and prefixing all the reads in a lane with that random sequence would lead to failure of the cluster calling software. That is all I meant. That might seem ludicrous, but I have seen experiments fail for misunderstandings just as ludicrous.

But, yeah, that might be sufficiently unlikely that my bringing it up was just distracting, not illuminating. (Also, it could be the malign influence of xkcd forcing my hand to create that link back to it...)

--
Phillip
Old 11-22-2011, 11:43 AM   #26
HMorrison
Senior Member
 
Location: Massachusetts

Join Date: May 2009
Posts: 116
Default

Quote:
Originally Posted by pmiguel View Post
Semantically "random" does not mean "high complexity". See.

Point being we presume "random" here means all four bases were a random mix of ACGT and therefore an even mixture of all 256 possible sequences. But we would need to know what the method of generating these was to know.

--
Phillip
Harold and Phillip,
I am the poster ("she", not "he", btw) who used the phrase "First four bases were completely random". The 4 random bases were generated by ordering my oligos with "NNNN" where the read is supposed to begin. I did not generate a single "random" sequence to use. Back when I used to synthesize oligos myself, we achieved randomness by mixing reagents into a single bottle that went on the instrument along with A, C, G, T. Don't know what Invitrogen or IDT do these days. (Anybody else go back to Maxam & Gilbert sequencing days, pre-PCR?)

To update, it appears that I got a reasonable number of reads surviving up until the HiSeq lost focus partway through read 3. I'll try the lower cluster density and phiX or shotgun library spike in next time.

Thanks, all.

Hilary
Old 11-22-2011, 12:15 PM   #27
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 505
Default

Hi Hilary,

Apologies for using the incorrect gender, and sorry to hear about the focusing error. Better luck next time.

Harold
Old 12-01-2011, 06:24 AM   #28
TonyBrooks
Senior Member
 
Location: London

Join Date: Jun 2009
Posts: 298
Default

Would running low diversity libraries at a low concentration not help solve the problem?
If you are not looking for large number of reads, then running at a low concentration should mean less chance of overlapping clusters and more reads passing filter.
Old 12-01-2011, 07:32 AM   #29
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Yes it helps. But from my limited experience, the catastrophic failures come from focusing issues -- where the instrument sees a blank flow cell surface and de-focuses as it attempts to "find" the clusters it expects.

Again, recent firmware upgrades may have mitigated this particular issue. I am particularly paranoid about it because we only recently got an Illumina sequencer and our particular model is an outlier. So problems probably get solved for the HiSeqs first -- those particular to a HiScanSQ would be noticed and fixed later in most cases.

--
Phillip
Old 12-01-2011, 11:55 AM   #30
csquared
Member
 
Location: Huntsville, AL

Join Date: May 2008
Posts: 67
Default

Quote:
Originally Posted by HMorrison View Post
Harold and Phillip,
To update, it appears that I got a reasonable number of reads surviving up until the HiSeq lost focus partway through read 3. I'll try the lower cluster density and phiX or shotgun library spike in next time.

Thanks, all.

Hilary
The loss of focus during read 3 is more likely a bubble from the fluidics than a diversity problem. If you got that far with good PF clusters, good base quality and a good, flat FWHM metric, it isn't the diversity of the library that is the problem. It is very likely a fluidics issue and one you should raise with your FAS, as it would potentially be eligible for a warranty replacement of the affected lane.

It is also worth checking if your HiSeq (assuming it is a HiSeq instrument, if not, this may not apply) has had the new solenoid valves installed. They help prevent, but not eliminate, the bubble issues. I don't know what Illumina is calling the new valves but your FSE or FAS will know.
__________________
HudsonAlpha Institute for Biotechnology
http://www.hudsonalpha.org/gsl
Old 12-02-2011, 08:49 AM   #31
HMorrison
Senior Member
 
Location: Massachusetts

Join Date: May 2009
Posts: 116
Default

Update--
Quality on first read was low and got an "N" in up to half the reads at positions 12, 20, 21, and 24. Indexing read was fine. The instrument completely lost focus on 2nd read (100 nt read) so I got nothing. That read started reading into the specific amplicon reverse primer which I thought would be okay because clusters had been found on the first read. Apparently not true. Doesn't matter for this library but eventually want to join longer reads. Am trying again next run with 20% spike in of high complexity library and 50% lower cluster density.

We've had the solenoid change-out and the s/w upgrades (just before this run) and bubble-related loss of focus should not be entire lane starting with cycle 1 (or so Illumina claims). This was 100% loss, not a few percent.
Attached Images
File Type: png Lane2.png (114.8 KB, 46 views)

Last edited by HMorrison; 12-02-2011 at 09:23 AM.
Old 12-02-2011, 11:14 AM   #32
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by HMorrison View Post
Update--
Quality on first read was low and got an "N" in up to half the reads at positions 12, 20, 21, and 24. Indexing read was fine. The instrument completely lost focus on 2nd read (100 nt read) so I got nothing. That read started reading into the specific amplicon reverse primer which I thought would be okay because clusters had been found on the first read. Apparently not true. Doesn't matter for this library but eventually want to join longer reads. Am trying again next run with 20% spike in of high complexity library and 50% lower cluster density.
Actually you likely got burned by one of the recent updates. Until the v3 chemistry update, the instrument did not refocus after strand turnaround. But now it does. If the instrument had used the focal points it was using during the successful index read, you might have been okay.

Thanks for the info, though.

If this happens again, Tech Support may be able to guide you through a forced refocus of the instrument after the cycle where your adapter ends. (I think it mainly would involve halting the run and re-starting it.)

Hmm, if that works, it could be a general (although drastic) solution to low complexity bases even in the first read. Just let as many cycles as necessary complete (ignoring the possibly terrible results you are getting). Stop the run. Start a completely new run on the same flow cell, from that point. At 4 bases from the end of your low complexity start, cluster calling would happen.

--
Phillip
Old 01-27-2012, 01:11 PM   #33
MRSeq
Member
 
Location: USA

Join Date: May 2009
Posts: 11
Default

Hello,

I am glad to see a thread with so much helpful information for those of us dealing with low complexity samples! I am having the same problem: a 5-11 nt label at the beginning of the read. So far the quality has been very disappointing, and it worsened dramatically when we switched to the HiSeq, in spite of a 50% phiX spike-in and lower cluster densities. We were also not very happy with our sequencing facility for other reasons. Those of you who have had good experiences sequencing low diversity samples: can you please share which sequencing facilities seem to do a good job of it? In our case some of the preparation is very time consuming and costly, and we do not want to lose any more data...
Old 01-27-2012, 01:27 PM   #34
HMorrison
Senior Member
 
Location: Massachusetts

Join Date: May 2009
Posts: 116
Default Mixed success

Still haven't gotten this to work well--instrument is being checked out right now by Illumina engineers. They say there always was refocusing on read two (really 3; after indexing read) so I needed randomness at first few bases there, too. PhiX spike in didn't help much. Think I need to redesign primers to stagger the signal.
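
One way to picture what staggering would buy: the sketch below assumes "stagger" means putting spacers of different lengths in front of the constant primer sequence, so that clusters are out of phase with each other at any given cycle. The primer sequence and the spacers are invented for illustration and are not from any real design.

```python
from collections import Counter

CONSTANT_START = "TGACCTGAAGTC"   # hypothetical amplicon-specific primer sequence
SPACERS = ["", "A", "CT", "GAC"]  # hypothetical 0-3 nt stagger spacers

def max_base_fraction_per_cycle(reads, cycles):
    """For each cycle, the fraction of reads carrying the most common base."""
    result = []
    for i in range(cycles):
        counts = Counter(r[i] for r in reads)
        result.append(max(counts.values()) / len(reads))
    return result

# Every read starts with the same sequence: one base dominates each cycle.
unstaggered = [CONSTANT_START] * len(SPACERS)

# Equal parts of four primer versions, each with a different spacer length.
staggered = [spacer + CONSTANT_START for spacer in SPACERS]

print(max_base_fraction_per_cycle(unstaggered, 8))
print(max_base_fraction_per_cycle(staggered, 8))
```

With these made-up sequences the unstaggered pool shows a single base at every cycle, while in the staggered pool no base exceeds half the reads in the first 8 cycles. Real spacers would need to be checked the same way against the actual primer.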
Old 01-27-2012, 01:34 PM   #35
MRSeq
Member
 
Location: USA

Join Date: May 2009
Posts: 11
Default

Quote:
Originally Posted by HMorrison View Post
Still haven't gotten this to work well--instrument is being checked out right now by Illumina engineers. They say there always was refocusing on read two (really 3; after indexing read) so I needed randomness at first few bases there, too. PhiX spike in didn't help much. Think I need to redesign primers to stagger the signal.
Hilary,

We have had no success so far, either :-( So your run with a 20% phiX spike-in and 50% of normal cluster density was not a success, either? What specific cluster density did you use? Was it chemistry v3? We will be making another attempt on the HiSeq with v3 chemistry in a couple of weeks, so I am very interested in the technical details. TIA
Old 09-25-2012, 03:05 AM   #36
josdegraaf
Member
 
Location: Germany

Join Date: Mar 2010
Posts: 33
Default

Quote:
Originally Posted by pmiguel View Post
We got it to work on our HiScanSQ -- which uses the same chemistry as the HiSeq, but only scans the top of the flowcell. Not an identical situation, but we had some SMART cDNAs that we sheared and ligated TruSeq adapters on. So about 1/2 of them had the same 50 nt of SMART primer at the beginning. We mixed them 1:1 with a genomic DNA library. Cluster registration went fine.
We are also considering this. May I ask why half of the reads had the SMART oligo at the beginning, given that you randomly sheared them?

Thanks and best
Old 09-25-2012, 07:57 AM   #37
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381
Default

We sequenced low complexity/diversity libraries by mixing them in with other experiments of high complexity. This way the low complexity clusters can be identified in amongst the high complexity clusters.
Old 09-26-2012, 12:31 AM   #38
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

If you only have a strong initial bias but with diverse sequence afterwards, there is a somewhat dramatic solution which does work on the HiSeq, but unfortunately has to be applied to the whole run.

Basically you start read 1 as normal, but before starting to do imaging you do 5 chemistry cycles without doing any imaging (so called 'dark cycles'). You then do the run as normal and read through the rest of your sequence. At the end of the first read you strip the templates and then re-anneal primer 1 and do a 5 cycle read 2 to get the biased bases at the start. You therefore end up with all of the sequence you want, but spread out over 2 reads instead of 1.
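
Downstream, the two reads have to be stitched back together per cluster. A minimal sketch of that step, assuming the short second read covers exactly the skipped cycles and that reads from the same cluster have already been paired up (the function name and the example sequences are made up):

```python
def rejoin_dark_cycle_reads(read1, read2, dark_cycles=5):
    """Rebuild the full insert sequence from a dark-cycle run.

    read1: bases imaged after the dark cycles (starts at position dark_cycles)
    read2: the short re-primed read covering the first `dark_cycles` bases
    """
    if len(read2) != dark_cycles:
        raise ValueError("read2 should cover exactly the skipped cycles")
    # The re-primed read holds the start of the template, so it goes first.
    return read2 + read1

# Hypothetical cluster: 5 biased bases followed by diverse sequence.
full = rejoin_dark_cycle_reads(read1="TTGCAGGCTA", read2="ACGTA")
print(full)  # ACGTATTGCAGGCTA
```

Quality strings, if present, would be concatenated in the same order.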

Having said that, in your case that's not what I'd do. Given that you're only using the first 5bp to confirm that you have the correct sequence I'd be tempted to use a custom primer to prime over the biased positions though. If you have a different sequence at the start the primer shouldn't anneal properly so you should only get signal from clusters with the correct sequence.
Old 09-26-2012, 06:01 AM   #39
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by josdegraaf View Post
Quote:
Originally Posted by pmiguel View Post
We got it to work on our HiScanSQ -- which uses the same chemistry as the HiSeq, but only scans the top of the flowcell. Not an identical situation, but we had some SMART cDNAs that we sheared and ligated TruSeq adapters on. So about 1/2 of them had the same 50 nt of SMART primer at the beginning. We mixed them 1:1 with a genomic DNA library. Cluster registration went fine.
We are also considering this, may I ask you why half of the reads had the smart oligo at the beginning (since you randomly sheared them)

Thanks and best
Will have to recall from long term memory.

Okay, the cDNAs all were flanked by these 50 bp SMART adapters. These cDNAs were also fairly short prior to shearing for reasons I won't go into. They were fragmented to roughly 1/2 their initial length. Then "end polished" and ligated to Illumina adapters, probably using the TruSeq DNA prep kit. Hence, random chance put 50% of the reads adjacent to the SMART adapter.

SMART is designed to amplify full-length cDNAs using the ingenious "PCR Suppression" method. So if your post-shearing length is a small enough fraction of the amplified cDNA length, it should not be much of an issue. Possible to further reduce contribution towards your sequence from adapter using a rare-cutting restriction enzyme site in your SMART adapter. Not sure it would be worth it, though.
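
The 50% figure, and the point that a smaller post-shearing fraction helps, both fall out of simple counting. A rough sketch, assuming every cDNA is cut into the same number of pieces and a read is equally likely to start from any fragment end (both simplifications, and the function name is made up):

```python
def adapter_adjacent_fraction(pieces_per_cdna):
    """Fraction of fragment ends that are original (SMART-flanked) cDNA ends.

    A molecule cut into k pieces has 2*k fragment ends, of which only the
    2 original ends carry the SMART adapter.
    """
    if pieces_per_cdna < 1:
        raise ValueError("need at least one piece per cDNA")
    return 2 / (2 * pieces_per_cdna)

print(adapter_adjacent_fraction(2))  # 0.5  (fragments about half the cDNA length)
print(adapter_adjacent_fraction(4))  # 0.25 (fragments about a quarter of the length)
```

So under these assumptions, the fraction of reads starting in the adapter is roughly the ratio of fragment length to cDNA length.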

--
Phillip