SEQanswers

06-06-2012, 06:35 PM #1
mmpillai
Junior Member
 
Location: Denver, CO

Join Date: Apr 2010
Posts: 6
Low-diversity library (14 Ts) on HiSeq 2000

Hello all,
I know that Simcom started a similar thread that drew a lot of replies, but my problem is slightly different. We are sequencing libraries built from poly(A)-pulled-down RNA to study alternative polyadenylation. Because of the way the libraries were constructed, Read 1 on the HiSeq starts with 6 Ns (which serve as unique molecular identifiers, or UMIs), followed by 14 Ts and then the transcript sequence. Five indexed libraries were pooled. After consulting several sources, including Illumina tech support, we decided to "generate matrix and normalize" to a different lane with good diversity, not to spike in PhiX or other controls at a high concentration (that would defeat the purpose of running on the HiSeq instead of the GAIIx to get more reads, and apparently offered no advantage), and to load at 8 pM. The library has a mean size of 250 bp.

The initial results from this lane show ~800K clusters/mm² and 220 million reads, but only 30 million passed the "chastity filter", which is calculated from roughly the first 20 bases. The graphics indicate that after 20 bp, mixed signals from all bases appear (after what is clearly a T run). Everyone seems to agree that the low PF rate is probably caused by the T run, but we are not sure what to do next with the data; clearly it would be ideal to use more than just the 30 million reads that passed filter. I was wondering about the following:
1. Any general ideas about what to do with this dataset post-run? I have seen other discussions about effectively analysing data with low-diversity initial bases using non-in-house algorithms.
2. If this is related to the T run, why would the quality be poor even though the lane was normalized to a different (and successful) lane?
3. Are the HiSeq image files stored as TIFFs that could be analyzed with next-phred or another alternative base caller?
4. Could "deferred cluster calling", as described in Felix Krueger's PLoS ONE paper and in his forum posts early last year, or a similar protocol, help in this scenario?
I appreciate all help, and apologize for any naive statements or assumptions above.
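Tabulating base composition per cycle shows what "low diversity" means for this read structure: at cycles 7 to 20 every cluster reports a T, so almost all of the signal lands in a single channel. A minimal Python sketch, using made-up reads with the layout described above (6-nt UMI, 14 Ts, then insert); the function name and sequences are illustrative only.

```python
from collections import Counter


def base_composition(reads):
    """For each cycle, return the fraction of reads carrying A, C, G, T or N."""
    if not reads:
        return []
    length = max(len(r) for r in reads)
    profile = []
    for cycle in range(length):
        # Only reads long enough to reach this cycle contribute.
        counts = Counter(r[cycle] for r in reads if len(r) > cycle)
        total = sum(counts.values())
        profile.append({base: counts.get(base, 0) / total for base in "ACGTN"})
    return profile


# Hypothetical Read 1 sequences: 6-nt UMI, 14 Ts, then transcript sequence.
reads = [
    "ACGTCA" + "T" * 14 + "GATTACA",
    "TTGCAG" + "T" * 14 + "CCGATAG",
    "GACTTC" + "T" * 14 + "AGCTTGA",
    "CAAGGT" + "T" * 14 + "TGCAACG",
]

for cycle, fractions in enumerate(base_composition(reads), start=1):
    print(cycle, " ".join(f"{b}:{fractions[b]:.2f}" for b in "ACGT"))
```

Cycles 1 to 6 show a mix of bases, while cycles 7 to 20 all print T:1.00. Running the same tally on a sample of real reads is a quick way to confirm where the composition collapses.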
06-06-2012, 08:52 PM #2
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509

1) If the PHRED scores of the UMI and post-T segments look good, you could rerun the initial CASAVA scripts to output all of the reads (passed and failed) as FASTQs, masking the T segment and demultiplexing on the UMI.
2) Not sure, but if it's the T segment (as suspected), then the PHRED scores for those cycles are probably much lower than those of the flanking segments.
3) The images are not saved, so 4) deferred basecalling is not an option at this point.
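A rough post hoc equivalent of masking the T segment can be applied to any FASTQ output: keep cycles 1 to 6 as the UMI, drop cycles 7 to 20, and keep the rest as the read. The sketch below assumes a fixed 14-nt T run and stores the UMI in the read name, which is one common convention rather than anything CASAVA itself does; `split_read` is a hypothetical helper. It cannot recover reads that were never written to the FASTQ file.

```python
def split_read(header, seq, qual, umi_len=6, skip_len=14):
    """Move the leading UMI into the read name and drop the fixed-length T run.

    Assumes every read has the same layout: umi_len UMI bases, skip_len Ts,
    then insert. Sequence and quality strings are trimmed identically.
    """
    start = umi_len + skip_len
    umi = seq[:umi_len]
    # FASTQ headers may carry a comment after the first space; keep it intact.
    name, _, comment = header.partition(" ")
    new_header = f"{name}_{umi} {comment}".rstrip()
    return new_header, seq[start:], qual[start:]


record = ("@read1 1:N:0:ACGTTG", "ACGTCA" + "T" * 14 + "GATTACA", "I" * 27)
print(split_read(*record))
# ('@read1_ACGTCA 1:N:0:ACGTTG', 'GATTACA', 'IIIIIII')
```

If the T run length varies between molecules in a real library, a fixed cut like this would leave some Ts or remove some insert bases, so the fixed length is an assumption worth checking first.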
06-06-2012, 09:21 PM #3
mmpillai
Junior Member
 
Location: Denver, CO

Join Date: Apr 2010
Posts: 6

Thank you, we will try those options first. I was under the impression that PF is calculated from the "chastity scores" of the first 12-20 bases or so. Does that correlate directly with the PHRED score for each base, or is it a separate metric?
06-07-2012, 03:41 AM #4
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509

Chastity differs from the PHRED score and is calculated over the first 25 cycles. Per-cycle PHRED scores can be visualized with Illumina's HCS or SAV software.
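The chastity filter, as Illumina commonly describes it for RTA, works roughly like this: for each cycle, chastity is the brightest channel intensity divided by the sum of the brightest and second-brightest intensities, and a cluster passes filter if no more than one base call in the first 25 cycles has chastity below 0.6. Below is a minimal sketch of that rule on made-up four-channel intensities; the instrument applies it to its own corrected intensities, which this does not model. In this library the first 25 cycles would cover the UMI, the whole T run and the first five insert bases, so mixed signal just after the T run falls inside the window.

```python
def chastity(intensities):
    """Brightest intensity / (brightest + second brightest) for one cycle."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_filter(cycle_intensities, threshold=0.6, window=25, max_failures=1):
    """Pass if at most max_failures of the first `window` cycles fall below threshold."""
    failures = sum(
        1 for intensities in cycle_intensities[:window]
        if chastity(intensities) < threshold
    )
    return failures <= max_failures


clean = (900, 100, 80, 60)   # one dominant channel: chastity 0.9
mixed = (500, 450, 100, 50)  # two competing channels: chastity ~0.53

print(passes_filter([clean] * 25))                # True
print(passes_filter([mixed] * 1 + [clean] * 24))  # True (one failure allowed)
print(passes_filter([mixed] * 2 + [clean] * 23))  # False
```

Because chastity depends only on how dominant one channel is, a cluster can fail PF even when the later PHRED scores would have been acceptable, which is why the two metrics can disagree.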
06-12-2012, 09:33 PM #5
mmpillai
Junior Member
 
Location: Denver, CO

Join Date: Apr 2010
Posts: 6

As an update: Illumina and our NGS core both say they cannot rerun the scripts with the Ts (bases 7 to 20) masked. We do have the CIF files saved, so I am guessing that a third-party base caller would be the next logical step. Several seem to be available, but is there an advantage of one over another (say, those that need no training set, such as AYB, naiveBayesCall or OnlineCall, versus Ibis)? And should the Ts be masked when using these base callers? I remain optimistic that the dataset is usable, given that the intensity files "looked good" according to the Illumina technical person himself, but the base calling is almost certainly being thrown off by the T stretch.
06-13-2012, 01:43 AM #6
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

If you have a subset of bases that are causing a problem, another option is to rerun the BCL conversion with --no-eamss. I'd also tell it to export the reads that failed the QC filter as well (I don't have the 1.8.1 manual to hand, so I can't remember the exact option for this). You might find that the qualities of the poly-T stretch are poor but recover once the low-complexity sequence is over. Turning off EAMSS will allow the qualities to come back up again, and you might get back to usable sequence.
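One way to check whether qualities recover after the poly-T stretch is to average Phred scores per cycle and then per segment. A minimal sketch, assuming Phred+33 quality strings (the encoding CASAVA 1.8 writes) and the 6 + 14 + insert layout from the first post; the helper names and quality strings are made up.

```python
def mean_quality_per_cycle(qualities, offset=33):
    """Mean Phred score at each cycle, decoding ASCII quality characters."""
    if not qualities:
        return []
    length = max(len(q) for q in qualities)
    sums = [0] * length
    counts = [0] * length
    for q in qualities:
        for i, ch in enumerate(q):
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def segment_means(per_cycle, segments):
    """Average per-cycle means over named, 1-based inclusive cycle ranges."""
    result = {}
    for name, (first, last) in segments.items():
        values = per_cycle[first - 1:last]
        result[name] = sum(values) / len(values) if values else float("nan")
    return result


# Hypothetical quality strings: 'I' = Q40, '5' = Q20, '#' = Q2.
quals = [
    "I" * 6 + "#" * 14 + "I" * 7,
    "5" * 6 + "#" * 14 + "5" * 7,
]
per_cycle = mean_quality_per_cycle(quals)
print(segment_means(per_cycle, {"umi": (1, 6), "poly_t": (7, 20), "insert": (21, 27)}))
# {'umi': 30.0, 'poly_t': 2.0, 'insert': 30.0}
```

A profile like this one, low in the poly-T segment and high again afterwards, is the pattern that would suggest the post-T sequence is still usable.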
06-13-2012, 05:36 AM #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,091

Quote:
Originally Posted by simonandrews
If you have a subset of bases which are causing a problem another option is to rerun the bcl conversion specifying --no-eamss. I'd also tell it to export QC filtered sequences as well (don't have the 1.8.1 manual to hand so can't remember the exact option to specify for this).
The option Simon is referring to is "--with-failed-reads", which includes reads that fail the filter in the output file.
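With failed reads included, PF and non-PF reads end up in the same FASTQ files, so they need to be told apart downstream. In CASAVA 1.8-style headers, the second field after the space is Y when the read is filtered (failed) and N otherwise. A minimal sketch that counts both; it only understands that header format, and the example records are made up.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def is_filtered(header):
    """True if a CASAVA 1.8-style header (e.g. '... 1:Y:0:ACGTTG') marks a failed read."""
    _, _, comment = header.partition(" ")
    fields = comment.split(":")
    return len(fields) > 1 and fields[1] == "Y"


def count_pf(records):
    """Return (passed, failed) counts for an iterable of FASTQ records."""
    passed = failed = 0
    for header, _seq, _qual in records:
        if is_filtered(header):
            failed += 1
        else:
            passed += 1
    return passed, failed


example = io.StringIO(
    "@r1 1:N:0:ACGTTG\nACGT\n+\nIIII\n"
    "@r2 1:Y:0:ACGTTG\nACGA\n+\nIII#\n"
)
print(count_pf(read_fastq(example)))  # (1, 1)
```

Headers without the comment field are counted as passed here, which is a simplification; older Illumina header formats would need separate handling.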
06-13-2012, 06:03 AM #8
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509

You can combine Simon's and GenoMax's recommendations with the flag --use-bases-mask I6n14Y* to mask the Ts and demultiplex on the first six bases.
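As we read CASAVA 1.8's bases-mask syntax, Y marks cycles used as read, I marks cycles used as index, n marks cycles to ignore, a number repeats the preceding letter, and * fills the remaining cycles. Below is a simplified Python sketch that expands one read's mask into per-cycle roles, useful for checking that a mask such as I6n14Y* lines up with the intended layout. It is not CASAVA's parser and handles a single read only (CASAVA takes one comma-separated mask per read, including any separate index read). The 101-cycle read length in the example is hypothetical.

```python
import re

# Role names are illustrative labels, not CASAVA terminology.
ROLES = {"Y": "read", "I": "index", "N": "skip"}


def expand_mask(mask, read_length):
    """Expand one read's bases mask (e.g. 'I6n14Y*') into a per-cycle list of roles."""
    mask = mask.upper()
    if not re.fullmatch(r"(?:[YIN](?:\d+|\*)?)+", mask):
        raise ValueError(f"unsupported mask: {mask!r}")
    tokens = re.findall(r"([YIN])(\d+|\*)?", mask)
    cycles = []
    for i, (letter, count) in enumerate(tokens):
        if count == "*":
            # Simplification: '*' is only accepted on the last token.
            if i != len(tokens) - 1:
                raise ValueError("'*' is only supported on the last token in this sketch")
            n = read_length - len(cycles)
        else:
            n = int(count) if count else 1
        cycles.extend([ROLES[letter]] * n)
    if len(cycles) != read_length:
        raise ValueError(f"mask covers {len(cycles)} cycles, read has {read_length}")
    return cycles


roles = expand_mask("I6n14Y*", 101)
print(roles[:6].count("index"), roles[6:20].count("skip"), roles[20:].count("read"))
# 6 14 81
```

Under this reading, I6 turns the UMI cycles into an index read, n14 discards the T run, and Y* keeps everything after it as the read.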
All times are GMT -8.