|
|
#1 |
|
--Site Admin--
Join Date: Oct 2007
Location: SF Bay Area, CA, USA
Posts: 491
|
A new paper describing an improved Solexa aligner / SNP caller just came out. Looks interesting.
*****************************
Slider - Maximum use of probability information for alignment of short sequence reads and SNP detection.
Malhis N, Butterfield Y, Ester M, Jones SJ.
Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, Canada.
MOTIVATION: A plethora of alignment tools have been created that are designed to best fit different types of alignment conditions. While some of these are made for aligning Illumina Sequence Analyzer reads, none of these are fully utilizing its probability (prb) output. In this paper, we will introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each read base's probabilities given in the prb files.
RESULTS: Compared with other aligners, Slider has higher alignment accuracy and efficiency. In addition, given that Slider matches bases with probabilities other than the most probable, it significantly reduces the percentage of base mismatches. The result is that its SNP predictions are more accurate than other SNP prediction approaches used today that start from the most probable sequence, including those using base quality.
CONTACT: nmalhis *(<AT>)*bcgsc.ca
Supplementary information and availability: http://www.bcgsc.ca/platform/bioinfo/software/slider. |
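To make "each read base's probabilities" concrete, here is a minimal sketch of reading one prb record. It assumes the commonly described prb layout: one read per line, positions separated by tabs, and four integer scores per position in A, C, G, T order on the Solexa log-odds scale 10*log10(p/(1-p)). The function names and the `min_prob` cutoff are hypothetical, and this is not Slider's actual algorithm, only an illustration of keeping bases other than the most probable one.

```python
BASES = "ACGT"  # assumed column order within each position

def solexa_to_prob(score):
    """Convert a Solexa-scaled score, 10*log10(p/(1-p)), to a probability p."""
    return 1.0 / (1.0 + 10.0 ** (-score / 10.0))

def parse_prb_line(line):
    """Parse one read into a list of {base: probability} dicts, one per position."""
    read = []
    for field in line.strip().split("\t"):
        scores = [int(x) for x in field.split()]
        if len(scores) != 4:
            raise ValueError(f"expected 4 scores per position, got {len(scores)}")
        probs = [solexa_to_prob(s) for s in scores]
        total = sum(probs)
        # Normalise so the four probabilities at a position sum to 1.
        read.append({b: p / total for b, p in zip(BASES, probs)})
    return read

def plausible_bases(position, min_prob=0.1):
    """Bases worth considering at a position, most probable first.

    min_prob is a hypothetical cutoff chosen for illustration only.
    """
    ranked = sorted(position.items(), key=lambda kv: kv[1], reverse=True)
    return [b for b, p in ranked if p >= min_prob]

# Two positions: a confident A, then an ambiguous A/G call.
line = "40 -40 -40 -40\t3 -40 -3 -40"
read = parse_prb_line(line)
print([plausible_bases(pos) for pos in read])  # [['A'], ['A', 'G']]
```

An aligner that starts only from the most probable sequence would see "AA" here; keeping the second candidate at position 2 lets "AG" match without counting a mismatch.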
|
|
|
|
|
#2 |
|
Senior Member
Join Date: Jan 2008
Location: USA
Posts: 290
|
Looks interesting: using .prb files instead of FASTQ. There are tools that optionally take .prb files as input, but I am not sure whether they actually use the probability information for each base!
|
|
|
|
|
|
#3 |
|
Member
Join Date: Nov 2008
Location: Vancouver, Canada
Posts: 10
|
This release of Slider was prepared for the Oxford Bioinformatics paper reviewers as a proof of concept:
http://bioinformatics.oxfordjournals...urcetype=HWCIT
I’m now working on a beta release with many improvements and new capabilities. This new release should be ready by the end of this month (Nov. 2008).
Nawar Malhis
Last edited by nmalhis; 11-05-2008 at 09:15 AM. |
|
|
|
|
|
#4 |
|
Member
Join Date: Nov 2008
Location: Vancouver, Canada
Posts: 10
|
SliderII: High Quality SNP Calling Using Illumina Data at Shallow Coverage
is now available from: http://www.bcgsc.ca/platform/bioinfo/software/SliderII
Sorry for the delay,
Nawar |
|
|
|
|
|
#5 |
|
Member
Join Date: Jan 2009
Location: HSPH, Boston
Posts: 18
|
Also going to follow up via email, but just in case: Illumina seems to be moving towards a change in the .prb files; the new workflow does not seem to produce the four-channel probabilities anymore.
Is there a workaround? This would also affect other probabilistic aligners. -- Oliver |
|
|
|
|
|
#6 |
|
Senior Member
Join Date: May 2008
Location: USA, Midwest
Posts: 158
|
Oliver,
You can rerun the base calling, starting the pipeline with Bustard using the intensity files generated by RTA. Bustard will accept as optional arguments --with-seq, --with-qval, --with-sig2 and --with-prb which will instruct Bustard to generate these legacy files. You can also add these arguments to the goat.py command line if you are restarting the pipeline from the image analysis step. |
|
|
|
|
|
#7 |
|
Member
Join Date: Jan 2009
Location: HSPH, Boston
Posts: 18
|
Glad to hear, thanks for the information! Going to report back on how SliderII handles very deep sequence coverage soon-ish.
-- Oliver |
|
|
|
|
|
#8 |
|
Member
Join Date: Mar 2008
Location: KL
Posts: 42
|
Hi,
Novoalign will take prb format read files. It will use prb values as probabilities both when generating seeds and in calculating penalties for the Needleman-Wunsch alignment. This usually gives more alignments than running off the fastq files, but it has been criticised by some because the Illumina fastq files have been quality calibrated while the prb files have not. I have never seen any test comparing SNP calls against genotype data that would show whether using prb files improves SNP calls.
Colin |
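As an illustration of using per-base probabilities as alignment penalties, here is a minimal sketch. It assumes each read position is already a dict of base probabilities and scores an ungapped alignment by summing -10*log10(p) for the reference base at each position. The function names and the `floor` value are hypothetical, gaps and seeding are left out, and this is not Novoalign's actual scoring scheme.

```python
import math

def match_penalty(base_probs, ref_base, floor=1e-4):
    """Phred-style penalty for aligning one read position to ref_base.

    floor (a hypothetical value) caps the penalty when the probability is ~0.
    """
    p = max(base_probs.get(ref_base, 0.0), floor)
    return -10.0 * math.log10(p)

def ungapped_score(read_probs, ref_segment):
    """Total penalty of a read against an equal-length reference segment."""
    if len(read_probs) != len(ref_segment):
        raise ValueError("read and reference segment must have equal length")
    return sum(match_penalty(p, r) for p, r in zip(read_probs, ref_segment))

# A confident A followed by an ambiguous A/G call.
example_read = [
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.60, "C": 0.00, "G": 0.40, "T": 0.00},
]
for ref in ("AA", "AG", "AC"):
    print(ref, round(ungapped_score(example_read, ref), 2))
```

With these numbers "AG" costs only slightly more than "AA", while "AC" is heavily penalised, which is one way a probability-aware aligner can report more alignments than one working from a single called base per position.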
|
|
|
|
|
#9 |
|
Member
Join Date: Apr 2008
Location: Australia
Posts: 96
|
Wouldn't it be better in the long run to use calibrated base calls rather than second-guessing with the PRB base calls?
The 1000 Genomes Project recalibrated their FASTQ files using prior alignment information to improve the data quality.
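The general idea of recalibrating from alignments can be sketched as follows. This is a simplified assumption, not the 1000 Genomes procedure: aligned bases are binned by their reported quality only, the mismatch rate in each bin is measured against the reference, and that rate is converted back to a Phred score. The function name and input layout are hypothetical, and a real recalibration would exclude known variant sites and use more covariates.

```python
import math
from collections import defaultdict

def recalibrate(observations):
    """Map reported quality -> empirical Phred quality.

    observations: iterable of (reported_quality, is_mismatch) pairs taken
    from aligned bases (hypothetical input layout).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for quality, is_mismatch in observations:
        totals[quality] += 1
        if is_mismatch:
            errors[quality] += 1
    table = {}
    for quality in totals:
        # Add-one smoothing keeps the rate away from 0 in small bins.
        rate = (errors[quality] + 1) / (totals[quality] + 2)
        table[quality] = round(-10 * math.log10(rate))
    return table

# Toy data: Q30 bases mismatch 2 times in 1000, Q20 bases 10 times in 100.
obs = [(30, False)] * 998 + [(30, True)] * 2 + [(20, False)] * 90 + [(20, True)] * 10
print(recalibrate(obs))  # {30: 25, 20: 10}
```

In this toy data both bins turn out to be over-confident, so the recalibrated scores are lower than the reported ones.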
|
|
|
|
|
|
|
#10 |
|
Member
Join Date: Jan 2009
Location: HSPH, Boston
Posts: 18
|
Colin, good meeting you at ISMB! Should have some comparative data for FASTQ vs PRB files soon.
Zee, tend to agree, but we are looking at data with 2+ SNPs per read on average, and in many cases at high frequency, and from more than two clones. Was hoping that in these cases the underlying PRB data might be informative.
|
|
|
|
|
|
#11 |
|
Member
Join Date: Nov 2008
Location: Vancouver, Canada
Posts: 10
|
I’d like to add that Slider II calibrates prb data before calling SNPs.
Regarding the storage space of prb files: since these files contain repetitive data, compressing them to .gz will reduce their size by 7 to 10 times. Slider II reads .gz files. When there are more than 2 SNPs in a read, Slider II, like other SNP calling tools, filters out dense SNPs, so results might not be good.
Nawar |
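For scripts that handle prb files outside Slider II, reading the compressed files directly avoids keeping uncompressed copies on disk. A minimal sketch using Python's standard `gzip` module follows; the helper names and the file name are hypothetical, and the demo writes a tiny temporary file just to show the round trip.

```python
import gzip
import os
import tempfile

def open_prb(path):
    """Open a prb file as text, transparently handling a .gz suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")

def count_reads(path):
    """Count non-empty lines, assuming one read per line."""
    with open_prb(path) as handle:
        return sum(1 for line in handle if line.strip())

# Demo with a tiny temporary file (the file name is illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "s_1_0001_prb.txt.gz")
    with gzip.open(path, "wt") as out:
        out.write("40 -40 -40 -40\t-40 40 -40 -40\n" * 3)
    n_reads = count_reads(path)
print(n_reads)  # 3
```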
|
|
|
|
|
#12 |
|
Member
Join Date: Jan 2009
Location: HSPH, Boston
Posts: 18
|
Yep, that's going to be a problem no matter what tool we use -- four to five SNPs per read on average. Having said that, as we are only aligning against 10kb of reference sequence most reads should still be align-able. Now, if we could stop the genomic center from deleting the intensity and PRB files after each run...
|
|
|
|
|
|
#13 |
|
Member
Join Date: Nov 2008
Location: Vancouver, Canada
Posts: 10
|
With "four to five SNPs per read on average" and "10kb of reference sequence", about 10% of the reference is effectively unknown. I would assemble these reads instead, since the reference is short enough that repeats should not be an issue.
|
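The "about 10%" figure above can be checked with a quick calculation. The thread does not state a read length, so the lengths below are assumptions chosen for illustration; 4.5 SNPs per read is the midpoint of "four to five".

```python
def divergence(snps_per_read, read_length):
    """Fraction of positions that differ from the reference."""
    return snps_per_read / read_length

REFERENCE_LENGTH = 10_000  # "10kb of reference sequence"

# Read lengths are hypothetical; the thread does not give one.
for read_length in (36, 45, 50):
    d = divergence(4.5, read_length)
    print(read_length, f"{d:.1%}", round(d * REFERENCE_LENGTH))
```

At an assumed 45 bp read length this gives 10%, or roughly 1000 differing positions across the 10 kb reference; shorter reads push the fraction higher.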
|
|
|
|
|
#14 |
|
Member
Join Date: Jan 2009
Location: HSPH, Boston
Posts: 18
|
Interesting. Hadn't even thought about reference-based or de novo assemblies as an alternative. Will keep it in mind, thanks again!
|
|
|
|
| Tags |
| aligner, illumina, snp, solexa |