SEQanswers

Old 11-02-2008, 06:35 AM   #1
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358
Default Slider - Maximum use of probability information for alignment of short sequence reads

A new paper describing an improved solexa aligner / SNP caller just came out. Looks interesting.

*****************************

Slider - Maximum use of probability information for alignment of short sequence reads and SNP detection.


Malhis N, Butterfield Y, Ester M, Jones SJ.

Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, Canada.

MOTIVATION: A plethora of alignment tools have been created that are designed to best fit different types of alignment conditions. While some of these are made for aligning Illumina Sequence Analyzer reads, none of these are fully utilizing its probability (prb) output. In this paper, we will introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each read base's probabilities given in the prb files.

RESULTS: Compared with other aligners, Slider has higher alignment accuracy and efficiency. In addition, given that Slider matches bases with probabilities other than the most probable, it significantly reduces the percentage of base mismatches. The result is that its SNP predictions are more accurate than other SNP prediction approaches used today that start from the most probable sequence, including those using base quality.

CONTACT: nmalhis *(<AT>)*bcgsc.ca

Supplementary information and availability: http://www.bcgsc.ca/platform/bioinfo/software/slider
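For readers unfamiliar with prb files, here is a minimal sketch of how per-base probabilities can be recovered from one line. It assumes the legacy Illumina layout (one read per line, cycles separated by tabs, four integer scores per cycle in A, C, G, T order) and the Solexa scaling Q = 10*log10(p/(1-p)). The function names, the renormalisation step and the `min_prob` cutoff are illustrative assumptions, not Slider's actual code.

```python
def solexa_to_prob(q):
    """Convert a Solexa-scaled score Q = 10*log10(p / (1 - p)) to a probability p."""
    odds = 10 ** (q / 10.0)
    return odds / (1.0 + odds)


def parse_prb_line(line):
    """Parse one read from a prb file into a list of {base: probability} dicts.

    Assumption: cycles are tab-separated and each cycle holds four
    whitespace-separated integer scores in the order A, C, G, T.
    """
    cycles = []
    for field in line.rstrip("\n").split("\t"):
        scores = [int(s) for s in field.split()]
        if len(scores) != 4:
            raise ValueError("expected 4 scores per cycle, got %r" % field)
        probs = [solexa_to_prob(q) for q in scores]
        # Assumption: renormalise so the four probabilities sum to 1.
        total = sum(probs)
        cycles.append({base: p / total for base, p in zip("ACGT", probs)})
    return cycles


def candidate_bases(cycle, min_prob=0.1):
    """Return bases whose probability is at least min_prob, most probable first."""
    ranked = sorted(cycle.items(), key=lambda item: -item[1])
    return [base for base, p in ranked if p >= min_prob]


# A made-up two-cycle read: a confident A, then an ambiguous C/A call.
line = "40 -40 -40 -40\t-5 3 -40 -40"
cycles = parse_prb_line(line)
print([candidate_bases(cycle) for cycle in cycles])
```

In the made-up example the second cycle keeps both C and A as candidates. Which alternative base is plausible is exactly the information lost when only the called base and a single quality value are kept.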
Old 11-03-2008, 07:13 AM   #2
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

Looks interesting: it uses .prb files instead of fastq. There are tools that optionally take .prb files as input, but I am not sure whether they use the probability information for each base!
Old 11-05-2008, 07:54 AM   #3
nmalhis
Member
 
Location: Vancouver, Canada

Join Date: Nov 2008
Posts: 11
Default from the author

This release of Slider was prepared for the Oxford Bioinformatics paper reviewers as a proof of concept:
http://bioinformatics.oxfordjournals...urcetype=HWCIT

I’m now working on a beta release with many improvements and new capabilities. This new release should be ready by the end of this month (Nov. 2008).

Nawar Malhis

Last edited by nmalhis; 11-05-2008 at 08:15 AM.
Old 03-30-2009, 03:34 PM   #4
nmalhis
Member
 
Location: Vancouver, Canada

Join Date: Nov 2008
Posts: 11
Default

SliderII (High Quality SNP Calling Using Illumina Data at Shallow Coverage) is now available from:

http://www.bcgsc.ca/platform/bioinfo/software/SliderII

Sorry for the delay,

Nawar
Old 07-19-2009, 02:47 PM   #5
ohofmann
Member
 
Location: Melbourne, Australia

Join Date: Jan 2009
Posts: 37
Default

Also going to follow up via email, but just in case: Illumina seems to be moving towards a change in the .prb files; the new workflow does not seem to produce the four-channel probabilities anymore.

Is there a workaround? This would also affect other probabilistic aligners.

-- Oliver
Old 07-20-2009, 05:49 AM   #6
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

Oliver,

You can rerun the base calling, starting the pipeline with Bustard using the intensity files generated by RTA. Bustard will accept as optional arguments --with-seq, --with-qval, --with-sig2 and --with-prb which will instruct Bustard to generate these legacy files. You can also add these arguments to the goat.py command line if you are restarting the pipeline from the image analysis step.
Old 07-20-2009, 09:10 AM   #7
ohofmann
Member
 
Location: Melbourne, Australia

Join Date: Jan 2009
Posts: 37
Default

Glad to hear, thanks for the information! Going to report back on how SliderII handles very deep sequence coverage soon-ish.

-- Oliver
Old 07-21-2009, 01:49 AM   #8
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

Hi,

Novoalign will take prb format read files. It will use prb values as probabilities both when generating seeds and in calculating penalties for the Needleman-Wunsch alignment. This usually gives more alignments than running off the fastq files but has been criticised by some as the Illumina fastq files have been quality calibrated but the prb files are not. I have never seen any test comparing SNP calls with Genotype that would show whether using prb files improves SNP calls.

Colin
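As a rough illustration of what "using prb values as probabilities when calculating penalties" can mean, the sketch below scores an ungapped placement of a read against a reference segment with a Phred-style penalty, -10*log10(p), where p is the probability the read assigns to the reference base. This is a generic scheme written under my own assumptions (the function names and the `floor` value are made up); it is not Novoalign's implementation, which also handles gaps and seeding.

```python
import math


def base_penalty(cycle_probs, ref_base, floor=1e-4):
    """Penalty for aligning one read position to ref_base.

    cycle_probs maps each of A, C, G, T to its probability at this cycle.
    The floor (an assumption) keeps the penalty finite for p == 0.
    """
    p = max(cycle_probs.get(ref_base, 0.0), floor)
    return -10.0 * math.log10(p)


def ungapped_penalty(read_probs, ref_segment):
    """Sum of per-base penalties for an ungapped placement of the read."""
    if len(read_probs) != len(ref_segment):
        raise ValueError("read and reference segment must have equal length")
    return sum(base_penalty(c, b) for c, b in zip(read_probs, ref_segment))


# Hypothetical two-cycle read: a confident A, then C with A as runner-up.
read = [
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.30, "C": 0.68, "G": 0.01, "T": 0.01},
]
for segment in ("AC", "AA", "AG"):
    print(segment, round(ungapped_penalty(read, segment), 2))
```

With these numbers, a mismatch against the runner-up base (AA) costs far less than a mismatch against an improbable base (AG). A single called base plus one quality value cannot express that difference.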
Old 07-21-2009, 02:00 AM   #9
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

Wouldn't it be better in the long run to use calibrated base calls rather than second-guessing with the PRB base calls?
The 1000 genomes project recalibrated their FASTQ files using prior alignment information to improve the data quality.


Quote:
Originally Posted by sparks
gives more alignments than running off the fastq files but has been criticised by some as the Illumina fastq files have been quality calibrated but the prb files are not. I have never seen any test comparing SNP calls with Genotype that would show whether using prb files improves SNP calls.
Colin
Old 07-21-2009, 02:53 AM   #10
ohofmann
Member
 
Location: Melbourne, Australia

Join Date: Jan 2009
Posts: 37
Default

Colin, good meeting you at ISMB! Should have some comparative data for FASTQ vs PRB files soon. Zee, tend to agree, but we are looking at data with 2+ SNPs per read on average, and in many cases at high frequency, and from more than two clones. Was hoping that in these cases the underlying PRB data might be informative.
Old 07-23-2009, 03:00 PM   #11
nmalhis
Member
 
Location: Vancouver, Canada

Join Date: Nov 2008
Posts: 11
Default

I'd like to add that Slider II calibrates prb data before calling SNPs.
Regarding the storage space of prb files: since these files contain repetitive data, compressing them to .gz will reduce the size by 7 to 10 times. Slider II reads .gz files.
When we have more than 2 SNPs in a read, Slider II, like other SNP-calling tools, filters dense SNPs, so results might not be good.

Nawar
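If you post-process compressed prb files with your own scripts, Python's standard `gzip` module can stream them without decompressing to disk first. A minimal sketch follows; the function name is made up, and choosing the opener by the `.gz` suffix is an assumption about how the files are named.

```python
import gzip


def iter_prb_reads(path):
    """Yield one non-empty line (one read) at a time from a prb file.

    Files whose name ends in .gz are opened with gzip; anything else is
    treated as plain text.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield line
```

Because the function is a generator, only one read is held in memory at a time, which matters for files of this size.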
Old 07-23-2009, 05:40 PM   #12
ohofmann
Member
 
Location: Melbourne, Australia

Join Date: Jan 2009
Posts: 37
Default

Quote:
Originally Posted by nmalhis
When we have more than 2 SNPs in a read, Slider II, like other SNP-calling tools, filters dense SNPs, so results might not be good.

Nawar
Yep, that's going to be a problem no matter what tool we use -- four to five SNPs per read on average. Having said that, as we are only aligning against 10kb of reference sequence most reads should still be align-able. Now, if we could stop the genomic center from deleting the intensity and PRB files after each run...
Old 07-24-2009, 12:21 PM   #13
nmalhis
Member
 
Location: Vancouver, Canada

Join Date: Nov 2008
Posts: 11
Default

With "four to five SNPs per read on average" and "10kb of reference sequence", about 10% of the reference is effectively unknown. I would assemble these reads instead, since the reference is short enough not to have repeat issues.
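The "about 10%" figure can be checked with simple arithmetic. The thread does not state the read length, so the sketch below assumes 36 bp and 50 bp reads purely for illustration, with 4.5 SNPs per read as the midpoint of "four to five".

```python
def snp_density(snps_per_read, read_length):
    """Fraction of positions in a read that differ from the reference."""
    return snps_per_read / read_length


reference_length = 10_000  # "10kb of reference sequence"
for read_length in (36, 50):  # assumed read lengths, not given in the thread
    density = snp_density(4.5, read_length)
    variant_sites = density * reference_length
    print(f"{read_length} bp reads: {density:.1%} of bases, "
          f"about {variant_sites:.0f} variant sites in 10 kb")
```

Under these assumptions the density lands between roughly 9% and 12.5%, which is consistent with the estimate above.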
Old 07-24-2009, 05:33 PM   #14
ohofmann
Member
 
Location: Melbourne, Australia

Join Date: Jan 2009
Posts: 37
Default

Interesting. Hadn't even thought about reference-based or de novo assemblies as an alternative. Will keep it in mind, thanks again!
Old 09-21-2010, 06:03 AM   #15
korifuenc7933
Junior Member
 
Location: usa

Join Date: Sep 2010
Posts: 1
Default

Very useful... I heard about using .prb instead of the fastq. Now working on it.
Old 09-21-2010, 06:09 AM   #16
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

Hi korifuenc7933,

If you'd like to align using prb files Novoalign supports this. It usually improves alignment yield vs fastq files.
Colin
Old 09-21-2010, 11:41 AM   #17
Menato
Junior Member
 
Location: Russa

Join Date: Sep 2010
Posts: 1
Default

I heard about .prb. Nice information, I guess I should use it instead of the fastq.
Old 09-21-2010, 04:35 PM   #18
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

One issue with prb files is that the prb values aren't calibrated against mismatch rates. I haven't seen any study to see if this is really a problem and how it might impact alignment and SNV calling.
It would be interesting if someone could look at concordance of SNP/Indel calls for an RNA dataset using prb, fastq & calibrated fastq. Another interesting project would be to look at calibrating prb probabilities.
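One way such a calibration check could be set up: align the reads, then for every aligned base record the reported score of the called base and whether it mismatches the reference. Grouping by reported score gives an empirical Phred quality per bin. The sketch below is a generic illustration under those assumptions; the function name and the add-one smoothing are my own choices, and a real analysis would need to exclude known variant sites so that true SNPs are not counted as errors.

```python
import math
from collections import defaultdict


def empirical_quality_table(observations):
    """Map each reported score to an empirical Phred quality.

    observations: iterable of (reported_score, is_mismatch) pairs, one per
    aligned base. Add-one smoothing (an assumption) avoids infinite quality
    when a bin contains no mismatches.
    """
    counts = defaultdict(lambda: [0, 0])  # score -> [mismatches, total]
    for reported_score, is_mismatch in observations:
        counts[reported_score][0] += int(is_mismatch)
        counts[reported_score][1] += 1
    table = {}
    for reported_score, (mismatches, total) in sorted(counts.items()):
        rate = (mismatches + 1) / (total + 2)
        table[reported_score] = -10.0 * math.log10(rate)
    return table


# Made-up observations: score 30 never mismatches, score 10 mismatches 8 of 98 times.
obs = [(30, False)] * 98 + [(10, True)] * 8 + [(10, False)] * 90
for score, empirical in empirical_quality_table(obs).items():
    print(score, round(empirical, 1))
```

If the reported scores were well calibrated, the empirical value would track the reported one across bins; systematic gaps would indicate where recalibration is needed.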
Tags
aligner, illumina, snp, solexa

All times are GMT -8.