
**bioinfosm** · 11-03-2008, 08:13 AM

Looks interesting.. using .prb instead of the fastq. There are tools that optionally take .prb files as input, but I am not sure if they use probability information for each base!

**nmalhis** · 11-05-2008, 08:54 AM

from the author

This release of Slider was prepared for the Oxford Bioinformatics paper reviewers as a proof of concept:

http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btn565v1?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=&fulltext=butterfield&searchid=1&FIRSTINDEX=0&resourcetype=HWCIT

I’m now working on a beta release with many improvements and new capabilities. This new release should be ready by the end of this month (Nov. 2008).

Nawar Malhis

**nmalhis** · 03-30-2009, 03:34 PM

SliderII: High Quality SNP Calling Using Illumina Data at Shallow Coverage is now available from:

http://www.bcgsc.ca/platform/bioinfo/software/SliderII

Sorry for the delay,

Nawar

**ohofmann** · 07-19-2009, 02:47 PM

Also going to follow up via email, but just in case: Illumina seems to be moving towards a change in the .prb files; the new workflow does not seem to produce the four-channel probabilities anymore.

Is there a workaround? This would also affect other probabilistic aligners.

-- Oliver

**kmcarr** · 07-20-2009, 05:49 AM

Oliver,

You can rerun the base calling, starting the pipeline with Bustard using the intensity files generated by RTA. Bustard will accept as optional arguments --with-seq, --with-qval, --with-sig2 and --with-prb which will instruct Bustard to generate these legacy files. You can also add these arguments to the goat.py command line if you are restarting the pipeline from the image analysis step.

**ohofmann** · 07-20-2009, 09:10 AM

Glad to hear, thanks for the information! Going to report back on how SliderII handles very deep sequence coverage soon-ish.

-- Oliver

**sparks** · 07-21-2009, 01:49 AM

Hi,

Novoalign will take prb format read files. It will use prb values as probabilities both when generating seeds and in calculating penalties for the Needleman-Wunsch alignment. This usually gives more alignments than running off the fastq files but has been criticised by some as the Illumina fastq files have been quality calibrated but the prb files are not. I have never seen any test comparing SNP calls with Genotype that would show whether using prb files improves SNP calls.

Colin
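
To make the difference between the two inputs concrete, here is a minimal sketch of what a prb line carries that a fastq record does not. It assumes the legacy Bustard layout (one read per line, four integer scores per cycle in A, C, G, T order) and that each score is on the Solexa scale, Q = 10*log10(p/(1-p)). The function names are made up for illustration and are not taken from Novoalign or Slider.

```python
def solexa_to_prob(score):
    """Convert a Solexa-scaled score Q = 10*log10(p/(1-p)) to a probability p."""
    odds = 10 ** (score / 10.0)
    return odds / (1.0 + odds)


def parse_prb_line(line):
    """Return one [pA, pC, pG, pT] list per cycle, normalised to sum to 1.

    Assumes four whitespace-separated integer scores per cycle, in A, C, G, T order.
    """
    values = [int(tok) for tok in line.split()]
    if len(values) % 4 != 0:
        raise ValueError("expected four scores per cycle")
    cycles = []
    for i in range(0, len(values), 4):
        probs = [solexa_to_prob(v) for v in values[i:i + 4]]
        total = sum(probs)
        cycles.append([p / total for p in probs])
    return cycles


def fastq_view(cycles):
    """Collapse each cycle to what a fastq record keeps: best base and its probability."""
    return [("ACGT"[c.index(max(c))], max(c)) for c in cycles]


# Hypothetical two-cycle read: a confident A, then an ambiguous A/G call.
cycles = parse_prb_line("40 -40 -40 -40\t5 -5 3 -40")
for per_base, (base, p) in zip(cycles, fastq_view(cycles)):
    print(base, round(p, 3), [round(x, 3) for x in per_base])
```

In the second cycle the fastq view only says "A, with moderate confidence", while the prb view still shows that G is the likely alternative. That runner-up information is what a probabilistic aligner can exploit.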

**zee** · 07-21-2009, 02:00 AM

Wouldn't it be better in the long run to use calibrated base calls rather than second-guessing with the PRB base calls?
The 1000 genomes project recalibrated their FASTQ files using prior alignment information to improve the data quality.

Originally posted by sparks:

gives more alignments than running off the fastq files but has been criticised by some as the Illumina fastq files have been quality calibrated but the prb files are not. I have never seen any test comparing SNP calls with Genotype that would show whether using prb files improves SNP calls.
Colin
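
For reference, the general idea behind alignment-based recalibration can be sketched as follows. This is a deliberately simplified illustration and not the 1000 Genomes procedure: it bins bases only by their reported quality, counts how often they mismatch the reference in existing alignments, and turns the observed error rate back into a Phred-scaled quality. The function name and the add-one smoothing are assumptions made for the example.

```python
import math
from collections import defaultdict


def empirical_quality_table(observations):
    """Map each reported quality to an empirically observed Phred quality.

    observations: iterable of (reported_quality, is_mismatch) pairs, one per
    aligned base, taken from reads already aligned to a reference.
    """
    counts = defaultdict(lambda: [0, 0])  # quality -> [mismatches, total]
    for quality, is_mismatch in observations:
        counts[quality][0] += int(is_mismatch)
        counts[quality][1] += 1

    table = {}
    for quality, (mismatches, total) in counts.items():
        # Add-one smoothing keeps the error rate away from 0 and 1.
        error_rate = (mismatches + 1) / (total + 2)
        table[quality] = -10 * math.log10(error_rate)
    return table


# Hypothetical data: bases reported as Q30 that mismatch about 1% of the time.
obs = [(30, True)] * 9 + [(30, False)] * 989
print(empirical_quality_table(obs))
```

Note that in this sketch true SNPs are counted as errors, so a practical implementation would want to exclude known variant positions before counting.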

**ohofmann** · 07-21-2009, 02:53 AM

Colin, good meeting you at ISMB! Should have some comparative data for FASTQ vs PRB files soon. Zee, tend to agree, but we are looking at data with 2+ SNPs per read on average, and in many cases at high frequency, and from more than two clones. Was hoping that in these cases the underlying PRB data might be informative.

**nmalhis** · 07-23-2009, 03:00 PM

I’d like to add that Slider II calibrates prb data before calling SNPs.
Regarding the storage space of prb files: since these files contain repetitive data, compressing them to .gz will reduce their size by a factor of 7 to 10. Slider II reads .gz files.
When we have more than 2 SNPs in a read, Slider II, like other SNP calling tools, filters dense SNPs, so results might not be good.

Nawar
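
As a rough illustration of the storage point, the sketch below measures how well repetitive prb-style text compresses with gzip and shows one way to read either a plain or a gzipped file through the same code path. It uses only the Python standard library; the helper names are hypothetical and this is not how Slider II itself reads its input.

```python
import gzip


def open_prb(path):
    """Open a prb file as text, transparently handling a .gz suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def compression_ratio(raw_bytes):
    """Return uncompressed size divided by gzip-compressed size."""
    return len(raw_bytes) / len(gzip.compress(raw_bytes))


# Synthetic, highly repetitive example data (real files will compress less well).
sample = ("40 -40 -40 -40\t" * 36 + "\n").encode() * 1000
print(round(compression_ratio(sample), 1))
```

The ratio on real prb files depends on how noisy the runs are, so the 7 to 10 times figure above should be taken from actual data rather than from this synthetic example.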

**ohofmann** · 07-23-2009, 05:40 PM

Originally posted by nmalhis:

When we have more than 2 SNPs in a read, Slider II, like other SNP calling tools, filters dense SNPs, so results might not be good.

Nawar

Yep, that's going to be a problem no matter what tool we use -- four to five SNPs per read on average. Having said that, as we are only aligning against 10kb of reference sequence most reads should still be align-able. Now, if we could stop the genomic center from deleting the intensity and PRB files after each run...

**nmalhis** · 07-24-2009, 12:21 PM

With "four to five SNPs per read on average" and "10kb of reference sequence", roughly 10% of the reference is effectively unknown. I would assemble these reads instead, since the reference is short enough not to have repeat issues.

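The "about 10%" figure can be checked with a quick back-of-the-envelope calculation. The read length is not stated in the thread, so the 45 bp used here is purely an assumption, chosen as a typical short Illumina read length of the time. The function names are illustrative only.

```python
def divergent_fraction(snps_per_read, read_length):
    """Fraction of read positions that differ from the reference."""
    return snps_per_read / read_length


def expected_variant_sites(reference_length, fraction):
    """Approximate number of reference positions expected to differ."""
    return reference_length * fraction


READ_LENGTH = 45        # assumption, not given in the thread
SNPS_PER_READ = 4.5     # midpoint of "four to five SNPs per read"
REFERENCE_LENGTH = 10_000

fraction = divergent_fraction(SNPS_PER_READ, READ_LENGTH)
print(fraction, expected_variant_sites(REFERENCE_LENGTH, fraction))
```

With shorter reads the divergent fraction would be higher still, which strengthens the case for assembly over alignment.
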
**ohofmann** · 07-24-2009, 05:33 PM

Interesting. Hadn't even thought about reference-based or de novo assemblies as an alternative. Will keep it in mind, thanks again!

**korifuenc7933** · 09-21-2010, 06:03 AM

Very useful... I heard about using .prb instead of the fastq. Now working on it.


Announcement

Slider - Maximum use of probability information for alignment of short sequence reads
