**kmcarr** · 02-07-2009, 12:28 PM

Originally posted by mimi_lupton

Dear all,
If I understand correctly, it is filtered by how well it aligns to the human genome, using certain parameters.
Michelle

Michelle,

The filtering is independent of alignment; it is based solely on the relative intensities of the fluorescent signals. Illumina uses two measures of relative intensity, called Chastity and Purity. Chastity is the intensity of the brightest base for a cluster divided by the sum of the brightest and second-brightest intensities. Purity is the brightest intensity divided by the sum of all four fluorescent signals. The default parameter used by GERALD when filtering reads is CHASTITY ≥ 0.6. Stated another way (after a little algebra), the most intense signal must be at least 1.5x as intense as the second most intense signal. Also, filter passing is based only on the signals over the first 12 cycles. I am not sure whether this means that the value must be ≥ 0.6 for each of those 12 cycles or that the average must be ≥ 0.6.
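To make the two definitions concrete, here is a minimal Python sketch. It is not the pipeline's actual code: the function names, the `mode` switch, and the example intensities are all made up for illustration, and it assumes the four intensities for a cycle are positive numbers. Because it is unclear whether the 0.6 threshold applies to every one of the first 12 cycles or to their average, the sketch implements both readings.

```python
def chastity(intensities):
    """Brightest signal divided by the sum of the two brightest signals."""
    top, second = sorted(intensities, reverse=True)[:2]
    return top / (top + second)


def purity(intensities):
    """Brightest signal divided by the sum of all four signals."""
    return max(intensities) / sum(intensities)


def passes_filter(cycles, threshold=0.6, n_cycles=12, mode="every"):
    """Apply the chastity threshold to the first n_cycles of one cluster.

    cycles: list of per-cycle intensity lists (four values each).
    mode="every" requires each cycle to meet the threshold;
    mode="mean" requires only the average chastity to meet it.
    Both readings are assumptions, not documented pipeline behaviour.
    """
    values = [chastity(c) for c in cycles[:n_cycles]]
    if mode == "every":
        return all(v >= threshold for v in values)
    return sum(values) / len(values) >= threshold


# A cycle where the brightest signal is exactly 1.5x the second brightest
# sits right on the 0.6 chastity boundary.
borderline = [150, 100, 10, 5]
print(chastity(borderline), purity(borderline))

# One ambiguous cycle among eleven clean ones: the two readings disagree.
read = [[200, 50, 0, 0]] * 11 + [[100, 100, 0, 0]]
print(passes_filter(read, mode="every"), passes_filter(read, mode="mean"))
```

The second example shows why the distinction matters: a cluster with a single poor cycle fails under the per-cycle reading but passes under the averaged one.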

You may have confused the read filtering with quality score calculation. Initial quality scores are based on the observed intensities but the scores may then be calibrated based on the alignment of the control sample to its reference sequence. Reads which do not pass filtering will have lower overall quality scores.

Now, given all that, I don't think I'm the one to answer your real question: can you use unfiltered reads to identify rare variants? I do know that MAQ uses the quality score information when calculating its alignments, but I don't know whether this carries over into its SNP calling algorithm(s). Hopefully someone with more experience in SNP analysis will offer some input.

**mimi_lupton** · 02-08-2009, 11:41 AM

Thanks for your reply, that makes things clearer.
Because I am looking at pools of many individuals, I am not using the MAQ SNP calling algorithm but calling my own SNPs using the pileup function. So the main question I am really asking is whether the unfiltered data, aligned to the reference, is reliable.

Any thoughts would be greatly appreciated.

Thanks
Michelle

**lh3** · 02-08-2009, 02:38 PM

The current filter is quite strong, in that it may filter out a lot of good data. People argue a lot about whether and how to use unfiltered data, but I think most of them agree we should apply at least some filters. If you do not want to invest time in studying better filters, I would recommend using the filter implemented in the pipeline.
