Hey austic
I almost started a new thread about SNP filtering parameters, but I might as well join this one.
I am also using the Lasergene package for evaluation.
According to publications, there should be around 20-25k SNPs in a human exome. I am not sure how much these numbers vary between individuals, but I suppose that's the target.
So I set the parameters to:
show All SNPs,
show Coding SNPs only (as I am interested in the exome),
Q call 40,
P not ref - 90%,
and SNP percent filter 50-100.
The last one is the most arguable, because it shows only SNPs that have been seen in at least half (50%) of the reads. I think it's too stringent, but it is the only setting that lets me arrive at 23k SNPs (which is supposedly what is expected) and gives me a transition/transversion (Ti/Tv) ratio of 3.05 (exome is supposed to be 3-3.5, random is 0.5, and the genome average is 2.0-2.1). If I relax any of the parameters above, this "check" fails.
Any thoughts or comments on these parameters?
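In case anyone wants to run the same Ti/Tv check outside Lasergene, here is a minimal Python sketch. It assumes the SNP table has been exported as simple (reference base, variant base) pairs; the function names are my own and not part of any package. A transition stays within purines (A, G) or within pyrimidines (C, T), and everything else is a transversion. Of the 12 possible substitutions, 4 are transitions and 8 are transversions, which is where the "random" value of 0.5 comes from.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """True if ref->alt stays within purines or within pyrimidines."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def titv_ratio(snps):
    """Compute Ti/Tv from an iterable of (ref, alt) single-base pairs."""
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        # Skip anything that is not a clean single-base substitution.
        if ref == alt or ref not in VALID or alt not in VALID:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions, ratio is undefined")
    return ti / tv


# Sanity check: all 12 possible substitutions, each counted once.
bases = "ACGT"
all_substitutions = [(r, a) for r in bases for a in bases if r != a]
random_ratio = titv_ratio(all_substitutions)
print(random_ratio)
```

Running `titv_ratio` on the filtered SNP list after each parameter change would show how far the ratio drifts from the 3-3.5 range when a filter is relaxed.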
Now comes the toughest part: narrowing the list down in order to find the cause of a rare syndrome.
I saw a scheme in this forum showing how the numbers shrink through a pipeline (like 20k->10k->700->120 and then 4-6), but there was no explanation of how it was done.
It would be nice if someone could share how to get from 23k down to 1000 or fewer, so that it's possible to look at them manually.
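To make the question more concrete, this is roughly the kind of stepwise narrowing I have in mind, as a Python sketch. I do not know which filters were actually behind the 20k->10k->700->120 scheme, so the two steps below (keep novel SNPs, then keep amino acid changing or stop SNPs) are only guesses, and the record fields `known` and `effect` are hypothetical names, not Lasergene output columns.

```python
def run_cascade(variants, steps):
    """Apply (name, predicate) filters in order and record the count after each."""
    counts = [("start", len(variants))]
    current = list(variants)
    for name, keep in steps:
        current = [v for v in current if keep(v)]
        counts.append((name, len(current)))
    return current, counts


# Toy records; field names are made up for illustration only.
variants = [
    {"id": "v1", "known": True, "effect": "synonymous"},
    {"id": "v2", "known": True, "effect": "missense"},
    {"id": "v3", "known": False, "effect": "missense"},
    {"id": "v4", "known": False, "effect": "synonymous"},
    {"id": "v5", "known": False, "effect": "stop"},
]

# Assumed filter steps, applied in this order.
steps = [
    ("not annotated (novel)", lambda v: not v["known"]),
    ("amino acid change or stop", lambda v: v["effect"] in {"missense", "stop"}),
]

survivors, counts = run_cascade(variants, steps)
for name, n in counts:
    print(f"{name}: {n}")
```

Reporting the count after every step is the point here: it gives numbers that can be compared between people using different filters.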
Some more points about my data: I am surprised that about half of the SNPs are not annotated (isn't that too many NOVEL ones?). Also, about half are supposed to be amino acid changing SNPs (again too many from what I know about genetics). There are 169 STOP codon SNPs, which is OK, supposedly possible.
I am thinking of looking at the whole trio at once in order to catch SNPs that are new in the child; those are most likely false calls.
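The trio comparison could look something like this in Python. It is only a sketch under my own assumptions: each person's calls are exported as a list of records with `chrom`, `pos`, `ref` and `alt` fields (hypothetical names), and a call counts as "new in the child" when the exact same variant is absent from both parents' lists.

```python
def variant_key(v):
    """Identify a variant by position and alleles."""
    return (v["chrom"], v["pos"], v["ref"], v["alt"])


def child_only(child, mother, father):
    """Return the child's variants that appear in neither parent."""
    parental = {variant_key(v) for v in mother} | {variant_key(v) for v in father}
    return [v for v in child if variant_key(v) not in parental]


# Toy data for illustration only.
mother = [{"chrom": "1", "pos": 100, "ref": "A", "alt": "G"}]
father = [{"chrom": "2", "pos": 200, "ref": "C", "alt": "T"}]
child = [
    {"chrom": "1", "pos": 100, "ref": "A", "alt": "G"},  # from mother
    {"chrom": "2", "pos": 200, "ref": "C", "alt": "T"},  # from father
    {"chrom": "3", "pos": 300, "ref": "G", "alt": "A"},  # seen in child only
]

candidates = child_only(child, mother, father)
print(candidates)
```

One caveat with this approach: a variant can be missing from a parent's list simply because that position was poorly covered or filtered out in the parent, so the child-only list mixes sequencing errors, missed parental calls and any real new mutations.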
Does anyone know how to check SNP conservation in evolution (I mean checking all 23k at once), and the same for amino acid changing SNPs (11k at once in PolyPhen and/or similar software)?
It would be nice to share filtering parameters and the numbers that come out, so it's possible to compare.