**swNGS** · 09-12-2012, 02:07 PM

That's a really unhelpful sequencing core that you have there....
You might find you get a better response if you are more specific about what you are trying to do...

**swbarnes2** · 09-12-2012, 03:30 PM

There are programs that take SNP data and will at least tell you what amino acid changes the SNPs make.

Off the top of my head, there's the Ensembl Variant Effect Predictor, a program called SnpEff, and a program called ANNOVAR. I use ANNOVAR on mouse SNPs, and it seems to work fine.

**tahamasoodi** · 09-12-2012, 10:24 PM

Hi swbarnes2,
Thanks for your response. I tried SnpEff, but it accepts VCF format input files while my data is in an xlsx file. When I tried ANNOVAR, any command starting with annovar.pl gives me the error message "command not found". I tried many things but failed.

**ulz_peter** · 09-12-2012, 10:44 PM

Are these SNPs annotated in any way (e.g. allele frequencies in the 1000 Genomes Project or the Exome Sequencing Project, SIFT prediction values, conservation score, amino acid change, gene affected)?
If yes, then that's something to start with:
Filter out all common variants.
If there's a particular region you are interested in, take only those SNPs.

If not, get an annotation program running. I recommend ANNOVAR as well; it needs a certain input file format, but since that format is text-based you should be able to create it from the Excel file.
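As a minimal sketch of that conversion, assume the spreadsheet has first been saved from Excel as tab-delimited text and that its header uses the column names shown in the sample rows further down this thread. The function name `to_annovar_input` is made up for illustration. It copies only the five positional columns (chromosome, start, end, reference allele, alternate allele), which is the basic layout ANNOVAR's text input expects; check the ANNOVAR documentation before relying on it.

```python
import csv
import io

# Column names as they appear in the poster's sample header (assumption:
# your export uses the same names; the leading '#' is stripped below).
POSITION_COLUMNS = ["chr_name", "chr_start", "chr_end", "ref_base", "alt_base"]


def to_annovar_input(tsv_in, out):
    """Copy the five positional columns from a tab-delimited export to `out`.

    `tsv_in` and `out` are open text file objects. Returns the number of
    variant rows written.
    """
    reader = csv.DictReader(tsv_in, delimiter="\t")
    # The sample header starts with '#chr_name'; drop the '#' so the names match.
    reader.fieldnames = [name.lstrip("#") for name in reader.fieldnames]
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    written = 0
    for row in reader:
        writer.writerow([row[col] for col in POSITION_COLUMNS])
        written += 1
    return written


if __name__ == "__main__":
    demo = io.StringIO(
        "#chr_name\tchr_start\tchr_end\tref_base\talt_base\thom_het\n"
        "chr10\t62082\t62082\tG\tT\thet\n"
    )
    result = io.StringIO()
    to_annovar_input(demo, result)
    print(result.getvalue(), end="")
```

With real files you would pass `open("sample.txt", newline="")` and `open("sample.avinput", "w", newline="")` instead of the in-memory demo; both file names are placeholders.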

If you can't get it done, you might also have a look here:

http://snp.gs.washington.edu/SeattleSeqAnnotation/

Hope that helps

**tahamasoodi** · 09-13-2012, 02:11 AM

Thanks Peter,

The Excel file contains a number of fields, as given below. I want to know the significant SNPs in the whole genome. Can I do it in Excel itself, or do I have to use a tool for it? I tried to use ANNOVAR but I am getting an error.

Regards,

#chr_name chr_start chr_end ref_base alt_base hom_het snp_quality tot_depth
chr10 61373 61373 A - hom 189 28
chr10 62082 62082 G T het 52 33
chr10 65878 65878 C G hom 31 3

alt_depth region gene
28 intergenic NONE(dist=NONE),TUBB8(dist=31455)
11 intergenic NONE(dist=NONE),TUBB8(dist=30746)
3 intergenic NONE(dist=NONE),TUBB8(dist=26950)

dbSNP135_full dbSNP135_common 1000G_2011Oct_allele_freq
rs9329307 . .
rs2271275 rs2271275 0.55
rs6901 rs6901 0.73

annotation
TUBB8:NM_177987:exon4:c.A314G:p.H105R,
ADARB2:NM_018702:exon9:c.G1876A:p.A626T,
PITRM1:NM_001242307:exon27:c.A3113G:p.Q1038R,PITRM1:NM_014889:exon27:c.A3110G:p.Q1037R,PITRM1:NM_001242309:exon24:c.A2816G:p.Q939R,

**ulz_peter** · 09-13-2012, 02:18 AM

What do you mean by significant SNPs?

It seems that your SNPs are already annotated.
So, in case you are searching for the cause of a rare disease, you could limit yourself to SNPs that have an allele frequency < 0.01 in 1000G_2011Oct_allele_freq, have no entry in the dbSNP135_common field, and are possibly deleterious (in your case this is stated in the annotation field, e.g. p.H105R).

You could do that in Excel, but again, if you do not specify your problem we cannot specify the solution.
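Outside Excel, the same three criteria can be sketched in a few lines. This is only an illustration under stated assumptions: each row is a dict keyed by the column names from the sample file, a "." means "no value" as in the sample rows, a variant with no 1000 Genomes frequency is treated as rare, and the annotation field uses the `p.H105R` style of protein change. The function names are hypothetical, and frameshift or other indel annotations are not handled.

```python
import re

# Matches single amino acid substitutions such as p.H105R (or p.Q12X / p.Q12*).
PROTEIN_CHANGE = re.compile(r"p\.([A-Z])\d+([A-Z*])")


def changes_protein(annotation):
    """True if any listed transcript has a substitution that alters the residue."""
    return any(ref != alt for ref, alt in PROTEIN_CHANGE.findall(annotation))


def is_rare_candidate(row, max_freq=0.01):
    """Apply the three filters suggested above to one annotated row."""
    freq = row["1000G_2011Oct_allele_freq"]
    # Assumption: a missing frequency ('.' or empty) counts as rare.
    rare = freq in (".", "") or float(freq) < max_freq
    not_common = row["dbSNP135_common"] in (".", "")
    return rare and not_common and changes_protein(row["annotation"])


if __name__ == "__main__":
    row = {
        "1000G_2011Oct_allele_freq": ".",
        "dbSNP135_common": ".",
        "annotation": "TUBB8:NM_177987:exon4:c.A314G:p.H105R,",
    }
    print(is_rare_candidate(row))
```

The 0.01 cutoff is the one suggested in the post; it is exposed as `max_freq` so it can be tightened or relaxed.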

**tahamasoodi** · 09-13-2012, 02:51 AM

Actually, I have whole-genome data for around 80 CRC patient samples and an equal number of controls. For one case sample I got around 3,768,494 SNPs, 10,557 nsSNPs, 535,826 indels, and 474 coding indels, with similar figures for the controls. Now I want to find out which SNPs/indels are responsible for the disease by filtering this huge number of SNPs. How should I set the filtering criteria? Can you give a full description of the annotation fields?

**xied75** · 09-13-2012, 03:12 AM

I was just guessing that he might be feeding the Excel file directly to whatever programs you have mentioned, rather than creating new text files in a format that these programs can read. (But if I'm wrong, then ignore this.)

Best,

dong

**ulz_peter** · 09-13-2012, 03:26 AM

So you've got 160 Excel files, each with about 4 million entries?

I guess you'll need some programming here...
I don't know of any program that computes the significance of SNPs based on whether they show up in a significant portion of samples. Maybe someone else can help here...
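One way such programming could look, purely as a sketch: count how many cases and how many controls carry each variant, then run a two-sided Fisher's exact test on the resulting 2x2 table. Fisher's test is an assumption of this example, not something proposed in the thread, and all function names are made up. Each sample is represented as a set of variant keys such as `(chr, start, end, ref, alt)`. With millions of variants and about 80 samples per group, the raw p-values would still need multiple-testing correction and are unlikely to be conclusive on their own.

```python
from collections import Counter
from math import comb


def carrier_counts(case_samples, control_samples):
    """Map each variant to (cases carrying it, controls carrying it)."""
    cases = Counter(v for sample in case_samples for v in set(sample))
    controls = Counter(v for sample in control_samples for v in set(sample))
    return {v: (cases[v], controls[v]) for v in set(cases) | set(controls)}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= observed * (1 + 1e-9)
    )


def variant_p_values(case_samples, control_samples):
    """Uncorrected p-value per variant, carriers vs non-carriers by group."""
    n_case, n_ctrl = len(case_samples), len(control_samples)
    return {
        v: fisher_exact_two_sided(k, n_case - k, m, n_ctrl - m)
        for v, (k, m) in carrier_counts(case_samples, control_samples).items()
    }
```

Because each sample is reduced to a set of keys, the per-sample files can be read one at a time rather than loaded together.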

What you could do is filter out the synonymous SNPs and the SNPs with higher allele frequencies using an Excel filter, but for 160 huge Excel files that may not be what you want.

Since I am in a good mood today, I'll explain the fields to you:

chr_name: name of the chromosome
chr_start: SNP position (start point for indels)
chr_end: SNP position (end point for indels)
ref_base: human reference base at that exact position
alt_base: base detected in your sample at that position
hom_het: whether the mutation showed up as homozygous or heterozygous
snp_quality: a quality value for how likely it is that your SNP is real rather than a sequencing artifact (no idea which scale they use for assigning the SNP quality value)
tot_depth: sequencing depth at that position (i.e. how many reads cover this position)
alt_depth: number of sequencing reads at that position that show the mutated allele
region: whether the mutation lies within a gene/exon/intron or elsewhere
gene: gene affected
dbSNP135_full: dbSNP version 135 reference
dbSNP135_common: dbSNP version 135 reference in case that SNP has an allele frequency > 1%
1000G_2011Oct_allele_freq: allele frequency determined by the 1000 Genomes Project (October 2011 version)
annotation: nomenclature for the mutation; c.XXX is the cDNA position in the NM_xxx isoform and p.xxx is the protein substitution nomenclature for that mutation
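To make the annotation field concrete, here is a small parsing sketch. It assumes the field is a comma-separated list of colon-separated entries of the form gene:transcript:exon:cDNA change:protein change (for example `TUBB8:NM_177987:exon4:c.A314G:p.H105R`), which matches the sample values in this thread; `TranscriptChange` and `parse_annotation` are names invented for the example, and entries with a different number of parts are simply skipped.

```python
from typing import List, NamedTuple


class TranscriptChange(NamedTuple):
    gene: str        # e.g. TUBB8
    transcript: str  # RefSeq isoform, e.g. NM_177987
    exon: str        # e.g. exon4
    cdna: str        # cDNA change, e.g. c.A314G
    protein: str     # protein change, e.g. p.H105R


def parse_annotation(field: str) -> List[TranscriptChange]:
    """Split an annotation field into one record per listed transcript."""
    changes = []
    for entry in field.split(","):
        parts = entry.strip().split(":")
        if len(parts) == 5:  # skip empty trailing entries and unknown layouts
            changes.append(TranscriptChange(*parts))
    return changes


if __name__ == "__main__":
    field = ("PITRM1:NM_001242307:exon27:c.A3113G:p.Q1038R,"
             "PITRM1:NM_014889:exon27:c.A3110G:p.Q1037R,")
    for change in parse_annotation(field):
        print(change.gene, change.transcript, change.protein)
```

One variant can list several isoforms, which is why the function returns a list rather than a single record.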

Since I did not create the files I cannot guarantee that this is absolutely true, but these are the most likely explanations.

Best regards,
Peter

**ulz_peter** · 09-13-2012, 03:27 AM

Originally posted by xied75

I was just guessing that he might be feeding the Excel file directly to whatever programs you have mentioned, rather than creating new text files in a format that these programs can read. (But if I'm wrong, then ignore this.)

Best,

dong

That's what I am guessing too, however his files seem to be annotated already...

**tahamasoodi** · 09-13-2012, 03:40 AM

If I select the particular genes involved in CRC, I think an Excel filter can then help in screening for deleterious SNPs.
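The same gene-based selection could be sketched in code as follows. It assumes rows are dicts with the `region` and `gene` columns from the sample file, that the gene column may carry suffixes like `(dist=31455)`, and that intergenic rows only list neighbouring genes and should therefore be skipped. The panel in the demo is an illustrative placeholder, not a curated CRC gene list, and both function names are hypothetical.

```python
import re


def genes_in_field(gene_field):
    """Extract gene symbols from values like 'NONE(dist=NONE),TUBB8(dist=31455)'."""
    without_distances = re.sub(r"\(.*?\)", "", gene_field)
    names = {name.strip() for name in re.split(r"[,;]", without_distances)}
    return {name for name in names if name and name != "NONE"}


def in_gene_panel(row, panel):
    """True if a non-intergenic row touches at least one gene in `panel`."""
    if row["region"] == "intergenic":
        return False  # listed genes are only the nearest neighbours
    return bool(genes_in_field(row["gene"]) & set(panel))


if __name__ == "__main__":
    panel = {"APC", "KRAS", "TP53"}  # placeholder symbols for illustration
    row = {"region": "exonic", "gene": "TP53"}
    print(in_gene_panel(row, panel))
```

Restricting to a panel first makes the per-file row count small enough that the remaining filtering could reasonably be done in a spreadsheet.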

**swbarnes2** · 09-13-2012, 08:23 AM

There is no perfect algorithm that goes from primary amino acid change -> functional effect. So you'll want to use a combination of approaches, such as PolyPhen-2, pathway analysis, and comparison to the 1000 Genomes SNP set.

Illumina final result analysis