**Awesome** · 12-22-2010, 04:17 PM

To do SNP calling, the standard procedure is to map reads to a reference genome. Then you look at your pileup (i.e. the base frequencies and associated quality scores for every position) and find positions where the observed base frequencies diverge from the reference. Illumina's CASAVA uses a fancy nearest-neighbor SNP caller, SOAPsnp uses a Bayesian algorithm, and I'm sure there are many, many other methods.
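As a rough illustration of the pileup idea, here is a minimal Python sketch. It is not how CASAVA or SOAPsnp work: it ignores quality scores entirely and just flags positions where a non-reference base exceeds a frequency cutoff. The function names, the toy pileup, and the `min_alt_freq` and `min_depth` thresholds are all assumptions made for the example.

```python
from collections import Counter

def base_frequencies(column):
    """Return the relative frequency of each base in one pileup column."""
    counts = Counter(column.upper())
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}

def candidate_snps(reference, pileup, min_alt_freq=0.2, min_depth=4):
    """Return (position, ref_base, alt_base, alt_freq) for columns that diverge from the reference."""
    hits = []
    for pos, (ref_base, column) in enumerate(zip(reference, pileup)):
        if len(column) < min_depth:
            continue  # too few reads to say anything about this position
        freqs = base_frequencies(column)
        for base, freq in sorted(freqs.items()):
            if base != ref_base and freq >= min_alt_freq:
                hits.append((pos, ref_base, base, freq))
    return hits

# Toy data: one string of observed bases per reference position (hypothetical).
reference = "AGGGTGGACT"
pileup = ["AAAA", "GGGG", "GGGG", "GGGG", "TTGG",
          "GGGG", "GGGG", "AAAA", "CCCC", "TT"]

print(candidate_snps(reference, pileup))  # [(4, 'T', 'G', 0.5)]
```

A real caller would weight each observed base by its quality score and model sequencing error, which is where the Bayesian and other approaches come in.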

Because you don't have a reference sequence, the standard way to call SNPs is to generate one. You do this by feeding trimmed, high-quality-only reads into a de novo assembler such as Velvet or ABySS.

For SNP calling, contig length isn't really your end goal. Your goal for the assembly should be to have a high percentage of your reads actually map to your de novo genome.

It is okay if your de novo genome has 1000s of contigs.
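To check the "percentage of reads that map" criterion, one option is to count records in the mapper's SAM output. The sketch below assumes SAM-formatted text lines and uses the standard FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary); the function name and the toy records are made up for illustration.

```python
def mapped_fraction(sam_lines):
    """Fraction of primary SAM records whose read is mapped."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
    return mapped / total if total else 0.0

# Toy SAM records (hypothetical contig and read names).
sam = [
    "@SQ\tSN:contig1\tLN:1000",
    "r1\t0\tcontig1\t1\t60\t10M\t*\t0\t0\tAGGGTGGACT\tIIIIIIIIII",
    "r2\t16\tcontig1\t5\t60\t10M\t*\t0\t0\tAGGGGGGACT\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTTTTTTT\tIIIIIIIIII",
    "r4\t0\tcontig1\t9\t60\t10M\t*\t0\t0\tAGGGAGGACT\tIIIIIIIIII",
]

print(mapped_fraction(sam))  # 0.75
```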

If you are dealing with RNA, then mapping partial reads plays a role for a minority of SNPs (close to intron junctions, etc.). So you might need to use Bowtie/Cufflinks, SOAP or whatever to map partially.

Good luck.

**Marius** · 12-23-2010, 12:17 AM

Awesome,
thanks a lot for this straightforward answer. So in your opinion, what I would have to do is:
Take all reads (all individuals, all populations) and keep only the high-quality ones (i.e. Phred >20, no Ns etc.). Then I could use all these reads to create my contigs (I expect around 40,000 contigs). Since I have reads from individuals that belong to quite different populations (which might already have diverged quite a bit, also in the genome), I guess I would have to include all individuals to build these contigs.
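A minimal sketch of such a read filter in Python, under a few assumptions that are not stated in the post: reads are in 4-line FASTQ records, qualities use Phred+33 encoding (adjust `offset` for older Illumina Phred+64 files), and "Phred >20" is taken to mean every base in the read must exceed 20. The function names and toy reads are hypothetical.

```python
def passes_filter(seq, qual, min_phred=20, offset=33):
    """True if the read has no N and every base quality is above min_phred."""
    if "N" in seq.upper():
        return False
    return all(ord(ch) - offset > min_phred for ch in qual)

def filter_fastq(lines, min_phred=20, offset=33):
    """Yield (header, seq, qual) for each 4-line FASTQ record that passes the filter."""
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, seq, _plus, qual = records[i:i + 4]
        if passes_filter(seq, qual, min_phred, offset):
            yield header, seq, qual

# Toy FASTQ: read2 contains an N, read3 has one low-quality base ('#' is Phred 2).
fastq = [
    "@read1", "AGGGTGGACT", "+", "IIIIIIIIII",
    "@read2", "AGGGNGGACT", "+", "IIIIIIIIII",
    "@read3", "AGGGAGGACT", "+", "IIII#IIIII",
]

kept = [header for header, _seq, _qual in filter_fastq(fastq)]
print(kept)  # ['@read1']
```

Requiring every base to pass is strict; trimming low-quality read ends first, or filtering on mean quality, are common alternatives.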

There is one aspect I'm not really sure about yet. Let's say I have reads covering a position that carries a SNP when comparing the different individuals (or even a multi-allelic position), e.g.

Read1 (i.e. Ind.2, Pop1): ..AGGGTGGACT..
Read2 (i.e. Ind.4, Pop2): ..AGGGGGGACT..
Read3 (i.e. Ind.1, Pop3): ..AGGGAGGACT..

Let's say all these reads are of high quality, so the polymorphic site is a true multi-allelic SNP position. What would the contig (reference sequence) look like, which is basically the consensus sequence of these 3 reads, I guess? Best would probably be: ..AGGGNGGACT..
And when I then do SNP calling (or consensus calling first for every individual), is this always relative to this reference contig or not? I don't want to do SNP calling relative to the reference; I only need the reference to ensure that I compare the individual pileups of the same locus among the individuals and populations later on. So the contig sequence shouldn't influence my individual consensus/SNP calling!
For example, I know from SAMtools that consensus calling/SNP calling is only possible relative to the reference sequence...
Which assembler and consensus-calling program would be best for this?
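For the three-read example above, a consensus that records every observed allele can be written with IUPAC ambiguity codes. The sketch below is only an illustration of that idea, not what any particular assembler outputs; it assumes the reads are already aligned, of equal length, and contain only A/C/G/T. The function name is hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases observed.
IUPAC = {frozenset(bases): code for bases, code in [
    ("A", "A"), ("C", "C"), ("G", "G"), ("T", "T"),
    ("AG", "R"), ("CT", "Y"), ("CG", "S"), ("AT", "W"), ("GT", "K"), ("AC", "M"),
    ("CGT", "B"), ("AGT", "D"), ("ACT", "H"), ("ACG", "V"),
    ("ACGT", "N"),
]}

def iupac_consensus(reads):
    """Column-wise consensus of equal-length, pre-aligned reads using IUPAC codes."""
    reads = [r.upper() for r in reads]
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must all have the same length")
    return "".join(IUPAC[frozenset(column)] for column in zip(*reads))

reads = [
    "AGGGTGGACT",  # Read1
    "AGGGGGGACT",  # Read2
    "AGGGAGGACT",  # Read3
]

print(iupac_consensus(reads))  # AGGGDGGACT
```

With alleles T, G and A at the polymorphic site, the code is D (A, G or T) rather than N, which would also imply C.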

**pierre350d** · 02-07-2011, 01:28 AM

Dear Marius,

At INRIA, France, we developed an algorithm called kisSnp that compares two sets of raw reads and detects SNPs between them.

We have a public, validated Java version here: http://alcovna.genouest.org/kissnp/ and a lighter C version, not yet fully validated, that you could test if you're interested.

Pierre

**vinchenz** · 03-30-2011, 09:59 AM

Ironically, but perhaps not, you might want to check out a program from William Cresko's lab called Stacks.

**pierre350d** · 03-30-2011, 11:23 AM

Thanks for the link.

I'm taking the opportunity of this "up" to let you know that a new version of kisSnp is available: http://alcovna.genouest.org/kissnp-page/

Pierre

reference-free SNP discovery