12-23-2010, 12:17 AM   #3
Marius
8armed
 

Awesome,
thanks a lot for this straightforward answer. So in your opinion, what I would have to do is:
Take all reads (all individuals, all populations) and filter them to keep only the high-quality ones (e.g. Phred > 20, no Ns, etc.). Then I could use all of these reads to build my contigs (I expect around 40,000 contigs). Since my reads come from individuals belonging to quite different populations (which might already have diverged quite a bit, also at the genome level), I guess I would have to include all individuals when building these contigs.
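To make the filtering step concrete, here is a minimal sketch of what I have in mind. It rests on assumptions that are mine and not part of the advice I got: quality strings are Phred+33 encoded (older Illumina data may need offset=64), "Phred > 20" is read as the mean quality of the read, and the function names are just made up for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into a list of Phred scores.

    offset=33 is an assumption (Sanger / Phred+33 encoding).
    """
    return [ord(ch) - offset for ch in quality_string]


def keep_read(sequence, quality_string, min_mean_phred=20, offset=33):
    """Return True if the read has no N and a mean Phred score above the cutoff.

    Using the mean (rather than the minimum) is an assumption; the
    cutoff could just as well be applied per base.
    """
    if "N" in sequence.upper():
        return False
    scores = phred_scores(quality_string, offset)
    if not scores:
        return False
    return sum(scores) / len(scores) > min_mean_phred


# Hypothetical reads: (sequence, quality string)
reads = [("ACGT", "IIII"), ("ACNT", "IIII"), ("ACGT", "!!!!")]
kept = [seq for seq, qual in reads if keep_read(seq, qual)]
```

Only the first of the three hypothetical reads would survive: the second contains an N and the third has a mean quality of 0.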

There is one aspect I'm not really sure about yet. Let's say my reads cover a position that is polymorphic when comparing the different individuals (a SNP, or even a multi-allelic site), e.g.

Read1 (e.g. Ind.2, Pop1): ..AGGGTGGACT..
Read2 (e.g. Ind.4, Pop2): ..AGGGGGGACT..
Read3 (e.g. Ind.1, Pop3): ..AGGGAGGACT..

Let's say all these reads are of high quality, so the polymorphic site is a true multi-allelic SNP position. What would the contig (reference sequence) look like, which is basically the consensus sequence of these 3 reads, I guess? Best would probably be: ..AGGGNGGACT..
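Just to illustrate the kind of consensus I mean, here is a small sketch. It assumes the reads are already aligned and of equal length, and it is not how any particular assembler builds its contigs; the function name and the IUPAC option are my own additions.

```python
# IUPAC ambiguity codes for sets of two or more bases
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def naive_consensus(aligned_reads, use_iupac=False):
    """Column-wise consensus of equal-length, pre-aligned reads.

    Columns where all reads agree keep that base; polymorphic columns
    become 'N', or the matching IUPAC code if use_iupac is True.
    """
    if len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("reads must be aligned and of equal length")
    consensus = []
    for column in zip(*aligned_reads):
        bases = frozenset(b.upper() for b in column)
        if len(bases) == 1:
            consensus.append(next(iter(bases)))
        elif use_iupac:
            consensus.append(IUPAC.get(bases, "N"))
        else:
            consensus.append("N")
    return "".join(consensus)


reads = ["AGGGTGGACT", "AGGGGGGACT", "AGGGAGGACT"]
print(naive_consensus(reads))                  # AGGGNGGACT
print(naive_consensus(reads, use_iupac=True))  # AGGGDGGACT
```

With the IUPAC option the site becomes D (A, G or T), which would keep a bit more information than a plain N.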
And when I then do the SNP calling (or consensus calling first for every individual), is this always done relative to this reference contig or not? I don't want to call SNPs relative to the reference; I only need the reference to ensure that I compare the pileups of the same locus among individuals and populations later on. So the contig sequence shouldn't influence my individual consensus/SNP calling!
From what I know of SAMtools, consensus calling/SNP calling is only possible relative to the reference sequence...
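What I am after could be sketched like this: the reference contig only defines the position, and the alleles of each individual are called purely from that individual's own bases at that position. This is a deliberately naive frequency cutoff, not what SAMtools does; the function name, the thresholds and the example pileups are all hypothetical.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def call_alleles(pileup_bases, min_fraction=0.2, min_depth=4):
    """Call the alleles of one individual at one position.

    pileup_bases: the bases that individual's reads show at the position.
    The reference base is never consulted. Returns a sorted tuple of
    alleles, or None if coverage is below min_depth.
    min_fraction and min_depth are arbitrary illustrative thresholds.
    """
    bases = [b.upper() for b in pileup_bases if b.upper() in VALID_BASES]
    depth = len(bases)
    if depth < min_depth:
        return None
    counts = Counter(bases)
    return tuple(sorted(b for b, n in counts.items() if n / depth >= min_fraction))


# Hypothetical pileups at the same contig position, one per individual
pileups = {"Ind1": "AAAAAAAA", "Ind2": "TTTTGGGG", "Ind4": "GGGGGGGGGA"}
calls = {ind: call_alleles(bases) for ind, bases in pileups.items()}
# Ind1 -> ('A',), Ind2 -> ('G', 'T'), Ind4 -> ('G',)
```

The calls can then be compared among individuals and populations directly, whatever base the contig happens to show at that site.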
Which assembler and consensus-calling program would be best for this?