Phylogeny on SNP


sphil
03-20-2012, 03:07 AM
Hello,

I'm looking for a tool that infers phylogenies of different species from SNP calls. Any suggestions? I have a table of positions and SNP calls and want to infer the phylogeny of the species from those SNPs. At the moment I don't really know how to tackle the problem, other than binarizing the calls and clustering them with different algorithms.


Thanks,


Phil

brofallon
03-20-2012, 06:34 AM
Most phylogeny estimation tools (PHYLIP, PhyML, PAUP*, MrBayes, *BEAST etc.) expect aligned sequences as input, typically in fasta, phylip or nexus format depending on the tool. SNPs alone are tricky for those tools since there's a lot of ignored data (everything in between the SNPs), which makes estimating branch lengths difficult.
Also keep in mind that there might not actually *be* a simple tree underlying your data - recombination and incomplete lineage sorting will make the ancestry of the sequences a potentially complex network, not a simple tree.
With those caveats, I think making a fasta-formatted input file is your best bet.
good luck!

sphil
03-21-2012, 12:34 AM

So it is possible to just concatenate the SNP calls into one complete sequence and run the analysis on that. Getting a network wouldn't be such a bad thing...

brofallon
03-21-2012, 06:21 AM
If you concatenate the SNPs, and therefore ignore all invariant sites, you'll probably get approximately the correct tree topology, but branch lengths that are much too long, since every site in the alignment is variable and the per-site divergence is overestimated. Some programs may break under these conditions, I'm not entirely sure. I'd be curious to hear what the results look like if you do it...
B
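
A minimal sketch of the concatenation described above, in Python. It assumes the SNP table has already been loaded as a list of (position, calls-per-sample) pairs; the function name `snps_to_fasta`, the sample names and the use of "N" for a missing call are all made up for illustration and are not taken from any tool mentioned in this thread.

```python
def snps_to_fasta(calls, samples, missing="N"):
    """Concatenate SNP calls into one pseudo-sequence per sample.

    calls   -- list of (position, {sample: base}) pairs (hypothetical layout)
    samples -- sample names, in the order they should appear in the FASTA
    missing -- character written when a sample has no call at a site
    """
    ordered = sorted(calls, key=lambda c: c[0])  # keep sites in position order
    lines = []
    for sample in samples:
        seq = "".join(site.get(sample, missing) for _, site in ordered)
        lines.append(f">{sample}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


# Toy input: three SNP positions, three samples, one missing call.
calls = [
    (145, {"sp1": "A", "sp2": "G", "sp3": "A"}),
    (487, {"sp1": "G", "sp2": "G", "sp3": "A"}),
    (682, {"sp1": "C", "sp3": "G"}),  # sp2 has no call here
]
fasta = snps_to_fasta(calls, ["sp1", "sp2", "sp3"])
print(fasta)
```

Every sample ends up with a sequence of the same length, so the result can be treated as an alignment. As noted above, it contains variable sites only, so branch lengths from it should be read with caution.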

campy
11-30-2012, 02:33 PM
Does anybody know how to concatenate the SNPs to make the fasta sequence? I have the same problem now.

mm.perrineau
12-19-2012, 03:07 PM
Hi everybody,
Me too, I really need an answer!

I have reads from 4 diploid strains and one well-annotated reference genome.
I called SNPs with CLC and made a Venn diagram to show the similarities and differences between my 4 strains.

And now I'm stuck!

I would like to build a phylogenetic tree from the SNP data, not from the number of SNPs but from the nucleotide information at the SNPs (indels, mutations, mutation rate).

There should be software that encodes each SNP position with something like a diploid code (AA, A- or --) and builds a tree from this information!

Can you help me, please?

Thank you

Marie-Mathilde

brofallon
12-19-2012, 03:53 PM
Keep in mind that it's unlikely that there is a single phylogenetic tree underlying the data. Recombination is likely to make the trees differ from SNP to SNP, so taking a bunch of SNPs and forcing them into a non-recombining tree may not be that helpful.
You can try ACG (arup.utah.edu/acg) - it can make recombining trees from SNPs from a VCF (or multiple VCFs) and a reference.

gsgs
12-19-2012, 04:23 PM
I assume that these programs really only need the number of pairwise differences between the samples, so you should be able to input this difference matrix directly (better for few samples with long DNA and many differences).

Making a fasta from the vcf is also straightforward. I just wrote a program for that (SNPs only) that handles the chromosomes separately. You could also merge the chromosomes, but that gives long fastas and you'd be back to the difference-matrix option.
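
A rough sketch of the difference-matrix idea in Python. It only counts mismatches between equal-length SNP strings; the names `difference_matrix` and `p_distance`, the toy sequences and the genome length of 1000 are assumptions for illustration. A matrix like this is only useful for distance-based methods; likelihood and Bayesian programs still need the sequences themselves.

```python
from itertools import combinations


def difference_matrix(seqs, missing="N"):
    """Count pairwise differences between equal-length SNP strings.

    Sites where either sample has the `missing` character are skipped.
    """
    names = list(seqs)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        diffs = sum(
            1
            for x, y in zip(seqs[a], seqs[b])
            if x != y and missing not in (x, y)
        )
        matrix[a][b] = matrix[b][a] = diffs
    return matrix


def p_distance(diffs, n_sites):
    """Proportion of differing sites, given how many sites were compared."""
    return diffs / n_sites


# Toy data: four SNP sites in three samples.
seqs = {"a": "AAGT", "b": "AGCT", "d": "GAGT"}
m = difference_matrix(seqs)
print(m["a"]["b"], m["a"]["d"], m["b"]["d"])  # 2 1 3

# Dividing by the number of SNPs inflates the distance compared with
# dividing by all sites, here a made-up total of 1000 sequenced positions.
print(p_distance(m["a"]["b"], 4))     # 0.5
print(p_distance(m["a"]["b"], 1000))  # 0.002
```

The last two lines show the branch-length problem from earlier in the thread in numbers: the same two differences look like 50% divergence when only the SNP sites are counted.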

----------edit-------------------------------

Just use mtDNA and the non-recombining region of the Y chromosome for separate maternal and paternal phylogenetic trees (primates?)

---------edit------------------------------------

Hmm, there should be a program that filters out the recombined chunks and computes the distance in the closely related regions only.

---------edit--------------------------------

Take one of the 2 phases/alleles/haplotypes/zygotes at random (e.g. HapMap has them sorted alphabetically, so always taking the first one can introduce bias).

-------------------------------------
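
The last suggestion (one allele per site, chosen at random) could look roughly like this in Python. The genotype layout (a pair of bases per site) and the function name `pick_random_alleles` are assumptions for illustration; the optional seed is only there to make a run repeatable.

```python
import random


def pick_random_alleles(genotypes, seed=None):
    """Build a haploid pseudo-sequence from diploid genotypes.

    genotypes -- list of (allele1, allele2) pairs, one pair per SNP site
    seed      -- optional seed so the same draw can be reproduced
    """
    rng = random.Random(seed)
    # Pick one of the two alleles independently at every site instead of
    # always taking the first one, which may be sorted alphabetically.
    return "".join(rng.choice(gt) for gt in genotypes)


genotypes = [("A", "G"), ("C", "C"), ("G", "T")]
hap = pick_random_alleles(genotypes, seed=1)
print(hap)
```

Because each site is drawn independently, the result is not a real phased haplotype; it only avoids the systematic bias of always keeping the first allele.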

mmmm
08-16-2013, 02:57 AM
I have an Excel file including SNPs (mutational and recombinant). How can I extract only the mutational SNPs into a new fasta file?

gsgs
08-16-2013, 05:52 AM
Save the Excel file as a text file and post a few lines as an example.

mm.perrineau
10-30-2013, 11:45 AM
Hello everybody,
I really need to make a phylogenetic tree from my SNPs.
Because I am not a bioinformatician, I used CLC Genomics to "map and call" my SNPs.
Now I have a file which looks like this:

Chromosome   Region  Reference  Allele  Strain
contig_1     145     A          G       d
contig_1     487     G          A       a, d, f
contig_1     682     C          G       b, d
contig_333   1156    T          G       a
contig_1234  566     C          T       b
contig_1234  612     C          G       b, d

So I have 4 strains (a, b, d and f) and 1 reference genome with a lot of contigs.
Can somebody help me?

Thank you very much
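
One possible way to turn a table like the one above into per-strain sequences, sketched in Python. It rests on two assumptions that are not stated in the post: a strain that is not listed in the Strain column carries the reference base at that site, and each listed strain is treated as homozygous for the variant allele. The function names are made up for illustration, and only single-base substitutions are handled.

```python
TABLE = """\
contig_1 145 A G d
contig_1 487 G A a, d, f
contig_1 682 C G b, d
contig_333 1156 T G a
contig_1234 566 C T b
contig_1234 612 C G b, d
"""


def table_to_sequences(text, strains):
    """Return {strain: concatenated SNP string}, one base per table row."""
    seqs = {s: [] for s in strains}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Columns: contig, position, reference base, variant base, carriers.
        contig, pos, ref, alt, carriers = line.split(None, 4)
        carriers = {c.strip() for c in carriers.split(",")}
        for s in strains:
            # Assumption: unlisted strains match the reference at this site.
            seqs[s].append(alt if s in carriers else ref)
    return {s: "".join(bases) for s, bases in seqs.items()}


def to_fasta(seqs):
    """Format the sequences as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


seqs = table_to_sequences(TABLE, ["a", "b", "d", "f"])
print(to_fasta(seqs))
```

If a strain has no read coverage at a site, the first assumption is wrong for that site, so it would be safer to mark such positions as missing before building a tree.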