SEQanswers

08-07-2013, 08:01 AM   #1
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130
Giant alignment, high identity, which model for phylogeny?

Hello,

My goal is a phylogeny of multiple isolates, showing me which isolate is closer to which.

I have an organism for which I did population genomics on isolates from a few distant geographic locations. The genome size is about 7-10 Mb.

I did de novo assemblies of all my isolates using MIRA. I picked the best assembly, concatenated all of its contigs, and mapped the reads of each other isolate onto it to generate a new consensus for each of those isolates.

Because the species is heterozygous, I picked a cutoff of 85% when calling bases for the consensus, so that heterozygous loci get called as ambiguities. I then took the consensus sequences of all isolates and aligned them using MAUVE. I trimmed out all sites that had ambiguities, thus removing heterozygous sites.
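The consensus calling and trimming steps described above could be sketched roughly as follows. This is only an illustration, not the exact procedure used in the post: the function names `call_base` and `drop_ambiguous_columns` are hypothetical, and the rule assumed here (take the fewest bases whose combined read fraction reaches the cutoff, then emit the matching IUPAC code) is one common way a consensus threshold is applied, which may differ from what the mapper actually did.

```python
from collections import Counter

# IUPAC nucleotide codes for every non-empty set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_base(bases, cutoff=0.85):
    """Call a consensus symbol from the read bases covering one position.

    Assumed rule: add bases in order of decreasing frequency until their
    combined fraction reaches `cutoff`, then return the IUPAC code for
    that set. A 50/50 heterozygous site therefore becomes an ambiguity.
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return "N"  # no usable coverage at this position
    chosen = set()
    covered = 0
    for base, n in counts.most_common():
        chosen.add(base)
        covered += n
        if covered / total >= cutoff:
            break
    return IUPAC[frozenset(chosen)]


def drop_ambiguous_columns(seqs):
    """Remove alignment columns containing anything other than A, C, G, T or a gap.

    `seqs` is a list of equal-length, upper-case aligned sequences.
    """
    allowed = set("ACGT-")
    kept = [col for col in zip(*seqs) if all(c in allowed for c in col)]
    if not kept:
        return ["" for _ in seqs]
    return ["".join(chars) for chars in zip(*kept)]
```

With these assumptions, a position covered by five A reads and five G reads is called as R, and any alignment column containing that R is dropped from every isolate.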

I am left with a very long alignment, still about 7-10 Mb, in which only a few thousand sites show any variability at all, spaced fairly evenly along the alignment.

For the phylogeny, I ran PhyML with a simple F model and 100 bootstrap replicates, estimating I (proportion of invariant sites) and G (gamma).

Any thoughts on this? Some advice would be really helpful: what might I have omitted? Is PhyML the best for this kind of analysis, or should I try Bayesian inference, and if so, MrBayes, PhyloBayes or even BEAGLE? Are there any alternatives to MAUVE?

Thank you for your help,
Adrian

08-07-2013, 08:25 AM   #2
rhinoceros
Senior Member
 
Location: sub-surface moon base

Join Date: Apr 2013
Posts: 372

I usually do FastTree for a general feeling and then RAxML and PhyloBayes..

08-07-2013, 06:00 PM   #3
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130

Since these are all the same species, and just isolates, should I use a strict molecular clock?

Also, does anyone else have experience with heterozygous (50/50) sites in their reference? Is it a good idea to remove them before trying to reconstruct strain relationships?

Thank you!

08-27-2013, 02:14 PM   #4
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130

Bump. If anyone has any additional input.

08-27-2013, 10:23 PM   #5
A_Morozov
Member
 
Location: Russia, Irkutsk

Join Date: Feb 2011
Posts: 40

Perhaps you could extract only the informative sites and treat them like SNPs, since the computational burden of analyzing megabases via ML or Bayesian inference is tremendous, and most of the sequence doesn't carry any information anyway.
Also, the "concatenate contigs (in whatever order and strand orientation they happen to be in the assembly) and map reads of other isolates onto the resulting sequence" part doesn't look really cool. I'm not sure gene calling, and therefore distinguishing neutral from non-neutral SNPs, will be reliable with such an approach. In addition, it throws away all data on the real gene order, which can be a valuable phylogenetic marker, and imposes a semi-artifactual one.
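As a rough sketch of the "extract informative sites" idea, the snippet below keeps only the alignment columns where the isolates show more than one unambiguous base. The function name `variable_sites` is hypothetical, and the input is assumed to be a list of equal-length, upper-case aligned sequences already loaded into memory.

```python
def variable_sites(seqs):
    """Return (positions, reduced_seqs) for columns with more than one base.

    Only A, C, G and T count as states; gaps and ambiguity codes are
    ignored when deciding whether a column varies.
    """
    positions = []
    for i, col in enumerate(zip(*seqs)):
        states = {c for c in col if c in "ACGT"}
        if len(states) > 1:
            positions.append(i)
    # Rebuild each sequence from the kept columns only.
    reduced = ["".join(s[i] for i in positions) for s in seqs]
    return positions, reduced


if __name__ == "__main__":
    alignment = ["ACGTA", "ACGTC", "ATGTA"]
    pos, snps = variable_sites(alignment)
    print(pos, snps)
```

One caveat worth keeping in mind: a tree built from variable sites alone can overestimate branch lengths unless the substitution model accounts for the invariant sites that were discarded.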

PS: what's the point in creating several nearly identical threads? Bump it if nobody answers in a couple of weeks or so.

Last edited by A_Morozov; 08-27-2013 at 10:25 PM.

08-28-2013, 12:23 AM   #6
sphil
Senior Member
 
Location: Stuttgart, Germany

Join Date: Apr 2010
Posts: 192

Hey,

I'd also say you should try to downsize your data to the most informative sites. A good starting point for finding those may be GenomeRing. It visualizes differences between genomes in a quite fancy way, so you can easily see in which regions your genomes differ. From there, you could extract the sites which differ in at least, say, 2 genomes, and infer a phylogeny on only those sites, which gives you at least an idea of what's going on phylogenetically.
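The "differ in at least 2 genomes" filter could look something like the sketch below. The interpretation is an assumption: a column is kept when at least `min_genomes` isolates carry a base other than the most common one at that position. The function name `sites_differing_in` is hypothetical, and this works directly on aligned sequences rather than on GenomeRing output.

```python
from collections import Counter


def sites_differing_in(seqs, min_genomes=2):
    """Return column indices where at least `min_genomes` sequences
    carry a base different from the majority base.

    `seqs` is a list of equal-length, upper-case aligned sequences;
    gaps and ambiguity codes are ignored.
    """
    kept = []
    for i, col in enumerate(zip(*seqs)):
        counts = Counter(c for c in col if c in "ACGT")
        if not counts:
            continue  # column has no unambiguous bases
        majority = counts.most_common(1)[0][1]
        minority = sum(counts.values()) - majority
        if minority >= min_genomes:
            kept.append(i)
    return kept
```

Setting `min_genomes=1` keeps every variable column, while `min_genomes=2` discards differences seen in a single isolate only, which say little about how the isolates group together.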

Best phil

Last edited by sphil; 08-28-2013 at 12:25 AM.

All times are GMT -8.