  • Giant alignment, high identity, which model for phylogeny?

    Hello,

    My goal is a phylogeny of multiple isolates, showing me which isolate is closer to which.

    I have an organism for which I did population genomics on isolates from a few distant geographic locations. The genome size is about 7-10 Mb.

    I did de novo assemblies using MIRA for all of my isolates. I picked the best assembly, concatenated all of its contigs, and mapped the reads of each of the other isolates onto it to generate a new consensus for each of them.

    Because the species is heterozygous, I picked a cutoff value of 85% when calling bases for the consensus. This should get heterozygous loci called as ambiguities. I then took the consensus sequences of all isolates and aligned them using MAUVE. I trimmed out all sites that had ambiguities, thus removing heterozygous sites.
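The trimming step described above can be sketched roughly as follows. This is only an illustration, not the poster's actual pipeline: it assumes the alignment has already been exported from MAUVE and loaded into a plain dict of name to aligned sequence, and the function name `drop_ambiguous_columns` is made up for this example. Note that, as written, it also drops columns containing gaps, since anything other than A, C, G or T counts as ambiguous.

```python
from typing import Dict

# Only these characters are treated as confidently called bases (assumption).
UNAMBIGUOUS = set("ACGT")


def drop_ambiguous_columns(alignment: Dict[str, str]) -> Dict[str, str]:
    """Keep only alignment columns where every isolate has A, C, G or T."""
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    # zip(*seqs) walks the alignment column by column.
    kept = [col for col in zip(*seqs) if set(col) <= UNAMBIGUOUS]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


if __name__ == "__main__":
    toy = {"iso1": "ACGTAC", "iso2": "ACRTAC", "iso3": "AC-TNC"}
    print(drop_ambiguous_columns(toy))
```

For a 7-10 Mb alignment a column-wise Python loop like this is slow but workable; an array-based version would be the obvious next step.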

    I am left with a very long alignment, still about 7-10 Mb, in which only a few thousand sites show any variability at all, spaced out fairly evenly.

    For the phylogeny, I used PhyML with a simple F model, 100 bootstrap replicates, and estimated I (proportion of invariant sites) and G (gamma rate variation).

    Any thoughts on this? Some advice would be really helpful: what might I have omitted? Is PhyML the best for this kind of analysis, or should I try Bayesian inference, and if so, MrBayes, PhyloBayes or even BEAGLE? Are there any alternatives to MAUVE?

    Thank you for your help,
    Adrian

  • #2
    I usually run FastTree to get a general feel, and then RAxML and PhyloBayes.
    savetherhino.org


    • #3
      Since these are all the same species, and just isolates, should I use a strict molecular clock?

      Also, does anyone else have experience with heterozygous (50/50) sites in their reference? Is it a good idea to remove them before trying to reconstruct strain relationships?

      Thank you!


      • #4
        Bump. If anyone has any additional input.


        • #5
          Perhaps you could extract just the informative sites and treat them like SNPs, since the computational burden of analyzing megabases via ML or Bayesian inference is tremendous, and most of the sequence doesn't carry any information anyway.
          Also, the "concatenate contigs (in whatever order and strand orientation they happen to be in the assembly) and map reads of other isolates onto the resulting sequence" part doesn't look great. I'm not sure gene calling, and therefore distinguishing neutral from non-neutral SNPs, will be reliable with such an approach. In addition, it throws away all data on the real gene order, which can be a valuable phylogenetic marker, and imposes a semi-artifactual one.
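As a rough illustration of the "extract the sites and treat them like SNPs" suggestion, a minimal sketch could look like this. It assumes the alignment is already in memory as a dict of name to aligned sequence with ambiguous columns removed; `variable_sites` is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
from typing import Dict, List, Tuple


def variable_sites(alignment: Dict[str, str]) -> Tuple[List[int], Dict[str, str]]:
    """Return 0-based positions of variable columns and the reduced alignment."""
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    # A column is variable if more than one distinct character occurs in it.
    positions = [i for i, col in enumerate(zip(*seqs)) if len(set(col)) > 1]
    reduced = {
        name: "".join(seq[i] for i in positions) for name, seq in zip(names, seqs)
    }
    return positions, reduced


if __name__ == "__main__":
    toy = {"a": "AAAA", "b": "AACA", "c": "ATCA"}
    positions, snps = variable_sites(toy)
    print(positions, snps)
```

One caveat with this design: a tree inferred from variable sites only can have distorted branch lengths unless the model accounts for the invariant sites that were thrown away (ascertainment bias), so it is worth keeping the count of discarded constant columns.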

          PS: what's the point in creating several nearly identical threads? Bump it if nobody answers in a couple of weeks or so.
          Last edited by A_Morozov; 08-27-2013, 10:25 PM.


          • #6
            Hey,

            I'd also say you should try to downsize your data to the most informative sites. To find those, a good starting point may be GenomeRing. It visualizes differences between genomes in quite a fancy way, so you can easily see in which regions your genomes differ. From there, you could extract the sites that differ in at least, say, 2 genomes, and infer a phylogeny on only those sites, giving you at least an idea of what's going on phylogenetically.
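One possible reading of "sites that differ in at least 2 genomes" is the usual parsimony-informative criterion: a column counts only if at least two different bases each occur in at least two isolates, so singleton differences are skipped. The sketch below follows that reading, which is an assumption rather than what GenomeRing itself does; `informative_sites` and `min_count` are names invented for this example.

```python
from collections import Counter
from typing import Dict, List


def informative_sites(alignment: Dict[str, str], min_count: int = 2) -> List[int]:
    """Return 0-based columns where >= 2 bases each occur in >= min_count isolates."""
    seqs = [seq.upper() for seq in alignment.values()]
    positions = []
    for i, col in enumerate(zip(*seqs)):
        counts = Counter(col)
        # Bases that are shared by enough isolates to group them.
        shared = [base for base, n in counts.items() if n >= min_count]
        if len(shared) >= 2:
            positions.append(i)
    return positions


if __name__ == "__main__":
    toy = {"a": "AAAT", "b": "AACT", "c": "ATCG", "d": "ATCG"}
    print(informative_sites(toy))
```

Raising `min_count` gives a stricter filter; with only a handful of isolates, the default of 2 is probably the most that makes sense.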

            Best,
            phil
            Last edited by sphil; 08-28-2013, 12:25 AM.
