  • Giant alignment, high identity, which model for phylogeny?

    Hello,

    My goal is to build a phylogeny of multiple isolates that shows which isolates are most closely related to each other.

    I am doing population genomics on an organism sampled from a few geographically distant locations. The genome size is about 7-10 Mb.

    I did de novo assemblies with MIRA for all of my isolates. I picked the best assembly, concatenated all of its contigs, and mapped the reads of each other isolate onto it to generate a new consensus for each of those isolates.

    Because the species is heterozygous, I used a cutoff of 85% when calling bases for the consensus, so that heterozygous loci are called as ambiguities. I then aligned the consensus sequences of all isolates with MAUVE and trimmed out every site containing an ambiguity, thus removing the heterozygous sites.
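    A minimal sketch of how an 85% cutoff can turn the read bases piled up at one position into either a plain base or an IUPAC ambiguity code. This is not MIRA's actual consensus algorithm; the function name `call_consensus_base` and the rule "use the two most frequent bases for the ambiguity code" are assumptions for illustration.

```python
from collections import Counter

# IUPAC codes for two-base ambiguities (the heterozygous case).
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def call_consensus_base(bases: str, cutoff: float = 0.85) -> str:
    """Hypothetical helper: call one consensus position from the read bases at it.

    Returns the majority base if its frequency reaches `cutoff`, an IUPAC
    ambiguity code for the two most frequent bases otherwise, or 'N' when
    there is no A/C/G/T coverage at all.
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return "N"
    (top, top_n), *rest = counts.most_common(2)
    if top_n / total >= cutoff:
        return top
    # A second base must exist here, otherwise top_n / total would be 1.0.
    second = rest[0][0]
    return IUPAC_PAIRS[frozenset((top, second))]

print(call_consensus_base("AAAAAAAAAG"))  # 90% A, above the cutoff
print(call_consensus_base("AAAAAAAAGG"))  # 80% A, below the cutoff
```

With a 0.85 cutoff, a 9:1 pileup is still called as the majority base, while an 8:2 pileup becomes an ambiguity code and is later removed by the trimming step.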

    I am left with a very long alignment, still about 7-10 Mb, in which only a few thousand sites show any variability at all, spaced fairly evenly along it.
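    The trimming step that produces this alignment can be sketched as below, assuming the MAUVE alignment has already been parsed into a dict of equal-length sequences (parsing MAUVE's output is not shown). Dropping gap columns as well as ambiguity columns is an assumption beyond what the post describes; the function names are hypothetical.

```python
UNAMBIGUOUS = set("ACGT")

def trim_ambiguous_columns(alignment: dict[str, str]) -> dict[str, str]:
    """Drop every column in which any sequence has a non-ACGT character
    (IUPAC ambiguity codes, N, or gaps)."""
    names = list(alignment)
    seqs = [alignment[n].upper() for n in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] in UNAMBIGUOUS for s in seqs)]
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}

def count_variable_columns(alignment: dict[str, str]) -> int:
    """Number of columns in which not all sequences carry the same base."""
    return sum(len(set(col)) > 1 for col in zip(*alignment.values()))

aln = {"iso1": "ACGTRACGT", "iso2": "ACGAAAC-T", "iso3": "ACGTAACGT"}
trimmed = trim_ambiguous_columns(aln)
print(trimmed, count_variable_columns(trimmed))
```

On a real 7-10 Mb alignment the same two calls report how much sequence survives trimming and how few columns actually vary.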

    For the phylogeny I used PhyML with a simple F model, 100 bootstrap replicates, and estimated I and G.
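    The PhyML run described above could be scripted roughly as follows. This is a sketch under assumptions: that "simple F model" means F81, that the binary is on the PATH as `phyml`, and that `isolates.phy` is a hypothetical PHYLIP file name. The flags used (`-i`, `-d`, `-m`, `-b`, `-v`, `-a`, `-c`) are PhyML 3 options, but check them against your installed version.

```python
import shlex

def phyml_command(phylip_path: str, model: str = "F81",
                  bootstraps: int = 100) -> list[str]:
    """Build (but do not run) a PhyML argument list for a nucleotide +I+G analysis."""
    return [
        "phyml",
        "-i", phylip_path,      # input alignment; PhyML expects PHYLIP format
        "-d", "nt",             # nucleotide data
        "-m", model,            # substitution model (assumed F81 here)
        "-b", str(bootstraps),  # number of non-parametric bootstrap replicates
        "-v", "e",              # estimate the proportion of invariable sites (I)
        "-a", "e",              # estimate the gamma shape parameter (G)
        "-c", "4",              # number of discrete gamma rate categories
    ]

cmd = phyml_command("isolates.phy")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Keeping the command as a list makes it easy to swap the model or bootstrap count when comparing runs.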

    Any thoughts on this? Advice would be really helpful: what might I have omitted? Is PhyML the best tool for this kind of analysis, or should I try a Bayesian approach, and if so, MrBayes, PhyloBayes, or even BEAGLE? Are there any alternatives to MAUVE?

    Thank you for your help,
    Adrian

  • #2
    I usually run FastTree to get a general feel, and then RAxML and PhyloBayes.
    savetherhino.org



    • #3
      Since these are all isolates of the same species, should I use a strict molecular clock?

      Also, does anyone have experience with heterozygous (50/50) sites in the reference? Is it a good idea to remove them before trying to reconstruct strain relationships?

      Thank you!



      • #4
        Bump, in case anyone has additional input.



        • #5
          Perhaps you could extract just the informative sites and use them like SNPs, since the computational burden of analyzing megabases with ML or Bayesian inference is tremendous, and most of the sequence carries no information anyway.
          Also, the "concatenate contigs (in whatever order and strand orientation they happen to have in the assembly) and map the reads of other isolates onto the resulting sequence" step doesn't look great. I'm not sure that gene calling, and therefore distinguishing neutral from non-neutral SNPs, will be reliable with such an approach. In addition, it throws away all information on the real gene order, which can be a valuable phylogenetic marker, and imposes a semi-artifactual one.
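          A minimal sketch of the "use only the variable sites like SNPs" idea: keep the columns where more than one base occurs and write them out as relaxed PHYLIP. It assumes an already trimmed, in-memory alignment with no ambiguity codes and sequence names without spaces; the function names are hypothetical, and whether your tree program accepts relaxed PHYLIP should be checked. Note that ML branch lengths estimated from a variable-sites-only alignment are inflated unless the program applies an ascertainment-bias correction.

```python
def variable_sites(alignment: dict[str, str]) -> dict[str, str]:
    """Keep only columns where at least two different bases occur (SNP-like sites)."""
    names = list(alignment)
    columns = list(zip(*(alignment[n] for n in names)))
    keep = [col for col in columns if len(set(col)) > 1]
    return {n: "".join(col[i] for col in keep) for i, n in enumerate(names)}

def to_relaxed_phylip(alignment: dict[str, str]) -> str:
    """Format an alignment as sequential relaxed PHYLIP (name, space, sequence)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    lines = [f"{len(names)} {length}"]
    lines += [f"{n} {alignment[n]}" for n in names]
    return "\n".join(lines) + "\n"

aln = {"iso1": "ACGTACT", "iso2": "ACGAACT", "iso3": "ACTTACT"}
snps = variable_sites(aln)
print(to_relaxed_phylip(snps), end="")
```

On a multi-megabase alignment with only a few thousand variable columns, this shrinks the input by three orders of magnitude or more.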

          PS: what's the point of creating several nearly identical threads? Bump it if nobody answers in a couple of weeks or so.
          Last edited by A_Morozov; 08-27-2013, 10:25 PM.



          • #6
            Hey,

            I'd also say you should try to downsize your data to the most informative sites. To find those, a good starting point might be GenomeRing. It visualizes differences between genomes in a rather fancy way, so you can easily see in which regions your genomes differ. From there, you could extract the sites that differ in at least, say, 2 genomes and infer a phylogeny from only those sites, which would at least give you an idea of what's going on phylogenetically.
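            GenomeRing is a visualization tool, so the site filtering itself would be a separate step. A minimal sketch, assuming an in-memory alignment and reading "differ in at least 2 genomes" as "at least two genomes carry a base other than the column's most common base" (an interpretation that drops singleton sites); the function name is hypothetical.

```python
from collections import Counter

def sites_differing_in_at_least(alignment: dict[str, str],
                                min_differing: int = 2) -> list[int]:
    """Return 0-based column indices where at least `min_differing` sequences
    carry a base other than the column's most common base."""
    hits = []
    for i, col in enumerate(zip(*alignment.values())):
        majority_count = Counter(col).most_common(1)[0][1]
        if len(col) - majority_count >= min_differing:
            hits.append(i)
    return hits

aln = {"a": "AAGT", "b": "AAGT", "c": "ACTT", "d": "ACTA"}
idx = sites_differing_in_at_least(aln)
reduced = {name: "".join(seq[i] for i in idx) for name, seq in aln.items()}
print(idx, reduced)
```

Raising `min_differing` keeps only sites shared by larger groups of isolates, which trades resolution of recent splits for robustness against sequencing or consensus errors.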

            Best, phil
            Last edited by sphil; 08-28-2013, 12:25 AM.
