  • Problems with the nexus format and generating a phylogeny

    I have five nuclear de novo assemblies and I'm trying to build a phylogeny from them, but I've hit a brick wall. I initially tried MrBayes, but I can't figure out how to format the NEXUS file. My .fasta files look like this:

    >scaffold3 Locus_16_0 16.4 COMPLEX
    GAGTTTGCAAGGAAGCCTCCCAAACAGGATGTAAGCACAAAGAAGCAGAAACAGAAGAATGTGAACTTGGTGGATGAGCAGAGGGCAAAGAGATTGAAGT
    TAGGACCTGGTATGAAGGTGAAATATGATCAGGTCAAAGGAGGTTATTACATAGAGGTTGGTTTCTGTTATATAATATGTGTTGTTGATTAATAACAAGT
    TAAGTATGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTAGGATAAACATGGGAATGGGAATCCAGTGGGTATTGCAGGCACTTCATCCTCAATGC
    CCCATACTGAAGTTGAAACATCAAATGATCATGAGGTTAGTATATACAATATAATCTGGTTAGACTATATTTAAATAAAATGTTGCTTCTTGCACTGATA
    TACATCATCTTGTTATCATTTGTAGGATGAATCTGTGAAAGCAATNTAACAATGTAGGAACATCATCATCACTTAAAATACCAGGGCTGCAACTTAGGAG
    AAGTAGGAGACTGTTAGTTCAGCCTACAAATCTATCAGATCCTGCTCCTTCACAGGTCCCAGAGCCTGTTATTTCTAAATGCCCAAATCCTGTTTTTTCA
    AAGCTAGCAGAACCAGTGTCCTTATCACATCATGTGGAAATACCAGCACCAAAAGTGGATCCTACAAGGAAGTTAAAGTCCTCTACACAGGCACAACCAG
    TGTTGCAACCCAGGAGAAGTAGCAGACTNNNNNNNNNNNNNNNNNNNNNNNNNNTAGAAAACAATCTTTTGGTGAAGGTTGTGTTGGGATGTTTATGACC
    TTGGTACTTTGTTTGGTTAACTGTGTATAACAAACCTATCAAACTTTTGTACTTNNNNNNNNNNNNNNNNNNATGGTTCAAAGTACAGTATATGAAAAAT
    TTGTCTTGGTGCACAAAAATTGGATTGTTTCATAATGTGCAGATATGTAACAAGTGTAGATGATCCAAAACTTTCATCATTGACATCAATTGTACATTTA
    CCATACATAAACAAGTGCATATTTCTATATTTTGTTGGTCATTATCATTTTAGGGAATATTACAATACCAACTAAATCCAAAG
    It's just a list of different loci and scaffolds from a SOAPdenovo assembly. The NEXUS file seems to need a single alignment in which every species is represented, and I'm not sure how to go about making one.

    Do I need to work out which scaffolds come from the same location in each species so that I can compare them, or can I compare them as they are? Will MrBayes work for this, or do I need something else?

  • #2
    Hi,

    Yes, you will first need to identify which regions (scaffolds) are homologous across your species (i.e. the same gene or comparable genomic regions). A reciprocal BLAST search can help with this.
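A minimal sketch of the reciprocal-best-hit idea, assuming you have run BLAST in both directions between two assemblies with tabular output (-outfmt 6), where column 1 is the query ID, column 2 the subject ID and column 12 the bit score. The function names are hypothetical. Two scaffolds that are each other's best hit are only candidate homologs, not proof of orthology.

```python
"""Sketch: reciprocal best BLAST hits between two assemblies (A and B).

Assumes BLAST tabular output (-outfmt 6): column 1 = query ID,
column 2 = subject ID, column 12 = bit score.
"""
from typing import Dict, Iterable, List, Tuple


def best_hits(lines: Iterable[str]) -> Dict[str, str]:
    """Return {query: subject}, keeping the highest-bit-score hit per query."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[str], b_vs_a: Iterable[str]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)
```

Open file handles can be passed straight in, e.g. reciprocal_best_hits(open("A_vs_B.tsv"), open("B_vs_A.tsv")) with hypothetical file names. With five assemblies you would repeat this for each pair and keep the scaffolds that are matched consistently.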
    Once you have homologous scaffolds, you will need to align them for each of your 5 nuclear loci. I typically use the ClustalW algorithm in BioEdit and then adjust the alignment manually. Your FASTA file of scaffolds can be opened in BioEdit; select the homologous scaffolds by highlighting them, then go to the "Accessory Application" menu and choose "ClustalW Multiple alignment".

    The manual adjustment can be a bit of an art. For example, if your loci code for protein, it is good to keep the codon reading frame in mind. Other mutational events, such as small inversions that arise during DNA replication, can create stretches of apparently high sequence divergence that are really just short regions reverse-complemented by template switching at stem-loop structures during replication.
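For protein-coding loci, one quick check after manual adjustment is whether any gap run has a length that is not a multiple of three, since such a gap implies a frameshift in that sequence relative to the others and is often worth a second look. A small sketch, assuming one aligned sequence as input with gaps written as '-'; the function name is hypothetical, and the check only makes sense for coding regions (not introns or other noncoding stretches in your scaffolds).

```python
import re
from typing import List, Tuple


def frame_breaking_gaps(aligned_seq: str) -> List[Tuple[int, int]]:
    """Return (start, length) for each gap run whose length is not a multiple of 3.

    Start positions are 0-based alignment columns; gaps are assumed to be '-'.
    """
    return [
        (m.start(), len(m.group()))
        for m in re.finditer(r"-+", aligned_seq)  # each maximal run of gaps
        if len(m.group()) % 3 != 0
    ]
```

For example, in "ATG---AAA--TTT" the first gap (3 columns) keeps the frame, while the 2-column gap starting at column 9 is reported.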

    You will also need to remove all spaces from the sequence names before creating your NEXUS file. You can do this with find/replace in BioEdit:

    To find/replace in titles, go to "Edit", then "Search", then "Find/replace in titles".

    find: " " (a single space)
    replace with: "_" (type the characters without the quotes)
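If you would rather do the renaming outside BioEdit, the same find/replace can be sketched in Python. This assumes a plain FASTA file and only touches header lines (those starting with '>'), leaving sequence lines unchanged; the function name is hypothetical.

```python
def underscore_headers(fasta_text: str) -> str:
    """Replace spaces with underscores in FASTA header lines only."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Drop trailing whitespace first so it doesn't become trailing underscores.
            line = line.rstrip().replace(" ", "_")
        out.append(line)
    return "\n".join(out) + "\n"
```

Applied to the header from the question, ">scaffold3 Locus_16_0 16.4 COMPLEX" becomes ">scaffold3_Locus_16_0_16.4_COMPLEX".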

    Here is an example of a NEXUS file suitable for phylogenetic analysis with MrBayes:

    #NEXUS
    begin data;
    dimensions ntax=4 nchar=10;
    format datatype=DNA interleave=no missing=? gap=-;
    matrix
    taxon_1 AACGATTCGT
    taxon_2 AAGGAT--CA
    taxon_3 AACGACTCCT
    taxon_4 AAGGATTCCT
    ;
    end;
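Once each locus is aligned, a data block like the one above can be generated rather than typed by hand. A minimal sketch, assuming the alignment is already loaded into a Python dict mapping taxon names to aligned sequences of equal length. It writes a non-interleaved matrix with "#NEXUS" as the first line, sets missing=? and gap=-, and refuses names that contain whitespace. The function name is hypothetical and is not part of MrBayes or BioEdit.

```python
from typing import Dict


def to_nexus(alignment: Dict[str, str]) -> str:
    """Build a NEXUS DATA block (non-interleaved) from {taxon_name: aligned_sequence}."""
    if not alignment:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    for name in alignment:
        if any(ch.isspace() for ch in name):
            raise ValueError(f"taxon name contains whitespace: {name!r}")
    nchar = lengths.pop()
    width = max(len(name) for name in alignment) + 2  # pad names into a column
    rows = [f"    {name.ljust(width)}{seq}" for name, seq in alignment.items()]
    return "\n".join([
        "#NEXUS",
        "begin data;",
        f"    dimensions ntax={len(alignment)} nchar={nchar};",
        "    format datatype=DNA missing=? gap=-;",
        "    matrix",
        *rows,
        "    ;",
        "end;",
    ]) + "\n"
```

ntax and nchar are computed from the data, which avoids the common mistake of a dimensions line that no longer matches the matrix after editing the alignment.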



    Keep in mind that you should also test for an appropriate substitution model with a program such as MrModeltest2, since MrBayes is model-based.
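After MrModeltest2 suggests a model, it still has to be translated into MrBayes settings. A minimal sketch, assuming the model name uses the usual JC/F81/K80/HKY/SYM/GTR labels with optional "+I" and/or "+G": nst sets the number of substitution types, rates sets among-site rate variation, and the equal-base-frequency models (JC, K80, SYM) also get fixed equal state frequencies. The function name is hypothetical; compare the output with the MrBayes manual and add your own mcmc settings to the block.

```python
def mrbayes_model_block(model: str) -> str:
    """Translate a model name such as 'GTR+I+G' into MrBayes lset/prset commands.

    Raises KeyError for model names outside the JC/F81/K80/HKY/SYM/GTR family.
    """
    base, *extras = model.upper().split("+")
    # Number of substitution types for each base model.
    nst = {"JC": 1, "F81": 1, "K80": 2, "HKY": 2, "SYM": 6, "GTR": 6}[base]
    # Among-site rate variation: +I (invariable sites) and/or +G (gamma).
    rates = {
        frozenset(): "equal",
        frozenset({"I"}): "propinv",
        frozenset({"G"}): "gamma",
        frozenset({"I", "G"}): "invgamma",
    }[frozenset(extras)]
    lines = ["begin mrbayes;", f"    lset nst={nst} rates={rates};"]
    if base in {"JC", "K80", "SYM"}:
        # These models assume equal base frequencies.
        lines.append("    prset statefreqpr=fixed(equal);")
    lines.append("end;")
    return "\n".join(lines) + "\n"
```

For example, "GTR+I+G" maps to "lset nst=6 rates=invgamma;" and "K80+G" maps to "lset nst=2 rates=gamma;" plus the fixed-frequency prset line.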

    Another thing to keep in mind is that you may have allelic variation, since you have data from two homologous chromosomes (or more if your species are polyploid). If you are most interested in the evolutionary history of each locus, you can run a separate MrBayes analysis for each locus on an alignment of all its alleles. If you want to infer a species tree from all loci, a coalescent approach such as *BEAST is more appropriate.
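For the *BEAST route, each sequence (allele) has to be assigned to the species it came from. A small bookkeeping sketch, assuming a hypothetical naming convention in which each allele name is the species name followed by an underscore and an allele tag (e.g. "spA_a1", "spA_a2"). It only builds the species-to-allele grouping and does not write any BEAST or BEAUti input.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def alleles_by_species(allele_names: Iterable[str], sep: str = "_") -> Dict[str, List[str]]:
    """Group allele names by species, splitting each name at its LAST separator."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in allele_names:
        species, _, _allele_tag = name.rpartition(sep)
        if not species:
            raise ValueError(f"no species prefix in {name!r}")
        groups[species].append(name)
    return dict(groups)
```

Splitting at the last underscore means names like "Genus_species_a1" still group under "Genus_species".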
