what is wrong with this phylip-file? (Can't be read by fpars)

someperson

Junior Member

Join Date: Jul 2013

Posts: 9
#1

07-20-2014, 08:40 AM

Hi,
I'm trying to create phylogenetic trees based on gene-content data (sequences encoded as discrete characters, where "1" = presence of a gene or homolog and "0" = absence of a gene or homolog).
One of the tools I want to use is fpars. This seems to work just fine with multiple discrete character alignments in phylip format.

My test_input-data is in phylip format like in this example:

Code:

6 50
CYS1_DIC 10100110101110011010101010111010111100101010101001
ALEU_HOR 10100110101110011010111011111010111101111010101001
CATH_HUM 10101110111111111010101010111010111110101110101111
CYS2_DIC 10011010111001101010101011001011110000101010100100
ALEU_HOA 11100100101111011010111011111110110101111010101001
CATH_HUB 10101110110011111010101010111110111110101111101111

However, I want to do bootstrapping with this data, and as stated in this thread, the fseqboot tool just doesn't seem to handle my input data.

So I wrote my own function in Python to resample my alignment data and store the resampled data in a multiple-alignment file using Bio.AlignIO.write().
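For readers following along, here is a minimal sketch of what such a resampling step could look like. It is not the function used in this post: it avoids Bio.AlignIO and writes the PHYLIP text by hand, it assumes a sequential (non-interleaved) layout with names padded to 10 columns, and the names `resample_columns`, `format_phylip` and `write_replicates` are made up for illustration.

```python
import random

def resample_columns(records, rng):
    """Build one bootstrap replicate by drawing alignment columns with replacement."""
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all sequences must have the same length")
    # The same column indices are applied to every taxon, so columns stay intact.
    cols = [rng.randrange(length) for _ in range(length)]
    return [(name, "".join(seq[c] for c in cols)) for name, seq in records]

def format_phylip(records):
    """Format one data set: a header line, then one line per taxon.

    Assumption: names are cut/padded to exactly 10 columns and all
    states of a taxon sit on a single line without internal spaces.
    """
    lines = [f"{len(records):5d}{len(records[0][1]):5d}"]
    for name, seq in records:
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"

def write_replicates(records, n_replicates, path, seed=12345):
    """Write n_replicates resampled data sets back to back into one file."""
    rng = random.Random(seed)
    with open(path, "w", newline="\n") as handle:
        for _ in range(n_replicates):
            handle.write(format_phylip(resample_columns(records, rng)))

# Two of the taxa from the example above, as (name, states) pairs.
records = [
    ("CYS1_DIC", "10100110101110011010101010111010111100101010101001"),
    ("ALEU_HOR", "10100110101110011010111011111010111101111010101001"),
]
```

Calling `write_replicates(records, 100, "bootstrap.phy")` would then write 100 data sets into one file (the file name is only an example). Formatting the lines by hand keeps the exact column layout visible, which helps when a downstream program is strict about it.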

The resulting file looks like this (first 14 lines as example):

Code:

6 50
CYS1_DIC 00000111110010011011111011111010110111101010001100
ALEU_HOR 00011111110011011011111011111010110111111010001100
CATH_HUM 00000111111010011111111011111110111111101110101111
CYS2_DIC 01100111110011001100111000000110111011100101001011
ALEU_HOA 00011111110011011011111001111110110111111010010100
CATH_HUB 00000111111110111111111011111110111111101100101111
6 50
CYS1_DIC 11010111100011010101101010110101010100000001110111
ALEU_HOR 11010111110011011111101010110101110100000001111111
CATH_HUM 11110111100011010101111111110101011100011001110111
CYS2_DIC 10100001100011010101000101110100011000000100010011
ALEU_HOA 11110110110110111111101000110111111100000001111111
CATH_HUB 11110011100011010101111111110101011101011001110111

This looks just like the results I would have expected from fseqboot (if it had handled my data correctly). The next steps would be to run fpars on this data, and then to run fconsense on the fpars result to get the bootstrap values for my trees.

However, when I try to run fpars on this data (command: "fpars -infile bootstrap_test4.phy -outfile TREES_BS_test4.tree -auto -seed 12345"), I get the following message:

Code:

Error: Bad discrete states file b 'bootstrap_test4.phy': read 61 states for '<null>', expected 50
Error: Unable to read discrete states from 'bootstrap_test4.phy'
Died: fpars terminated: Bad value for '-infile' with -auto defined

Can anybody see what's wrong here? What can I do?

Last edited by someperson; 07-20-2014, 08:43 AM.
Tags: alignments, emboss, fpars, phylip, python
mbblack

Senior Member

Join Date: Aug 2009

Posts: 245
#2

07-21-2014, 05:15 AM

Check your bootstrap file. According to the error, the program read 61 characters for one sequence where it expected 50. So you must have unintended spaces or some other formatting error in your input text file: the program is misreading that input file, for some reason, and wrapping lines on you.

Michael Black, Ph.D.
ScitoVation LLC. RTP, N.C.
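One way to act on the suggestion above is to count the states on every line of the bootstrap file before handing it to fpars. The following is a rough sketch, not a reimplementation of how fpars parses its input: it assumes a sequential layout with one taxon per line and names occupying the first 10 columns, and the function name `check_phylip_datasets` is made up for illustration.

```python
def check_phylip_datasets(text, name_width=10):
    """Return a list of (line_number, message) problems found in `text`.

    Assumptions: data sets follow each other directly, each starts with a
    '<taxa> <characters>' header, and every taxon occupies exactly one line
    whose first `name_width` columns hold the name.
    """
    problems = []
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty string after a final newline
    i = 0
    while i < len(lines):
        header = lines[i].split()
        if len(header) != 2 or not all(tok.isdigit() for tok in header):
            problems.append(
                (i + 1, f"expected '<taxa> <characters>' header, got {lines[i]!r}")
            )
            break
        n_taxa, n_chars = int(header[0]), int(header[1])
        for j in range(i + 1, i + 1 + n_taxa):
            if j >= len(lines):
                problems.append((j + 1, "file ends before all taxa were read"))
                break
            line = lines[j]
            if "\r" in line or "\t" in line:
                problems.append((j + 1, "carriage return or tab character"))
            # Everything after the name field, with blanks removed, counts as states.
            states = "".join(line[name_width:].split())
            if len(states) != n_chars:
                problems.append(
                    (j + 1, f"{len(states)} states, header says {n_chars}")
                )
        i += 1 + n_taxa
    return problems
```

Running `check_phylip_datasets(open("bootstrap_test4.phy", newline="").read())` returns an empty list if every line matches its header under these assumptions; otherwise each entry points at the line to inspect. Passing `newline=""` keeps any Windows line endings visible to the check.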