Hi,
I am trying to make a tree with RAxML that I will then use with placer.
When I run the RAxML command, I get the error "Problem reading number of species and sites."
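The error wording suggests RAxML could not read a count of species and sites at the top of the input file. This is an assumption: it presumes the RAxML version in use expects a PHYLIP-style alignment, whose first line holds those two integers. MUSCLE writes FASTA by default, which starts with a ">" header instead. The small sketch below reports which of the two the first line of a file looks like. The function name is hypothetical.

```python
def describe_alignment_header(first_line: str) -> str:
    """Classify the first line of an alignment file.

    Assumption: a PHYLIP-style header is exactly two whitespace-separated
    integers (number of sequences, alignment length); FASTA starts with '>'.
    """
    line = first_line.strip()
    if line.startswith(">"):
        return "fasta"
    fields = line.split()
    if len(fields) == 2 and all(f.isdigit() for f in fields):
        return "phylip"
    return "unknown"
```

To use it, pass the first line of the aligned file, e.g. the result of readline() on an open file handle. If it reports "fasta", the file is not in the layout that the error message appears to expect.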
I aligned my sequences with MUSCLE. After googling the error, I used sed to remove the spaces from the headers in the aligned nucleotide file. This is what a header looks like after removing the spaces:
>ENA|CAJ48085|CAJ48085.1Bordetellaavium197Nbiodegradativeargininedecarboxylase
ATGAAATTTCGCTTCCCCATTTTCATCATCGACGAAGACTTCCGTTCCGAGAACGCCTCG
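The exact sed command isn't shown. Assuming it deleted every space on lines that start with ">" (something like sed '/^>/ s/ //g'), this Python sketch does the same thing. It leaves sequence lines untouched, which is what the header above reflects. The function name is illustrative.

```python
import re


def strip_header_whitespace(fasta_text: str) -> str:
    """Delete all whitespace inside FASTA header lines; sequence lines are left as-is."""
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header line: collapse spaces/tabs away so the name is a single token
            line = re.sub(r"\s+", "", line)
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"
```

For example, a header assumed to have originally read ">ENA|CAJ48085|CAJ48085.1 Bordetella avium 197N biodegradative arginine decarboxylase" becomes the space-free header shown above.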
The RAxML manual indicates that identical sequences are a problem. This is from its "Alignment Error Checking" section:
2. Identical Sequence(s) that have different names but are exactly identical. This mostly happens when you excluded some hard-to-align alignment regions from your alignment and does not make sense to use.
My sequences have high percent identity: the Uclust identity values look like 100%, 100%, 99.9%, 99.9%, 99.8%, 99.8%, and some of the genus/species names are the same too. Is this the source of the error, or is it something else?
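Instead of inferring duplicates from the Uclust percentages, one way to check whether any rows of the aligned file are exactly identical (the case in the quoted manual passage) is to group sequence names by their aligned sequence. Because some genus/species names repeat, the sketch also lists any full header names that occur more than once. It assumes the alignment is plain FASTA, and all function names are illustrative.

```python
from collections import Counter, defaultdict


def parse_fasta(fasta_text):
    """Return a list of (name, sequence) pairs from FASTA text."""
    records, name, chunks = [], None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def identical_sequence_groups(records):
    """Group names whose aligned sequences match exactly (case-insensitive, gaps included)."""
    by_seq = defaultdict(list)
    for name, seq in records:
        by_seq[seq.upper()].append(name)
    return [names for names in by_seq.values() if len(names) > 1]


def duplicate_names(records):
    """Return header names that occur more than once."""
    counts = Counter(name for name, _ in records)
    return [name for name, n in counts.items() if n > 1]
```

Exact string comparison is deliberately strict. Sequences at 99.9% identity are never grouped, so only rows that are truly identical character for character are reported.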
Is there tree-building software I can use with placer that can handle sequences where some are 100% identical?
Thanks!! Sorry this is so long.