Hi everyone,
I have a large file of orthologous gene trees in Newick format, one tree per line. I only want the single-copy gene trees (exactly one gene copy per species). I tried selecting them by species ID with awk, but every attempt modified and damaged the trees.
The simplest tree looks like this ("(", ID of species 1, dot, gene name, ":", distance value, ",", ID of species 2, dot, gene name, ":", distance value, ")", ";"):
(ID1_XXXX.geneXXX:0.XX,ID2_XXXX.geneXXX:0.XX);
Any script/idea to select the single-copy gene trees?
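One possible approach, as a minimal Python sketch rather than a tested solution: read each line, pull out the leaf labels with a regular expression, and print the line unchanged if no species ID occurs more than once. It rests on assumptions that go beyond the description above: the species ID is everything before the first dot in a leaf label, labels are unquoted and contain no parentheses, commas, colons or whitespace, and "single-copy" means no species is repeated (it does not require every species to be present). The names `is_single_copy` and `filter_single_copy` are made up for illustration.

```python
import re

# A leaf label directly follows "(" or ","; internal node labels and
# support values follow ")", and branch lengths follow ":", so neither
# is captured by this pattern.
LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def is_single_copy(newick):
    """True if every species ID occurs at most once among the leaves."""
    # Assumption: species ID = text before the first "." in the label.
    species = [label.split(".", 1)[0] for label in LEAF.findall(newick)]
    return bool(species) and len(species) == len(set(species))


def filter_single_copy(lines):
    """Yield the input lines that pass the check, completely unmodified."""
    for line in lines:
        if line.strip() and is_single_copy(line):
            yield line


# Small demo with made-up trees: the second one has two genes from sp1.
demo = [
    "(sp1_a.g1:0.1,sp2_b.g2:0.2);\n",
    "((sp1_a.g1:0.1,sp1_a.g9:0.3):0.05,sp2_b.g2:0.2);\n",
]
kept = list(filter_single_copy(demo))
print("".join(kept), end="")
```

Because the lines are only inspected and never rewritten, the trees that are kept stay byte-for-byte identical to the input. For a real file, passing an open file handle to `filter_single_copy` and writing the yielded lines to a new file should work the same way.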