Greetings, I have a large list of taxonomic information from a BLAST search in a text file (taxa.txt, see below).
$ head -5 taxa.txt
Eukaryota;Opisthokonta;Fungi;Ascomycota
Bacteria;Candidate division OD1;marine metagenome
Bacteria;Planctomycetes;Phycisphaerae
Bacteria;Cyanobacteria;Chloroplast;
Eukaryota;Opisthokonta;Fungi;Ascomycota;
What I would like to do is create a bar plot of this data in Excel or R. To do this, I have been using sort and uniq to tally up each of the hits.
$ sort -u taxa.txt | uniq -c > sorted_taxa.txt
$ head -5 sorted_taxa.txt
1 Archaea;Euryarchaeota;Halobacteria
1 Archaea;Euryarchaeota;Thermoplasmata
1 Archaea;Thaumarchaeota;Marine Group I
1 Archaea;Thaumarchaeota;Marine Group I
1 Archaea;Thaumarchaeota;Soil Crenarchaeotic Group(SCG)
However, when I open this file and copy and paste the results into Excel, it treats each row as a single string. What I would like is for the output to have (at least) two columns: one for the count and one for the taxonomic information. I believe this would be tab-delimited output. Is there a way to tell Unix to produce such a result? Thanks,
-Tony
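A minimal sketch of one way to get tab-delimited counts, assuming a POSIX shell with sort, uniq, sed and awk available. Two things to note about the command above: `sort -u` drops duplicate lines before `uniq -c` ever sees them, so every count comes out as 1, and lines that differ only by a trailing `;` would be tallied separately. The file names `taxa_demo.txt` and `sorted_taxa_demo.tsv` are hypothetical, and stripping the trailing `;` is an assumption about what is wanted, not something the data requires.

```shell
# Hypothetical demo input modelled on the taxa.txt lines shown above.
printf '%s\n' \
  'Eukaryota;Opisthokonta;Fungi;Ascomycota' \
  'Bacteria;Planctomycetes;Phycisphaerae' \
  'Eukaryota;Opisthokonta;Fungi;Ascomycota;' > taxa_demo.txt

# 1. sed   : drop a trailing ';' so otherwise identical lineages match (assumption).
# 2. sort  : plain sort (no -u) keeps duplicates and places them next to each other.
# 3. uniq -c : prefix each distinct line with the number of times it occurs.
# 4. awk   : replace the padded "  <count> " prefix with "<count><TAB>".
sed 's/;$//' taxa_demo.txt \
  | sort \
  | uniq -c \
  | awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n "\t" $0 }' \
  > sorted_taxa_demo.tsv
```

The resulting file has the count in the first column and the full taxonomy string in the second, separated by a tab, so it should paste into Excel as two columns or load in R with something like `read.delim("sorted_taxa_demo.tsv", header = FALSE)`.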