Quote:
Originally Posted by suzumar
Hi Sebastian, and thanks for developing Ray. I am working on a sponge metagenome (Ion Torrent) and I am trying to set up Ray for taxonomy and communities.
I am trying to set up the files for the latest version of greengenes (2012_08) and have parsed the information in the fasta file into the same format as 2011_01, and I am trying to manually run the script
|
It's good to know that there is a new release.
Quote:
Paper-Replication-2012 / Build-Input-Files-for-GreenGenes-Taxonomy / main.sh
and have one question regarding fasta files for Ray Taxonomy and Communities
I have noticed that, for the NCBI taxonomy, the script Paper-Replication-2012 / Build-Input-Files-for-NCBI-Taxonomy / CreateRayInputStructures.sh
creates a single fasta file for each genome. My question is whether those reference fasta files are just a concatenation of all .fna files associated with any given genome (so that there are multiple IDs and accessions associated with a given "genome").
|
Yes. Drafts can have several .fna files, and these can be concatenated.
Quote:
This becomes an issue for draft genomes (lots of scaffolds) or eukaryotic chromosomes, which I will have to "manually merge"
|
You can use a bash loop to do that for you. Something like:
Code:
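mkdir -p merged
# One subdirectory of drafts/ per draft genome
for dir in drafts/*/
do
    draft=$(basename "$dir")
    # Concatenate all .fna files of this draft into one fasta file
    cat "$dir"*.fna > "merged/$draft.fasta"
done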
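To check that a merged file really holds several sequences (and therefore several IDs), counting the header lines is usually enough. This is only a sketch on toy data: the temporary directory, the draft name draftA and the scaffold names are all made up for illustration.

```shell
# Toy data in a temporary directory; every name here is hypothetical.
work=$(mktemp -d)
mkdir -p "$work/drafts/draftA" "$work/merged"
printf '>scaffold1\nACGT\n' > "$work/drafts/draftA/part1.fna"
printf '>scaffold2\nGGCC\n' > "$work/drafts/draftA/part2.fna"

# Merge the .fna files of the draft into a single fasta file.
cat "$work"/drafts/draftA/*.fna > "$work/merged/draftA.fasta"

# Each fasta record starts with '>', so counting those lines counts sequences.
count=$(grep -c '^>' "$work/merged/draftA.fasta")
echo "draftA.fasta contains $count sequences"
```

On real data, the same grep -c '^>' can be run on any file in merged/.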
Quote:
Actually after I double checked the CreateRayInputStructures.sh script it seems to be the case, but would you please confirm it?
Marcelino
|
Yes. The code below does what you said:
Code:
if test ! -d NCBI-Finished-Bacterial-Genomes
then
    echo "Creating $OutputDirectory/NCBI-Finished-Bacterial-Genomes, please wait."
    mkdir NCBI-Finished-Bacterial-Genomes
    cd NCBI-Finished-Bacterial-Genomes
    for i in $(ls ../uncompressed/all.fna)
    do
        name=$(echo $i|sed 's/_uid/ /g'|awk '{print $1}')
        cat ../uncompressed/all.fna/$i/*.fna > $name".fasta"
    done
    echo "Done."
    cd ..
fi
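To see what the sed/awk step does to a directory name, it can be run on its own. This is a small sketch; the directory name is only an illustrative value in the <organism>_uid<number> style that the loop appears to expect.

```shell
# Illustrative directory name (an assumption, not taken from the script).
i="Escherichia_coli_K_12_substr__MG1655_uid57779"

# Same pipeline as in the script: replace "_uid" with a space,
# then keep only the first whitespace-separated field.
name=$(echo "$i" | sed 's/_uid/ /g' | awk '{print $1}')

echo "$name.fasta"
```

So everything from _uid onwards is dropped, and all .fna files in that directory end up in one fasta file named after the genome.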