04-09-2013, 05:49 PM   #21
seb567
Senior Member
 
Location: Québec, Canada

Join Date: Jul 2008
Posts: 260

Quote:
Originally Posted by suzumar
Hi Sebastian and thanks for developing Ray. I am working on a sponge metagenome (Ion Torrent) and I am trying to set up Ray for taxonomy and communities.

I am trying to set up the files for the latest version of GreenGenes (2012_08) and have parsed the information in the FASTA file into the same format as 2011_01, and I am trying to manually run the script

It's good to know that there is a new release.

Quote:
Paper-Replication-2012 / Build-Input-Files-for-GreenGenes-Taxonomy / main.sh

and have one question regarding fasta files for Ray Taxonomy and Communities

I have noticed that for the NCBI taxonomy the script Paper-Replication-2012 / Build-Input-Files-for-NCBI-Taxonomy / CreateRayInputStructures.sh

creates a single FASTA file for each genome. My question is whether those reference FASTA files are just a concatenation of all .fna files associated with any given genome (and so there are multiple IDs and accessions associated with a given "genome").

Yes. Draft genomes can have several .fna files, and these can be concatenated.

Quote:

This becomes an issue for draft genomes (lots of scaffolds) or eukaryotic chromosomes, which I will have to "manually merge"

You can use a bash loop to do that for you. Something like:

Code:
mkdir -p merged

# One merged FASTA file per draft directory under drafts/
for dir in drafts/*/
do
    draft=$(basename "$dir")
    cat "$dir"*.fna > "merged/$draft.fasta"
done
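
If you want to check the merge, here is a minimal sketch that counts the FASTA records (header lines starting with ">") in each merged file. The helper name count_fasta_records is hypothetical and not part of the Ray scripts; the merged/ directory is the one created by the loop above. A merged draft should report as many records as it has scaffolds or contigs.

```shell
# Hypothetical helper: count FASTA records (header lines start with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Report the record count for every merged file, if any exist.
for merged_file in merged/*.fasta
do
    # Skip the unexpanded pattern when merged/ is empty or missing.
    [ -e "$merged_file" ] || continue
    printf '%s\t%s\n' "$merged_file" "$(count_fasta_records "$merged_file")"
done
```
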
Quote:
Actually after I double checked the CreateRayInputStructures.sh script it seems to be the case, but would you please confirm it?

Marcelino

Yes, the code below does what you said:

Code:
if test ! -d NCBI-Finished-Bacterial-Genomes
then
        echo "Creating $OutputDirectory/NCBI-Finished-Bacterial-Genomes, please wait."

        mkdir NCBI-Finished-Bacterial-Genomes
        cd NCBI-Finished-Bacterial-Genomes

        for i in $(ls ../uncompressed/all.fna)
        do
                name=$(echo $i|sed 's/_uid/ /g'|awk '{print $1}')

                cat ../uncompressed/all.fna/$i/*.fna > $name".fasta"
        done

        echo "Done."

        cd ..
fi