02-21-2013, 09:41 AM
seb567
Senior Member
 
Location: Québec, Canada

Join Date: Jul 2008
Posts: 260

Hi,

Quote:
Originally Posted by severin
How do I include genomes other than the bacteria that are found in the NCBI-taxonomy directory that your script generates?

Genome-to-Taxon.tsv has 2 columns (tab-separated): GenBankIdentifier taxonIdentifier.

Both are integers.

So you need to append entries to this file.

See https://github.com/sebhtml/ray/blob/...n/Taxonomy.txt
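Appending entries is easy to script. Here is a minimal Python sketch, assuming the format described above (two integer columns, tab-separated, no header line). The function name `append_genome_to_taxon` is hypothetical and not part of Ray. It validates every identifier as an integer before writing anything and skips pairs that are already in the file.

```python
from pathlib import Path


def append_genome_to_taxon(path, pairs):
    """Append (GenBankIdentifier, taxonIdentifier) pairs to a Genome-to-Taxon.tsv file.

    Returns the number of lines actually appended.
    """
    path = Path(path)
    text = path.read_text() if path.exists() else ""

    # Pairs already present in the file (2 tab-separated integer columns).
    existing = set()
    for line in text.splitlines():
        if line.strip():
            genome, taxon = line.split("\t")
            existing.add((int(genome), int(taxon)))

    # Validate everything first so a bad pair does not leave a half-written file.
    new_pairs = [(int(genome), int(taxon)) for genome, taxon in pairs]

    added = 0
    with path.open("a") as handle:
        if text and not text.endswith("\n"):
            handle.write("\n")  # make sure new entries start on their own line
        for pair in new_pairs:
            if pair in existing:
                continue
            handle.write(f"{pair[0]}\t{pair[1]}\n")
            existing.add(pair)
            added += 1
    return added
```

For example, `append_genome_to_taxon("Genome-to-Taxon.tsv", [(123456789, 987654)])` appends one line, where both numbers are placeholders for your own GenBank and taxon identifiers.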

Quote:
Originally Posted by severin
I could drop the fasta file into a folder however...

Indeed, sequences deposited in directories that you provide to Ray with the -search option
will be picked up by the Ray Communities plugins.

Quote:
Originally Posted by severin
Is there an easy way to include the taxonomy information about the genomes I add?

No, you need to add one line for each relationship you desire.

Quote:
Originally Posted by severin
You added Human in the paper, but if I wanted to include multiple species whose taxonomy is known, do I have to do this manually or is there a tool that can help me achieve this?

Well, because what people want to add to this system can come from various sources (not
just NCBI), it's hard to devise a tool that will be usable and portable for all of them.

So I guess your best bet is to write a small tool that does it for you so that you
don't have to do it manually.
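As an example of such a small tool, here is a Python sketch that scans a directory of FASTA files and emits Genome-to-Taxon.tsv lines, one per sequence. It rests on two assumptions you should check against Taxonomy.txt: that the headers follow the old NCBI style `>gi|<number>|...`, and that this gi number is the GenBankIdentifier Ray expects. The mapping from file name to taxon identifier (`taxon_for_file`) is something you would supply yourself; the function and parameter names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: FASTA headers look like ">gi|<number>|ref|<accession>| description"
# and the gi number is the GenBankIdentifier used in Genome-to-Taxon.tsv.
GI_HEADER = re.compile(r"^>gi\|(\d+)\|")


def genome_to_taxon_lines(fasta_dir, taxon_for_file):
    """Yield 'GenBankIdentifier<TAB>taxonIdentifier' lines for FASTA files in fasta_dir.

    taxon_for_file (hypothetical, user-supplied) maps a file name such as
    'my_genome.fasta' to an integer taxon identifier. Files without an entry
    are skipped, as are headers that do not carry a gi number.
    """
    for fasta in sorted(Path(fasta_dir).iterdir()):
        taxon = taxon_for_file.get(fasta.name)
        if taxon is None or not fasta.is_file():
            continue
        with fasta.open() as handle:
            for line in handle:
                match = GI_HEADER.match(line)
                if match:
                    yield f"{int(match.group(1))}\t{int(taxon)}"
```

The resulting lines can then be appended to Genome-to-Taxon.tsv.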

If you think that this should be a service provided by Ray, you can file a ticket at

https://github.com/sebhtml/ray/issues/new

Quote:
Originally Posted by severin
Also, I am interested in not just obtaining the abundances but also assigning the scaffolds to particular species or other level in the taxonomy. Does Ray output the scaffold to taxon information somewhere?

The system will identify contigs for you on the basis of the sequences provided with the -search
option.

Files:

Code:
RayMicrobiomeAnalysis/
    BiologicalAbundances/
        _DenovoAssembly/
            Contigs.tsv
            *.CoverageData.xml

        _Coloring/
        _Frequencies/

        NCBI-bacteria-directory/
            ContigIdentifications.tsv
            _Files.tsv
            SequenceAbundances.xml

        NCBI-viruses-directory/
            ContigIdentifications.tsv
            _Files.tsv
            SequenceAbundances.xml
See https://github.com/sebhtml/ray/blob/...Abundances.txt
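To collect the contig identifications for every -search directory at once, a short Python sketch like the following can walk the output. It assumes the nesting in the listing (output directory, then BiologicalAbundances/, then one subdirectory per -search directory) and treats subdirectories whose names start with '_' as Ray's own rather than search directories. It makes no assumption about the column layout of ContigIdentifications.tsv (see the linked documentation for that); `read_tsv_rows` simply splits lines on tabs. Both function names are hypothetical.

```python
from pathlib import Path


def contig_identification_files(output_dir):
    """Map each -search directory name to its ContigIdentifications.tsv path.

    Assumes <output_dir>/BiologicalAbundances/<search dir>/ContigIdentifications.tsv;
    subdirectories whose names start with '_' (e.g. _DenovoAssembly) are skipped.
    """
    abundances = Path(output_dir) / "BiologicalAbundances"
    found = {}
    for entry in sorted(abundances.iterdir()):
        if entry.is_dir() and not entry.name.startswith("_"):
            table = entry / "ContigIdentifications.tsv"
            if table.is_file():
                found[entry.name] = table
    return found


def read_tsv_rows(path):
    """Return the non-blank lines of a tab-separated file as lists of fields."""
    with open(path) as handle:
        return [line.rstrip("\n").split("\t") for line in handle if line.strip()]
```

For example, `contig_identification_files("RayMicrobiomeAnalysis")` would return one entry per search directory that has a ContigIdentifications.tsv.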

Quote:
Originally Posted by severin
One last question.
If I have an assembly from, say, Trinity, can I run the assembly through Ray-Meta and have it return abundances based on the transcripts themselves?

A sizable number of people at my institution want this feature too: the ability for Ray to
build the de Bruijn graph from sequences assembled with other tools, so that they can benefit
from other capabilities such as Ray Communities.

The Ray C++ API for messages actually supports this, but the plugins that build the de Bruijn graph
(namely plugin_SequencesLoader, plugin_KmerAcademyBuilder and plugin_VerticesExtractor)
work only on reads at the moment.
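To illustrate why, in principle, it matters little whether the input is reads or assembled sequences, here is a toy Python sketch that builds a de Bruijn graph from any list of sequences: it counts k-mers and records, for each k-mer, which nucleotides can precede it (parents) and follow it (children). This is a simplified illustration, not how plugin_SequencesLoader, plugin_KmerAcademyBuilder and plugin_VerticesExtractor work: it ignores reverse complements and keeps the whole graph in one Python dict instead of distributing it across MPI ranks. The function name is hypothetical.

```python
from collections import Counter


def build_de_bruijn(sequences, k):
    """Return {kmer: (coverage, parents, children)} for the given sequences.

    A parent of a k-mer is a nucleotide x such that x + kmer[:-1] is also a k-mer
    in the graph; a child is a nucleotide y such that kmer[1:] + y is one.
    Only one strand is considered; k-mers containing non-ACGT symbols are dropped.
    """
    coverage = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                coverage[kmer] += 1

    graph = {}
    for kmer, count in coverage.items():
        parents = "".join(x for x in "ACGT" if x + kmer[:-1] in coverage)
        children = "".join(y for y in "ACGT" if kmer[1:] + y in coverage)
        graph[kmer] = (count, parents, children)
    return graph
```

For instance, `build_de_bruijn(["ACGTA"], 3)` returns `{"ACG": (1, "", "T"), "CGT": (1, "A", "A"), "GTA": (1, "C", "")}`. Passing Trinity transcripts instead of reads would work unchanged; the coverage values would then count occurrences in the assembly rather than read depth.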

Quote:
Originally Posted by severin
How dependent is the algorithm on having done the assembly beforehand?

It's independent. The quantification algorithms work on a colored de Bruijn graph.
They do not really use assembled paths for these computations (aside, obviously, from what's
in the files for contig identification).
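The general idea of quantifying on a colored de Bruijn graph can be sketched in a few lines of Python: each k-mer gets the set of reference sequences ("colors") that contain it, reads contribute k-mer counts, and each reference is scored by the mean count of the k-mers that carry only its color. This is a deliberately simplified illustration under those assumptions (single strand, exact k-mer matches, mean over uniquely colored k-mers as the estimator); it is not the estimator Ray Communities actually uses, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def kmers(seq, k):
    """All k-mers of seq, left to right (single strand)."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def color_graph(references, k):
    """Map each k-mer to the set of reference names ("colors") that contain it."""
    colors = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            colors[kmer].add(name)
    return colors


def estimate_coverage(references, reads, k):
    """Per reference: mean read count over the k-mers colored by that reference only."""
    colors = color_graph(references, k)
    counts = defaultdict(int)
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in colors:  # only k-mers of the colored graph are counted
                counts[kmer] += 1

    result = {}
    for name, seq in references.items():
        unique = {km for km in kmers(seq, k) if colors[km] == {name}}
        result[name] = mean(counts[km] for km in unique) if unique else 0.0
    return result
```

Restricting the score to uniquely colored k-mers keeps k-mers shared between references from being credited to both; Ray Communities' real handling of shared k-mers may differ.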

Quote:
Originally Posted by severin
Can I feed Ray-Meta a kmer graph?

No, this is not possible at the moment.
But that's something that could be implemented, as Ray (and ABySS too)
supports the Ray Cloud Browser kmer graph format.

The file format is like this:

map.csv (ASCII; called kmers.txt in Ray)

The fields are separated by semicolons (';'); any line starting with '#' is a comment.

A line looks like this:

GCGGTTATGCTTGCGTCCACCGTAAGTTCGGATTCAGACTTAATCAAAGGTTTTAACAAAGCGCTGGCAACCCCACGGCGGGGGTATTCAG;47;T;G

See https://github.com/sebhtml/Ray-Cloud...Map-format.txt
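Reading this format takes only a few lines. Here is a minimal Python parser, assuming semicolon-separated fields as in the example line and interpreting the four fields as k-mer sequence, coverage, parent nucleotides and child nucleotides. That interpretation is an assumption here, so check it against the linked format description. `KmerRecord` and `parse_kmer_graph` are hypothetical names.

```python
from typing import NamedTuple


class KmerRecord(NamedTuple):
    sequence: str
    coverage: int
    parents: str   # assumed: nucleotides that can precede the k-mer
    children: str  # assumed: nucleotides that can follow the k-mer


def parse_kmer_graph(lines):
    """Parse kmers.txt / map.csv style lines.

    Blank lines and lines starting with '#' are skipped; every other line must
    have exactly four ';'-separated fields (empty parent/child fields are allowed).
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sequence, coverage, parents, children = line.split(";")
        records.append(KmerRecord(sequence, int(coverage), parents, children))
    return records
```

Calling `parse_kmer_graph(open("kmers.txt"))` would yield one record per k-mer; for the example line above, `coverage` is 47, `parents` is "T" and `children` is "G".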


In case you have not heard of Ray Cloud Browser: it lets end users interactively explore processed genomics data.

Demo: http://browser.cloud.boisvert.info/c...location=13000

All you need to get started is a kmer graph and fasta sequences (with Ray: kmers.txt and Contigs.fasta).

Regarding kmer graphs (you mentioned them in your question):

Quote:
Originally Posted by severin
Thanks and really excited to use this tool.

We are also very excited to have end users adopting our highly scalable methods for genomics.