SEQanswers


seb567 01-08-2013 05:57 AM

Ray Meta: scalable de novo metagenome assembly and profiling
 
Genome Biology 2012, 13:R122 doi:10.1186/gb-2012-13-12-r122

Voluminous parallel sequencing datasets, especially metagenomic experiments, require distributed computing for de novo assembly and taxonomic profiling. Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. It can accurately assemble and profile a three billion read metagenomic experiment representing 1,000 bacterial genomes of uneven proportions in 15 hours with 1,024 processor cores, using only 1.5 GB per core. The software will facilitate the processing of large and complex datasets, and will help in generating biological insights on specific environments. Ray Meta is open source and available at http://denovoassembler.sf.net.

severin 02-20-2013 01:06 PM

Ray Meta
 
How do I include genomes other than the bacteria that are found in the NCBI-taxonomy directory that your script generates? I could drop the fasta file into a folder however...

Is there an easy way to include the taxonomy information about the genomes I add? You added human in the paper, but if I want to include multiple species whose taxonomy is known, do I have to do this manually, or is there a tool that can help me achieve this?

Also, I am interested not just in obtaining the abundances but also in assigning the scaffolds to particular species or other levels in the taxonomy. Does Ray output the scaffold-to-taxon information somewhere?

One last question.
If I have an assembly from, say, Trinity, can I run the assembly through Ray-Meta and have it return abundances based on the transcripts themselves? How much does the algorithm depend on the assembly having been done first? Can I feed Ray-Meta a kmer graph?


Thanks and really excited to use this tool.

seb567 02-21-2013 09:41 AM

Hi,

Quote:

Originally Posted by severin (Post 96974)
How do I include genomes other than the bacteria that are found in the NCBI-taxonomy directory that your script generates?


Genome-to-Taxon.tsv has 2 tab-separated columns: GenBankIdentifier and taxonIdentifier.

Both are integers.

So you need to append one entry per genome to this file.
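As a minimal sketch of appending such an entry, the shell function below writes one tab-separated GenBankIdentifier/taxonIdentifier line and refuses non-integer values. The function name `append_genome_taxon` is hypothetical (it is not part of Ray), and the identifiers in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# append_genome_taxon is a hypothetical helper, not part of Ray.
# It appends one "GenBankIdentifier<TAB>taxonIdentifier" line to a
# Genome-to-Taxon.tsv-style file; both identifiers must be integers.
append_genome_taxon() {
    local tsv_file=$1 genbank_id=$2 taxon_id=$3
    # Reject anything that is not a plain non-negative integer.
    if [[ ! $genbank_id =~ ^[0-9]+$ || ! $taxon_id =~ ^[0-9]+$ ]]; then
        echo "append_genome_taxon: identifiers must be integers" >&2
        return 1
    fi
    printf '%s\t%s\n' "$genbank_id" "$taxon_id" >> "$tsv_file"
}

# Usage (placeholder identifiers):
# append_genome_taxon NCBI-taxonomy/Genome-to-Taxon.tsv 123456789 654321
```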

See https://github.com/sebhtml/ray/blob/...n/Taxonomy.txt

Quote:

Originally Posted by severin (Post 96974)
I could drop the fasta file into a folder however...

Indeed, sequences deposited in directories that you provide to Ray with the -search option
will be picked up by the Ray Communities plugins.

Quote:

Originally Posted by severin (Post 96974)

Is there an easy way to include the taxonomy information about the genomes I add?

No, you need to add one line for each relationship you desire.

Quote:

Originally Posted by severin (Post 96974)
You added human in the paper, but if I want to include multiple species whose taxonomy is known, do I have to do this manually, or is there a tool that can help me achieve this?

Well, because what people want to add to this system can come from various sources (not
just NCBI), it's hard to devise a tool that will be usable and portable for all these sources.

So I guess your best bet is to write a small tool that does it for you so that you
don't have to do it manually.
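One possible shape for such a small tool, sketched under the assumption that your extra mappings sit in a second two-column, tab-separated file: the hypothetical `merge_genome_taxon` function below appends only the well-formed pairs whose GenBank identifier is not already present, so running it twice does not create duplicates. It is not part of Ray.

```bash
#!/usr/bin/env bash
# merge_genome_taxon is a hypothetical helper, not part of Ray.
# Appends GenBankIdentifier<TAB>taxonIdentifier pairs from new_pairs to
# tsv_file, skipping malformed lines and identifiers already in tsv_file.
merge_genome_taxon() {
    local tsv_file=$1 new_pairs=$2 additions
    touch "$tsv_file"
    additions=$(awk -F'\t' -v OFS='\t' '
        # First file: remember the GenBank identifiers already mapped.
        FILENAME == ARGV[1] { seen[$1] = 1; next }
        # Second file: keep integer pairs whose identifier is still unseen.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && !($1 in seen) {
            print $1, $2
            seen[$1] = 1
        }
    ' "$tsv_file" "$new_pairs")
    # Append only after awk has finished reading tsv_file.
    if [ -n "$additions" ]; then
        printf '%s\n' "$additions" >> "$tsv_file"
    fi
}
```

Comparing `FILENAME` with `ARGV[1]` (rather than the common `NR == FNR` idiom) keeps the function correct when the existing file is still empty.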

If you think that this should be a service provided by Ray, you can file a ticket at

https://github.com/sebhtml/ray/issues/new

Quote:

Originally Posted by severin (Post 96974)

Also, I am interested not just in obtaining the abundances but also in assigning the scaffolds to particular species or other levels in the taxonomy. Does Ray output the scaffold-to-taxon information somewhere?

The system will identify contigs for you on the basis of the sequences provided with the -search
options.

Files:

Code:

RayMicrobiomeAnalysis/
    BiologicalAbundances/
        _DenovoAssembly/
            Contigs.tsv
            *.CoverageData.xml

        _Coloring/
        _Frequencies/

        NCBI-bacteria-directory/
            ContigIdentifications.tsv
            _Files.tsv
            SequenceAbundances.xml

        NCBI-viruses-directory/
            ContigIdentifications.tsv
            _Files.tsv
            SequenceAbundances.xml

See https://github.com/sebhtml/ray/blob/...Abundances.txt
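To collect the contig identification files from a finished run, a minimal sketch like the one below can help. It assumes the layout listed above, with one subdirectory of BiologicalAbundances/ per -search directory, each possibly holding a ContigIdentifications.tsv. The function name is hypothetical, and it does not interpret the columns of those files.

```bash
#!/usr/bin/env bash
# list_contig_identifications is a hypothetical helper, not part of Ray.
# ASSUMPTION: output layout <out_dir>/BiologicalAbundances/<search dir>/ContigIdentifications.tsv
# Prints "<search directory name><TAB><path>" for every identification file found.
list_contig_identifications() {
    local out_dir=$1 path
    find "$out_dir/BiologicalAbundances" -type f -name 'ContigIdentifications.tsv' | sort |
    while IFS= read -r path; do
        printf '%s\t%s\n' "$(basename "$(dirname "$path")")" "$path"
    done
}

# Usage: list_contig_identifications RayMicrobiomeAnalysis
```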

Quote:

Originally Posted by severin (Post 96974)


One last question.
If I have an assembly from say Trinity can I run the assembly through Ray-Meta and have it return abundances based on the transcripts themselves?

This is a feature that a sizable number of people at my institution want too:
the ability for Ray to build the de Bruijn graph from sequences assembled with
other tools, in order to benefit from other capabilities such as Ray Communities.

The Ray C++ API for messages actually supports this, but the plugins that build the de Bruijn graph
(namely plugin_SequencesLoader, plugin_KmerAcademyBuilder and plugin_VerticesExtractor) are
working only on reads at the moment.

Quote:

Originally Posted by severin (Post 96974)

How much does the algorithm depend on the assembly having been done first?

It's independent. The quantification algorithms work on a colored de Bruijn graph
and do not really use assembled paths for these computations (aside, obviously, from
the files used for contig identification).

Quote:

Originally Posted by severin (Post 96974)

Can I feed Ray-Meta a kmer graph?

No, this is not possible at the moment.
But that's something that could be implemented, since Ray (and ABySS too)
supports the Ray Cloud Browser kmer graph format.

The file format is like this:

map.csv (ASCII) (called kmers.txt in Ray)

Fields are separated by semicolons (';'), and any line starting with a '#' is a comment.


A line looks like this:

GCGGTTATGCTTGCGTCCACCGTAAGTTCGGATTCAGACTTAATCAAAGGTTTTAACAAAGCGCTGGCAACCCCACGGCGGGGGTATTCAG;47;T;G

See https://github.com/sebhtml/Ray-Cloud...Map-format.txt
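The example line above separates its fields with ';', so a quick way to inspect such a file is to split each non-comment line on that character. The awk-based sketch below does this. Reading the four fields as sequence, coverage, parent nucleotides and child nucleotides is an assumption based on the example line, so check it against the format document linked above before relying on it. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# parse_kmer_map is a hypothetical helper, not part of Ray or Ray Cloud Browser.
# Reads kmers.txt / map.csv lines from stdin and prints one summary per kmer.
# ASSUMPTION: fields are sequence;coverage;parents;children, as the example
# line suggests.
parse_kmer_map() {
    awk -F';' '
        /^#/ || NF == 0 { next }   # skip comments and blank lines
        NF != 4 {                  # report lines without exactly 4 fields
            printf "line %d: expected 4 fields, got %d\n", NR, NF > "/dev/stderr"
            next
        }
        { printf "k=%d\tcoverage=%s\tparents=%s\tchildren=%s\n", length($1), $2, $3, $4 }
    '
}

# Usage: parse_kmer_map < kmers.txt | head
```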


In case you have not come across Ray Cloud Browser: it lets end users interactively browse processed genomics data.

Demo: http://browser.cloud.boisvert.info/c...location=13000

All you need to get started is a kmer graph and fasta sequences (with Ray: kmers.txt and Contigs.fasta), which relates to the kmer graphs you mentioned in your question.

Quote:

Originally Posted by severin (Post 96974)

Thanks and really excited to use this tool.

We are also very excited to have end users adopting our highly scalable methods for genomics.

severin 02-21-2013 10:54 AM

estimates of composition
 
Thanks for the quick reply. As I am working with these features more I am curious about the following.

What does Ray do with contigs and scaffolds it cannot assign to a taxon?

Are they included in the composition analysis?

seb567 02-21-2013 11:23 AM

Quote:

Originally Posted by severin (Post 97094)
Thanks for the quick reply. As I am working with these features more I am curious about the following.

What does Ray do with contigs and scaffolds it cannot assign to a taxon?

Are they included in the composition analysis?

The composition analysis is performed on the colored de Bruijn graph, not on contigs.


See our Genome Biology paper

severin 02-26-2013 08:35 AM

Nice tool
 
Sebastien,

This really is a nice tool. Sorry to bombard you with so many questions, but I would like to know the limitations of the tools I am using. In some of my runs, not all of the contigs are assigned to a species. In that case, wouldn't this lead to a misrepresentation of what is present in the sample?

How hard would it be to also output the relationship between each contig and its taxonomic level (order, family, genus, etc.)?

e.g. contig-001 Micrococcineae

In other cases every contig is assigned. In that case, how do we determine the quality of a match to a bacterium or virus (if those are the genomes we are using) when the contig actually belongs to a eukaryote? I.e., possible misassignment due to the limited number of genomes in the search.

Finally, how does kmer length affect the ability to assign a contig to a species or taxonomic group? Have you looked at this?

Thanks for all your help on this.

Regards,

Andrew

seb567 02-26-2013 05:48 PM

Quote:

Originally Posted by severin (Post 97513)
Sebastien,

This really is a nice tool. Sorry to bombard you with so many questions, but I would like to know the limitations of the tools I am using.

In some of my runs, not all of the contigs are assigned to a species. In that case, wouldn't this lead to a misrepresentation of what is present in the sample?

Do you mean that the percentage of unknown life forms is underrepresented?

Quote:

Originally Posted by severin (Post 97513)

How hard would it be to also output the relationship between each contig and its taxonomic level (order, family, genus, etc.)?

It's just a matter of adding the code in the right place.

Quote:

Originally Posted by severin (Post 97513)

e.g. contig-001 Micrococcineae

In other cases every contig is assigned. In that case, how do we determine the quality of a match to a bacterium or virus (if those are the genomes we are using) when the contig actually belongs to a eukaryote? I.e., possible misassignment due to the limited number of genomes in the search.

If you search for a virus, and a given mammal genome contains all the sequences
of the virus but that mammal genome is not provided to Ray Communities, then yes, Ray
will tell you that it's from a virus.

If you provide Ray Communities with both the virus genome and the mammal genome, then the
software will look for the kmers that the two genomes do not have in common, if there are any.
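The effect described above can be illustrated with a toy version of kmer coloring. This is not Ray Communities' implementation: it is a minimal awk sketch, assuming one reference per input line and ignoring reverse complements, that gives every reference its own color and counts, per reference, the kmers that carry only that color. A kmer shared by a virus and a host genome ends up with two colors and therefore no longer counts for either one. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# count_unique_kmers is a hypothetical toy, not Ray Communities' code.
# Input (stdin): one reference per line, "<name> <sequence>"; each name is a color.
# Output: "<name><TAB><number of kmers seen only in that reference>", sorted.
# Simplifications: exact matches only, forward strand only (no reverse complements).
count_unique_kmers() {
    local k=$1
    awk -v k="$k" '
        {
            name = $1; seq = toupper($2)
            names[name] = 1
            for (i = 1; i + k - 1 <= length(seq); i++) {
                kmer = substr(seq, i, k)
                if (!((kmer, name) in has)) {   # count each (kmer, color) pair once
                    has[kmer, name] = 1
                    ncolors[kmer]++
                    owner[kmer] = name
                }
            }
        }
        END {
            for (kmer in ncolors)
                if (ncolors[kmer] == 1) unique[owner[kmer]]++
            for (name in names)
                printf "%s\t%d\n", name, unique[name] + 0
        }
    ' | sort
}

# Usage: printf 'virus ACGTAC\nhost ACGTTT\n' | count_unique_kmers 3
```

With k = 3, the sequence ACGTAC alone has 4 uniquely colored kmers; adding the "host" line ACGTTT, which shares ACG and CGT with it, leaves the virus with 2.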

Quote:

Originally Posted by severin (Post 97513)

Finally, how does kmer length affect the ability to assign a contig to a species or taxonomic group?

Longer kmers are more specific.

Allowing mismatches would make kmer searches with long kmers more sensitive. Mismatches
are not implemented at the moment.
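The trade-off between kmer length and mismatches can be seen on a toy example. The hypothetical `kmer_hits` function below is a sketch, not something Ray provides (mismatches are not implemented there): it counts how many kmer positions of a query match some reference kmer with at most a given number of substitutions (Hamming distance), using a brute-force scan that is only suitable for tiny inputs.

```bash
#!/usr/bin/env bash
# kmer_hits is a hypothetical toy, not part of Ray.
# Prints "<hits>/<total>": query kmer positions matching a reference kmer
# with at most max_mm substitutions (brute force, for illustration only).
kmer_hits() {
    local k=$1 max_mm=$2 query=$3 reference=$4
    awk -v k="$k" -v max_mm="$max_mm" -v q="$query" -v r="$reference" '
        function hamming(a, b,    i, d) {   # i and d are locals
            d = 0
            for (i = 1; i <= length(a); i++)
                if (substr(a, i, 1) != substr(b, i, 1)) d++
            return d
        }
        BEGIN {
            for (i = 1; i + k - 1 <= length(r); i++) ref[substr(r, i, k)] = 1
            hits = 0; total = 0
            for (i = 1; i + k - 1 <= length(q); i++) {
                total++
                kmer = substr(q, i, k)
                if (kmer in ref) { hits++; continue }
                if (max_mm > 0)
                    for (x in ref)
                        if (hamming(kmer, x) <= max_mm) { hits++; break }
            }
            printf "%d/%d\n", hits, total
        }'
}
```

With the query ACGTTCGTAA, which differs from the reference ACGTACGTAA at one position, exact matching finds 5 of 8 kmers at k = 3 but only 1 of 6 at k = 5, while allowing one mismatch at k = 5 recovers all 6.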

Quote:

Originally Posted by severin (Post 97513)
Have you looked at this?

Not a lot, honestly.


severin 03-13-2013 10:49 AM

lots of searching
 
Hi again.

I was wondering if there is a way to restart a search if the run is terminated prematurely.

I am running Ray Meta with all genomes from NCBI. I have a sample that contains multiple eukaryotic and microbial transcriptomes of unknown origin.
I have 256 cores on this, and it takes about 3 hours to assemble the genome but more than 21 hours to load the genomes I want to search. I get the impression that checkpoints do not include the Ray Meta analysis. Is it possible that this could be included in the checkpoints?


Andrew

seb567 03-13-2013 10:54 AM

Quote:

Originally Posted by severin (Post 99025)
I was wondering if there is a way to restart a search if the run is terminated prematurely.

What is your command?

severin 03-13-2013 11:01 AM

command
 
Quote:

Originally Posted by seb567 (Post 99027)
What is your command?

mpirun -np 256 Ray-v2.1.0/Ray -k 41 -read-write-checkpoints checkpoints -one-color-per-file -search ./6b/ftp.ncbi.nih.gov/genomes/EURKARYOTES/ -search ./6b/ftp.ncbi.nih.gov/genomes/Viruses -search ./6b/GIF_2c/ftp.ncbi.nih.gov/genomes/Bacteria ./6b/GIF_2c/ftp.ncbi.nih.gov/genomes/Bacteria_DRAFT -search ./6b/GIF_2c/ftp.ncbi.nih.gov/genomes/HUMAN_MICROBIOM/Bacteria -search ./6b/ftp.ncbi.nih.gov/genomes/Fungi -with-taxonomy ./4/NCBI-taxonomy/Genome-to-Taxon.tsv ./4/NCBI-taxonomy/TreeOfLife-Edges.tsv ./4/NCBI-taxonomy/Taxon-Names.tsv -i ./TrimmedFiles/Combined.data.Trmatic.sorted.keep.pe.fasta -s ./TrimmedFiles/Combined.data.Trmatic.sorted.keep.se.fasta

seb567 03-14-2013 06:44 AM

Quote:

Originally Posted by severin (Post 99028)
mpirun -np 256 Ray-v2.1.0/Ray -k 41 -read-write-checkpoints checkpoints -one-color-per-file -search ./6b/ftp.ncbi.nih.gov/genomes/EURKARYOTES/ -search ./6b/ftp.ncbi.nih.gov/genomes/Viruses -search ./6b/GIF_2c/ftp.ncbi.nih.gov/genomes/Bacteria ./6b/GIF_2c/ftp.ncbi.nih.gov/genomes/Bacteria_DRAFT -search ./6b/GIF_2c/ftp.ncbi.nih.gov/genomes/HUMAN_MICROBIOM/Bacteria -search ./6b/ftp.ncbi.nih.gov/genomes/Fungi -with-taxonomy ./4/NCBI-taxonomy/Genome-to-Taxon.tsv ./4/NCBI-taxonomy/TreeOfLife-Edges.tsv ./4/NCBI-taxonomy/Taxon-Names.tsv -i ./TrimmedFiles/Combined.data.Trmatic.sorted.keep.pe.fasta -s ./TrimmedFiles/Combined.data.Trmatic.sorted.keep.se.fasta

Is the standard output file still being updated?

Also, the -read-write-checkpoints option does not do anything after the scaffolding.

seb567 03-14-2013 07:15 AM

Quote:

Originally Posted by severin (Post 99025)
I was wondering if there is a way to restart a search if the run is terminated prematurely.

Hi,

I checked the logs: this was fixed on 2012-09-27.

The change is already available to all users in the development version of Ray.

The latest stable version of Ray is v2.1.0, which was released on 2012-10-30.

Which version are you using?

severin 03-14-2013 07:39 AM

Quote:

Originally Posted by seb567 (Post 99100)
Which version are you using?

I am using Ray v2.1.0. Where do I download the development version?

Ray --version
Ray version 2.1.0
License for Ray: GNU General Public License version 3
RayPlatform version: 1.1.0
License for RayPlatform: GNU Lesser General Public License version 3

MAXKMERLENGTH: 99
KMER_U64_ARRAY_SIZE: 4
Maximum coverage depth stored by CoverageDepth: 4294967295
MAXIMUM_MESSAGE_SIZE_IN_BYTES: 4000 bytes
FORCE_PACKING = n
ASSERT = n
HAVE_LIBZ = n
HAVE_LIBBZ2 = n
CONFIG_PROFILER_COLLECT = n
CONFIG_CLOCK_GETTIME = n
__linux__ = y
_MSC_VER = n
__GNUC__ = y
RAY_32_BITS = n
RAY_64_BITS = y
MPI standard version: MPI 2.1
MPI library: Open-MPI 1.6.1
Compiler: GNU gcc/g++ Intel(R) C++ g++ 4.4 mode

seb567 03-14-2013 08:16 AM

Quote:

Originally Posted by severin (Post 99105)
I am using Ray v2.1.0. Where do I download the development version?


To get the development version:

Code:

git clone https://github.com/sebhtml/ray.git
git clone https://github.com/sebhtml/RayPlatform.git
cd ray
make
./Ray -version


severin 03-14-2013 09:28 AM

read-write checkpoints
 
Quote:

Originally Posted by seb567 (Post 99106)
To get the development version:

So when you say it is fixed in the development version, does that mean the read-write checkpoints will go beyond the scaffolding process?

Thanks

seb567 03-14-2013 10:35 AM

Quote:

Originally Posted by severin (Post 99114)
So when you say it is fixed in the development version, does that mean the read-write checkpoints will go beyond the scaffolding process?

Thanks

The -read-write-checkpoints option does not do anything after scaffolding in the development version either.

However, that's a feature that could be added.

severin 03-14-2013 11:43 AM

Install Error
 
Quote:

Originally Posted by seb567 (Post 99106)
To get the development version:

When I follow your suggestion with the development version, I get the following errors:

icpc: command line warning #10159: invalid argument for option '-std'
CXX code/plugin_KmerAcademyBuilder/Kmer.o
icpc: command line warning #10159: invalid argument for option '-std'
CXX code/plugin_Library/LibraryPeakFinder.o
icpc: command line warning #10159: invalid argument for option '-std'
CXX code/plugin_Library/LibraryWorker.o
icpc: command line warning #10159: invalid argument for option '-std'
CXX code/plugin_Library/Library.o
icpc: command line warning #10159: invalid argument for option '-std'
CXX code/plugin_MachineHelper/MachineHelper.o
icpc: command line warning #10159: invalid argument for option '-std'
CXX code/plugin_MessageProcessor/MessageProcessor.o
icpc: command line warning #10159: invalid argument for option '-std'
CXX code/plugin_Mock/Parameters.o
icpc: command line warning #10159: invalid argument for option '-std'
code/plugin_Mock/Parameters.cpp(2129): warning #68: integer conversion resulted in a change of sign
uint64_t value=-1;
^

If I run the makefile without the -std=c++98 flag, the Ray program crashes during the step that follows "Selection of optimal read markers":

[node195:41872] [10] /lib64/libc.so.6(__libc_start_main+0xfd) [0x33bb21ec5d]
[node195:41872] [11] Ray() [0x469429]
[node195:41872] *** End of error message ***
[node193:49049] 8 more processes have sent help message help-odls-default.txt / odls-default:could-not-kill

==> BATCH_OUTPUT.ray4 <==
[-9] ------> AAAAAAAAATGTGCCTTCGTTTCAAGTTCTATTCATTCTAC
[-8] ------> AAAAAAAATGTGCCTTCGTTTCAAGTTCTATTCATTCTACG
[-7] ------> AAAAAAATGTGCCTTCGTTTCAAGTTCTATTCATTCTACGA
[-6] ------> AAAAAATGTGCCTTCGTTTCAAGTTCTATTCATTCTACGAC
[-5] ------> AAAAATGTGCCTTCGTTTCAAGTTCTATTCATTCTACGACC
[-4] ------> AAAATGTGCCTTCGTTTCAAGTTCTATTCATTCTACGACCT
[-3] ------> AAATGTGCCTTCGTTTCAAGTTCTATTCATTCTACGACCTC
[-2] ------> AATGTGCCTTCGTTTCAAGTTCTATTCATTCTACGACCTCA
[-1] ------> ATGTGCCTTCGTTTCAAGTTCTATTCATTCTACGACCTCAA
[0] ------> TGTGCCTTCGTTTCAAGTTCTATTCATTCTACGACCTCAAC


I see that someone else had the same error, but I didn't see a resolution for it:
http://www.mail-archive.com/denovoas.../msg00317.html

seb567 03-15-2013 08:59 AM

Quote:

Originally Posted by severin (Post 99136)
when I follow your suggestion with the developmental version I get the following errors

Someone else also had this problem with 1 sample out of 15, during the coloring of the graph (endless processing with v2.1.0 on some samples):


I will fix this, possibly for the v2.2.0 release, though it will more likely appear in the later v2.2.1 release.

seb567 03-27-2013 09:17 AM

Hi,

I did a test with the Intel compiler and everything went fine.

Code:

icpc: command line warning #10159: invalid argument for option '-std'

This warning changes nothing in the Ray executable; it just says that -std=c++98 is not an option of the Intel compiler.



Quote:

Originally Posted by severin (Post 99136)
when I follow your suggestion with the developmental version I get the following errors


suzumar 04-02-2013 08:43 AM

Hi Sebastien, and thanks for developing Ray. I am working on a sponge metagenome (Ion Torrent), and I am trying to set up Ray for taxonomy and communities.

I am trying to set up the files for the latest version of Greengenes (2012_08). I have parsed the information in the FASTA file into the same format as 2011_01, and I am trying to manually run the script

Paper-Replication-2012 / Build-Input-Files-for-GreenGenes-Taxonomy / main.sh

I have one question regarding the FASTA files for Ray Taxonomy and Communities.

I have noticed that, for the NCBI taxonomy, the script Paper-Replication-2012 / Build-Input-Files-for-NCBI-Taxonomy / CreateRayInputStructures.sh

creates a single FASTA file for each genome. My question is whether those reference FASTA files are just a concatenation of all .fna files associated with any given genome (so that there are multiple IDs and accessions associated with a given "genome"). This becomes an issue for draft genomes (lots of scaffolds) or eukaryotic chromosomes, which I will have to merge manually.

Actually, after double-checking the CreateRayInputStructures.sh script, it seems to be the case, but could you please confirm it?

Marcelino
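For reference, merging several .fna files into one FASTA file per genome can be done with plain `cat`, which keeps every record's own header (and therefore its own accession) inside the merged file. The sketch below assumes one subdirectory per genome; it illustrates that idea only, it is not the CreateRayInputStructures.sh script, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# merge_genome_fna is a hypothetical helper, not CreateRayInputStructures.sh.
# For every subdirectory of genomes_dir, concatenates its *.fna files into
# out_dir/<subdirectory name>.fasta. Record headers are kept as-is, so one
# output file may contain several accessions.
merge_genome_fna() {
    local genomes_dir=$1 out_dir=$2 genome_dir name
    local -a fna_files
    mkdir -p "$out_dir"
    for genome_dir in "$genomes_dir"/*/; do
        name=$(basename "$genome_dir")
        fna_files=( "$genome_dir"*.fna )
        # Without nullglob an unmatched pattern stays literal, so check it exists.
        [ -e "${fna_files[0]}" ] || continue
        cat "${fna_files[@]}" > "$out_dir/$name.fasta"
    done
}
```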


All times are GMT -8.