SEQanswers > Literature Watch
04-09-2013, 05:49 PM   #21
seb567
Senior Member
Location: Québec, Canada
Join Date: Jul 2008
Posts: 260

Quote:
Originally Posted by suzumar
Hi Sébastien, and thanks for developing Ray. I am working on a sponge metagenome (Ion Torrent) and I am trying to set up Ray for taxonomy and communities.

I am trying to set up the files for the latest version of Greengenes (2012_08). I have parsed the information in the FASTA file into the same format as 2011_01, and I am trying to manually run the script
It's good to know that there is a new release.

Quote:
Paper-Replication-2012 / Build-Input-Files-for-GreenGenes-Taxonomy / main.sh

and I have one question regarding FASTA files for Ray Taxonomy and Communities.

I have noticed that, for the NCBI taxonomy, the script Paper-Replication-2012 / Build-Input-Files-for-NCBI-Taxonomy / CreateRayInputStructures.sh

creates a single FASTA file for each genome. My question is whether those reference FASTA files are just a concatenation of all .fna files associated with any given genome (so that there are multiple IDs and accessions associated with a given "genome").
Yes. Drafts can have several .fna files, which can be concatenated.

Quote:

This becomes an issue for draft genomes (lots of scaffolds) or eukaryotic chromosomes, which I will have to "manually merge".
You can write a short bash loop to do that for you. Something like:

Code:
mkdir -p merged

# Concatenate all .fna files of each draft genome into one FASTA file
for dir in drafts/*/
do
    draft=$(basename "$dir")
    cat "$dir"*.fna > "merged/$draft.fasta"
done
Quote:
Actually, after double-checking the CreateRayInputStructures.sh script, it seems to be the case, but could you please confirm it?

Marcelino
Yes. The code below does what you said:

Code:
if test ! -d NCBI-Finished-Bacterial-Genomes
then
        echo "Creating $OutputDirectory/NCBI-Finished-Bacterial-Genomes, please wait."

        mkdir NCBI-Finished-Bacterial-Genomes
        cd NCBI-Finished-Bacterial-Genomes

        # One input directory per genome
        for i in $(ls ../uncompressed/all.fna)
        do
                # Keep only the part of the directory name before "_uid"
                name=$(echo $i|sed 's/_uid/ /g'|awk '{print $1}')

                # Concatenate all .fna files of this genome into one FASTA file
                cat ../uncompressed/all.fna/$i/*.fna > $name".fasta"
        done

        echo "Done."

        cd ..
fi
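For reference, the `sed`/`awk` pipeline in that loop keeps only the part of the directory name before `_uid`. A minimal sketch of the same step using bash parameter expansion, assuming directory names of the form `<genome>_uid<number>`; the helper `genome_name` and the example directory name are hypothetical:

```shell
#!/usr/bin/env bash
# Remove everything from the first "_uid" onward,
# like: echo $i | sed 's/_uid/ /g' | awk '{print $1}'
genome_name() {
    local dir=$1
    printf '%s\n' "${dir%%_uid*}"
}

genome_name Example_genome_uid12345
```

Names without `_uid` are returned unchanged, and no external processes are started for each directory.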
04-17-2013, 07:58 AM   #22
seb567
Senior Member
Location: Québec, Canada
Join Date: Jul 2008
Posts: 260
Ray v2.2.0 is now available.

Hello,

Ray v2.2.0 is now available worldwide.

The delay between v2.1.0 and v2.2.0 was quite long.

Ray v2.2.0 brings a lot of bug fixes and some new features.

The tarball is available at:

http://sourceforge.net/projects/deno...v2.2.0.tar.bz2





The most significant changes include:

* SequencesLoader: the Illumina export format is now supported
* add build option for MPI I/O
* avoid infinite loops during read recycling
* messages must not be passed by value
* Fixed a linking error caused by ordering
* FusionTaskCreator: don't lose genomic regions during merging
* new file GraphPartition.txt shows the distribution of objects
* readahead operations are used for reading gz files
* core: fixed a race condition occurring with -route-messages
* SeedingData: fix regression for seed checkpointing
* all the code of Ray was ported to this new GraphPath framework

The GraphPath framework reduces memory usage and avoids some misassembly
errors by enforcing the de Bruijn graph property.

* Scaffolder: don't fetch reads from repeated objects

This fixes running time issues on large genomes with repeats.

* SeedingData: implemented a staggered mean algorithm

* Mock: removed the limit on the number of input files
* Library: implemented checkpointing for paired reads
* removed all calls to fflush(stdout) and cout.flush()
* SeedExtender: reduce the verbosity of graph traversal
* reduced the amount of information in the standard output
* JoinerTaskCreator: reduced the default verbosity
* KmerAcademyBuilder: reduced the verbosity for graph construction
* implemented an adaptive Bloom filter
* store a path as a sequence instead of a vector of vertices for efficiency
* SequencesLoader: add support for short file names
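To illustrate the idea behind storing a path as a sequence instead of a vector of vertices: in a de Bruijn graph, consecutive vertices (k-mers) overlap by k-1 symbols, so a path of n k-mers fits in one sequence of k+n-1 symbols instead of n*k symbols. The bash sketch below is only an illustration, not Ray's C++ GraphPath code, and the function name `path_to_sequence` is hypothetical. It also rejects any step that breaks the de Bruijn overlap property.

```shell
#!/usr/bin/env bash
# Collapse a path of k-mers into a single sequence (illustrative sketch).
# Fails if two consecutive k-mers do not overlap by k-1 symbols.
path_to_sequence() {
    local first=$1
    shift
    local k=${#first}
    local seq=$first
    local prev=$first
    local kmer
    for kmer in "$@"; do
        # The (k-1)-suffix of the previous vertex must equal the (k-1)-prefix of the next one
        if [[ ${prev:1} != "${kmer:0:k-1}" ]]; then
            echo "not a de Bruijn path: $prev -> $kmer" >&2
            return 1
        fi
        # Only the last symbol of each new vertex needs to be stored
        seq+=${kmer:k-1}
        prev=$kmer
    done
    printf '%s\n' "$seq"
}

path_to_sequence ACGT CGTA GTAC
```

With k = 4, the three vertices above (12 symbols when stored as a vector) are stored as the 6-symbol sequence ACGTAC.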





All changes in Ray between v2.1.0 and v2.2.0

Charles Joly Beauparlant (1):
Added an example plugin.

Sébastien Boisvert (160):
Some work around the minirank model.
Ported Ray plugins to the mini-ranks RayPlatform.
Ray plugins were ported to the mini-ranks.
Moved the destruction of allocators in RayPlatform.
I ported Ray to some changes in some classes in RayPlatform.
application_core: the application code was simplified
Social networks were added to the release procedure
Code names of old releases were added
Fixed a linking error caused by ordering
Fixed the scope of options in build system
The build system was simplified
AR and LD are not needed here
Ray must abort if the output directory exists
The RayCommand.txt file was fixed for mini-ranks
Added the name of each rank (or mini-rank) in network test
The subgraph must be built regardless if it will be used
Merge branch 'minirank-model' of git://github.com/sebhtml/ray.git
core: CONFIG_* variables are private
core: The option -mini-rank-per-rank was added
ship: removed 6 files in shipped products
core: don't return parameters by value
Mock: new plugin called that does nothing
SequencesLoader: a regression for .bz2 file support was fixed
messages must not be passed by value
Ordered all headers
Updated copyrights
Documentation: there is only one repository for research tools
reverted a wrong hunk from commit 7c361f1530d084c6f99
FusionTaskCreator: don't lose genomic regions during merging
SeedExtender: properly format extension file name
Scaffolder: only put one new line after scaffold sequence
KmerAcademyBuilder: use vertexRank() to find who owns an object
new file GraphPartition.txt shows the distribution of objects
the line that shows the process identifier was moved
CoverageGatherer: kmers.txt should have 1 header only
recursive make was improved
readahead operations are used for reading gz files
SequencesLoader: added the rank number when loading files
core: the partitioner needs the correct rank number
core: fixed a race condition occurring with -route-messages
SeedExtender: display the number of traversed nucleotide symbols
Seeds: new runtime metrics for seeding algorithms
new header for SeedLengthDistribution.txt
new header for any paired read file LibraryN.txt
SequencesLoader: added a few assertions for read partitions
new header for CoverageDistribution.txt
Merge branch 'master' of github.com:sebhtml/ray
Documentation: added the polytope with 4225 vertices
SeedingData: fix regression for seed checkpointing
added documentation for using the torus
Documentation: added arguments for a 5D torus with 1024 vertices
Documentation: fixed permissions
removed the output file called MessagePassingInterface.txt
renamed the AssemblySeed to GraphPath so it can be reused
all the code of Ray was ported to this new GraphPath framework
Documentation: fixed the degree of the polytope
Scaffolder: don't fetch reads from repeated objects
SeedExtender: added documentation in the code for repeated vertices
fixed a couple of compilation warnings
SeedingData: implemented a staggered mean algorithm
Scaffolder: replaced getMode() by the new GraphPath framework
Mock: removed the limit on the number of input files
remove the limitation regarding the maximum number of files
moved message handlers from MessageProcessor to SequencesLoader
Scaffolder: fixed 2 compilation warnings
Library: implemented checkpointing for paired reads
SeedingData: reduced amount of printed information
removed all calls to fflush(stdout) and cout.flush()
SeedExtender: reduce the verbosity of graph traversal
reduced the amount of information in the standard output
JoinerTaskCreator: reduced the default verbosity
KmerAcademyBuilder: reduced the verbosity for graph construction
SequencesLoader: reduced verbosity
VerticesExtractor: reduced verbosity
reduced verbosity
reduced verbosity
SequencesLoader: the Illumina export format is now supported
added a loader interface for file formats
SequencesLoader: all supported formats use the interface
SequencesLoader: implemented a product factory
Mock: updated documentation for new export format
Mock: output a single file for library data
implemented an adaptive Bloom filter
improved the interface of path objects
add debug symbols by default
store a path as a sequence instead of a vector of vertices for efficiency
Mock: the path storage using blocks is not ready
SeedingData: enforce de Bruijn graph property for path storage
SeedingData: use the GraphPath storage code to compute seeds
SeedingData: refactor code so that m_content is abstracted
SeedingData: use 2-bit encoding for paths
SeedingData: plugin options are parsed by plugins
use constants for symbols
SeedingData: correctly detect dead ends
add more information for coding style
MachineHelper: registerPlugin and resolveSymbols must be last
SeedingData: tips can not be seeds
SequencesLoader: add support for short file names
SeedingData: tips are not valid seeds
move some handlers in the Scaffolder plugin
Scaffolder: implement the handler for packed chunks
fix a race condition during directory probing
reduce verbosity of components
add documentation for building on IBM Blue Gene/Q
add code name for upcoming release
SequencesLoader: fix regression (added in ca979832) for line widths
add plugin PathEvaluator to evaluate paths
PathEvaluator: write ContigPaths checkpoints in parallel
reserve storage capacity for sequence file
perform parallel I/O operations
fix a bug when disabling scaffolding
use MPI I/O to write Contigs.fasta
use a file view for each MPI rank
add build option for MPI I/O
avoid parallel I/O without MPI I/O
avoid infinite loops during read recycling
update polytope documentation
add comments for old class
add a new plugin to process spurious seeds
port some plugins to the simplified RayPlatform API
iterate on seeds to filter them
register seed paths in the distributed graph
hide hash values for Bloom filter
push the workflow in a helper class
fetch ancestors of seed heads
seed lengths must be collected after analysis
write seed statistics after analysis
write seed checkpoints after the quality control analysis
write seed files after analysis (-write-seeds)
skip seed quality analysis if checkpoints exist
add steps for better dead end detection
hide mini-ranks in help if they are disabled
correct a bunch of bugs for adapters in Ray
reuse code paths to obtain sequence information
eliminate seeds that have a dead-end on the left
discard seeds with dead-ends on the right
increase the maximum depth for searches
add a class to fetch the attributes of a DNA sequence
create a class to fetch annotations in a portable way
fetch nearby paths to detect bubbles
fix a bug during the registration of seeds
remove any seed that is a weak part of a bubble
add 4 methods that will be implemented later
fix a regression that prevented the closing of a file
add new reference in the output
disable the seed filter when using short kmers
add a maximum coverage depth for dead end search
adapt the allowed depth in function of the data
add design blueprints for the new plugin
SpuriousSeedAnnihilator: disable debug messages by default
TaxonomyViewer: rename the plugin to TaxonomyViewer
remove plugin_ from all plugin directory names
add new line for publications
application_core: fix buggy message routing
SeedExtender: don't traverse path if it's consumed already
SeedingData: fix a bug for the phix system test
update the CMakeList.txt
use git to store version names
Disable the filtering code during the computation of seeds
This is Ray v2.2.0




All changes in RayPlatform between v1.1.0 and v1.1.1


Sébastien Boisvert (56):
initial work on miniranks with VirtualMachine and Minirank
I added some design documentation for mini-ranks.
spinlocks are more suitable for this job
added design documentation for mini-ranks.
First implementation of mini-ranks in RayPlatform
The core must provide the mini-rank number.
Documentation: added description of macros.
Fixed some bugs in the mini-ranks model.
Moved the destruction of allocators in the core.
Mini-rank source and mini-rank destination are required.
The destructor of the middleware must be called.
A mini-rank must tell the rank that it has messages to send.
The class MessageQueue does the job of receiving messages.
Non-blocking queues will be used for the communication.
The non-blocking message queue for mini-ranks is ready.
MPI_Recv must be called to get the mini-rank numbers.
This is the branch for RayPlatform v7.0.0.
core: The old behavior (no mini-ranks) now works as expected
core: RayPlatform is responsible for creating mini-ranks
The old adapter API documentation was removed
Message reception is now interleaved with send operations.
More buffers are needed for mini-ranks
communication: don't register already registered buffers
The build system is less verbose
New API call to get the number of mini-ranks per rank
Added a method to get the MessagesHandler object
Merge branch 'minirank-model' of github.com:sebhtml/RayPlatform into minirank-model
Merge branch 'minirank-model' of git://github.com/sebhtml/RayPlatform.git
handlers: new option to cache operation codes
communication: messages must be passed with a pointer
Ordered headers in all files
Updated copyrights
The short name was updated in headers
The website was updated in every file
a retry is necessary when a message is pushed into a full ring
Documentation: updated RayPlatform mini-ranks blueprints
communication: moved writeFiles() in a second method
communication: removed a few debugging instructions
Documentation: added gate blueprints
Documentation: improved design for non-linear scheduling
routing: renamed the hypercube to polytope
Documentation: added Torus description
a radix of 2 produces a hypercube
use the Q and ASSERT build arguments in RayPlatform
routing: implemented a new communication graph: the torus
Merge branch 'master' of github.com:sebhtml/RayPlatform
core: use specific code to get memory usage on Blue Gene/Q
the next release will likely be 1.2.0 and not 7.0.0
add option to provide public access to a master mode
add the core in each plugin
add two macros to configure handlers
fixed directives to compile mini-ranks
core: fix buggy message routing
improve the patch for message routing with a configuration
core: fix a regression for registered handle names
This is RayPlatform v1.1.0.
04-18-2013, 09:52 AM   #23
vpp605
Junior Member
Location: Saskatchewan
Join Date: Feb 2009
Posts: 6
Ray Meta - taxonomy and gene ontology

Hello,

I have a general question about the way Ray Meta works. When the taxonomy and gene ontology profiles are provided, I assume that is only for the assembled contigs? Or would that also include results for reads that were not assembled into contigs?
We have fairly low read coverage for our samples, so I anticipate a large portion of the reads will not be assembled into contigs. As such, is there a way to get the taxonomy and gene ontology profiles of the entire data set (i.e., contigs and any reads that were not assembled into contigs)?

Thanks in advance for your response, and compliments to your group on the paper describing Ray Meta. The thorough documentation and supplementary material are very much appreciated!
04-29-2013, 08:48 AM   #24
seb567
Senior Member
Location: Québec, Canada
Join Date: Jul 2008
Posts: 260

Quote:
Originally Posted by vpp605
Hello,

I have a general question about the way Ray Meta works. When the taxonomy and gene ontology profiles are provided, I assume that is only for the assembled contigs? Or would that also include results for reads that were not assembled into contigs?
We have fairly low read coverage for our samples, so I anticipate a large portion of the reads will not be assembled into contigs. As such, is there a way to get the taxonomy and gene ontology profiles of the entire data set (i.e., contigs and any reads that were not assembled into contigs)?

Thanks in advance for your response, and compliments to your group on the paper describing Ray Meta. The thorough documentation and supplementary material are very much appreciated!
As explained in the paper, the profiling is done on the de Bruijn subgraph rather than on the contigs.

So I think you should be OK with your low-coverage data.
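To make this concrete: because profiling works on k-mers (vertices of the de Bruijn graph) and not on contigs, k-mers from reads that never became part of a contig can still be matched against reference k-mers. Below is a toy bash sketch of that matching. The sequences, k = 5, and the helper names are hypothetical, and this is not how Ray Communities is implemented; the paper describes the actual coverage-based, distributed method.

```shell
#!/usr/bin/env bash
# Toy illustration: count read k-mers that also occur in a reference sequence.
k=5
declare -A ref_kmers

# Record every k-mer of the reference sequence
index_reference() {
    local ref=$1 i
    for ((i = 0; i + k <= ${#ref}; i++)); do
        ref_kmers[${ref:i:k}]=1
    done
}

# Print how many k-mers of a read are present in the reference index
count_hits() {
    local read=$1 i hits=0
    for ((i = 0; i + k <= ${#read}; i++)); do
        if [[ -n ${ref_kmers[${read:i:k}]} ]]; then
            hits=$((hits + 1))
        fi
    done
    echo "$hits"
}

index_reference ACGTACGTTGCA
count_hits CGTACGTT   # this read does not need to be part of any contig
```

Here all 4 k-mers of the read CGTACGTT occur in the reference, whether or not that read was ever assembled.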
06-12-2013, 09:11 AM   #25
jjjscuedu
Member
Location: NY
Join Date: Mar 2012
Posts: 35
Ray MPI running problem

Dear all,

I am trying to run Ray on a test dataset.

However, when I try to run it according to the manual, I get an error like this:


[jingjing@tll-bioinfo02 Ray-v2.2.0]$ mpiexec -n 1 ray-build/Ray -o test -p PE1.fa PE2.fa -k 31
ssh: Could not resolve hostname tll-bioinfo02: Name or service not known

It seems that something is wrong with MPI.

Can anyone give me some suggestions?

Thanks!
06-12-2013, 10:19 AM   #26
seb567
Senior Member
Location: Québec, Canada
Join Date: Jul 2008
Posts: 260

Quote:
Originally Posted by jjjscuedu
Dear all,

I am trying to run Ray on a test dataset.

However, when I try to run it according to the manual, I get an error like this:


[jingjing@tll-bioinfo02 Ray-v2.2.0]$ mpiexec -n 1 ray-build/Ray -o test -p PE1.fa PE2.fa -k 31
ssh: Could not resolve hostname tll-bioinfo02: Name or service not known

It seems that something is wrong with MPI.

Can anyone give me some suggestions?

Thanks!
The problem is related to your MPI installation. It seems that mpiexec wants to use ssh to connect to tll-bioinfo02.

Maybe tll-bioinfo02 is listed in a hostfile that your MPI installation is using.

Try:

mpiexec -n 1 -host localhost ray-build/Ray -o test -p PE1.fa PE2.fa -k 31
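The error message says that tll-bioinfo02, which is also the machine name shown in the shell prompt, does not resolve. A small bash sketch to check this, assuming `getent` is available (it is on glibc-based Linux systems); the helper name `pick_mpi_host` is hypothetical. It prints the machine name when that name resolves and falls back to `localhost` otherwise, then prints the suggested command without running it.

```shell
#!/usr/bin/env bash
# Print this machine's name if it resolves, else "localhost".
pick_mpi_host() {
    local name
    name=$(uname -n)
    if getent hosts "$name" > /dev/null 2>&1; then
        echo "$name"
    else
        echo "localhost"
    fi
}

host=$(pick_mpi_host)
# Dry run: show the command instead of executing it
echo "mpiexec -n 1 -host $host ray-build/Ray -o test -p PE1.fa PE2.fa -k 31"
```

If the name does not resolve, mapping it to a local address in /etc/hosts may let mpiexec run without the -host option.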
10-31-2013, 01:40 PM   #27
seb567
Senior Member
Location: Québec, Canada
Join Date: Jul 2008
Posts: 260
Ray 2.3.0

Hello,

I am proud to announce the immediate availability of Ray 2.3.0 (434 KB).



Most significant change:

- add -detect-sequence-files to detect supported files

With this option, you just need to put your sequence data files in one directory
and run "mpiexec -n 99 Ray -detect-sequence-files directory". This option will match
paired files and everything.
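As a rough illustration of what matching paired files involves (the rules Ray actually applies are documented in its manual and are not reproduced here), the bash sketch below assumes that mates are named `<sample>_1.<ext>` and `<sample>_2.<ext>` and that the directory holds only sequence files. The naming convention and the function name `build_ray_args` are hypothetical. It prints `-p` arguments for each pair and `-s` for every other file, which is the list one would otherwise type by hand.

```shell
#!/usr/bin/env bash
# Hypothetical pairing by file name; not Ray's detection code.
build_ray_args() {
    local dir=$1 f name mate
    local -a args=()
    shopt -s nullglob   # an empty directory yields no iterations
    for f in "$dir"/*; do
        name=${f##*/}
        case $name in
            *_1.*)
                mate="$dir/${name/_1./_2.}"
                if [[ -e $mate ]]; then
                    args+=(-p "$f" "$mate")
                else
                    args+=(-s "$f")
                fi
                ;;
            *_2.*)
                # Pairs were emitted with their _1 mate; keep orphans as single-end
                [[ -e "$dir/${name/_2./_1.}" ]] || args+=(-s "$f")
                ;;
            *)
                args+=(-s "$f")
                ;;
        esac
    done
    printf '%s\n' "${args[*]}"
}
```

For a directory holding A_1.fastq, A_2.fastq and B.fastq, this prints `-p dir/A_1.fastq dir/A_2.fastq -s dir/B.fastq`.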

What's new:

- new option "-run-surveyor" to compare several samples (see Documentation/)
- support long reads in -amos option (reported by Bastian Hornung @ wur.nl)
- Scaffolder: fix a bug in the formatting of scaffolds (Rob Egan @ Lawrence Berkeley Laboratory)
- ElapsedTime.txt is now in tabular format (suggested by James Vincent @ The University of Vermont)
- add new sequence file extensions such as .fq.gz (see the manual)
- fix an integer overflow for the Bloom filter (thanks to Chien-Chi Lo)
- remove the symbolic loop in RayPlatform (reported by Nick Holway)
- add the ability to send SIGUSR1 to Ray processes to debug them
- use the polytope by default with option -route-messages (instead of the de Bruijn graph)
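The Bloom filter mentioned above (and in the v2.2.0 list) is a bit array queried with several hash functions. A k-mer that was inserted always tests positive, while a k-mer that was never inserted can occasionally test positive as well (a false positive). Below is a toy bash sketch, not Ray's implementation. The filter size, the number of hashes, and the seeded FNV-1a hash are assumptions. Masking the hash to 32 bits after every multiplication keeps each intermediate value below 2^56, well inside bash's 64-bit integers, which illustrates one generic way to avoid integer overflow in hash arithmetic.

```shell
#!/usr/bin/env bash
# Toy Bloom filter (hypothetical sketch; sizes and hash are assumptions).
BLOOM_BITS=4096   # number of bits in the filter
BLOOM_HASHES=3    # number of hash functions
declare -a bloom=()

# 32-bit FNV-1a hash of $1, varied by seed $2; result stored in global HASH.
# Masking after each step keeps HASH < 2^32, so HASH * 16777619 < 2^56.
fnv1a() {
    local s=$1 seed=$2 i c
    HASH=$(( (2166136261 ^ seed) & 0xFFFFFFFF ))
    for ((i = 0; i < ${#s}; i++)); do
        printf -v c '%d' "'${s:i:1}"
        HASH=$(( ((HASH ^ c) * 16777619) & 0xFFFFFFFF ))
    done
}

# Set one bit per hash function
bloom_add() {
    local seed
    for ((seed = 1; seed <= BLOOM_HASHES; seed++)); do
        fnv1a "$1" "$seed"
        bloom[HASH % BLOOM_BITS]=1
    done
}

# Exit status 0: possibly present; 1: definitely absent
bloom_query() {
    local seed
    for ((seed = 1; seed <= BLOOM_HASHES; seed++)); do
        fnv1a "$1" "$seed"
        [[ -n ${bloom[HASH % BLOOM_BITS]} ]] || return 1
    done
    return 0
}

bloom_add ACGTACGTACGTACGTACGTA
bloom_query ACGTACGTACGTACGTACGTA && echo "possibly present"
```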


Download link:

http://master.dl.sourceforge.net/pro...-2.3.0.tar.bz2

Mirrors:

https://github.com/sebhtml/Ray-Relea...-2.3.0.tar.bz2
https://bitbucket.org/sebhtml/ray-re...-2.3.0.tar.bz2



Thanks !

Sébastien

-----

Paper for Ray Meta for metagenomics:
http://genomebiology.com/2012/13/12/R122

Ticket for the release:
https://github.com/sebhtml/ray/issues/194
03-04-2014, 01:26 AM   #28
canderson30
Junior Member
Location: Nebraska
Join Date: Nov 2012
Posts: 2

Hello,

When I look through the outputs generated from the AMOS file after assembly, many of the contigs were assigned 0 reads (I used bank2contig with default settings after seeing that many contigs were not showing up in the generated SAM file). Obviously this does not make much sense; has anyone else come across this? I was trying to avoid mapping by using the AMOS file, and now I just want to confirm that the contigs I am getting are 'real'.

At first I thought this might be due to read recycling, but reads still show up under multiple contigs. Does anyone have other ideas about what is causing this issue, or how to correct it during assembly?


Chris
07-22-2014, 03:27 AM   #29
amitbik
Member
Location: Italy
Join Date: May 2013
Posts: 50
ABySS & Ray

Hi, Everyone

I am doing a metagenomic shotgun assembly with ABySS and Ray. I got the results, but I want to know how many reads were used and how many were unused.
I tried but failed. Can anyone guide me on how to find out?

Any help will be appreciated...
07-23-2014, 02:06 AM   #30
amitbik
Member
Location: Italy
Join Date: May 2013
Posts: 50

Hi, All

Please help me guys...