SEQanswers

03-11-2013, 06:30 AM #221
seb567
Senior Member
Location: Québec, Canada
Join Date: Jul 2008
Posts: 260

Quote:
Originally Posted by yaximik
Hi,

I tried to run Ray (maxkmer 32) on 2 x quad-core RHEL 5.8 with hyper-threading enabled:

mpiexec -n 16 Ray <Ray.conf> and got this error:
Code:
........
[Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SC-MILLib1-Herc2s10cFr1Fr2run2R1AdQ30.fastq (please wait...)
[Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SC-MILLib1-Herc2s10cFr1Fr2run2R1AdQ30.fastq (please wait...)
[Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SCPfx3s25cFr3-150-200run1R1AdQ30.fastq (please wait...)
[Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SCPfx3s25cFr3-150-200run1R1AdQ30.fastq (please wait...)
[Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SCPfx3s25cFr3-150-200run2R1AdQ30.fastq (please wait...)
[Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SCPfx3s25cFr3-150-200run2R1AdQ30.fastq (please wait...)
[Loader::load] File: /media/FantomHD/AssRefMap/SC/SCold/SColdAll.fasta (please wait...)
[Loader::load] File: /media/FantomHD/AssRefMap/SC/SCold/SColdAll.fasta (please wait...)
[Loader::load] File: /media/FantomHD/AssRefMap/SC/SCold/SCallSanger.fasta (please wait...)
[Loader::load] File: /media/FantomHD/AssRefMap/SC/SCold/SCallSanger.fasta (please wait...)
[Loader::load] File: /home/yaximik/AssRefMap/SC/minia/SCMiSeqAllFGMGPGIGclean_k27.contigs.fasta (please wait...)
[G5NNJN1:07040] *** Process received signal ***
[G5NNJN1:07040] Signal: Segmentation fault (11)
[G5NNJN1:07040] Signal code:  (128)
[G5NNJN1:07040] Failing at address: (nil)
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 7040 on node G5NNJN1 exited on signal 11 (Segmentation fault).
The last file loaded was a file with fasta contigs from another assembler (minia). Does this mean contigs from other assemblers cannot be used in Ray?
The maximum read length is 65536 nucleotides.
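A quick way to see whether an input FASTA file runs into the read-length limit quoted above is to scan its record lengths. The Python sketch below (the function names are illustrative, not part of Ray) reports records longer than 65536 nucleotides and also records whose sequence is wrapped over more than one line, since wrapped records are another thing worth ruling out.

```python
from typing import Iterable, Iterator, List, Tuple

MAX_READ_LENGTH = 65536  # limit stated in the reply above


def record_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (header, sequence length, number of sequence lines) per record."""
    header, length, n_lines = None, 0, 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, length, n_lines
            header, length, n_lines = line[1:], 0, 0
        else:
            length += len(line)
            n_lines += 1
    if header is not None:
        yield header, length, n_lines


def problem_records(lines: Iterable[str], limit: int = MAX_READ_LENGTH) -> Tuple[List[str], List[str]]:
    """Return (headers longer than limit, headers wrapped over several lines)."""
    too_long, wrapped = [], []
    for header, length, n_lines in record_lengths(lines):
        if length > limit:
            too_long.append(header)
        if n_lines > 1:
            wrapped.append(header)
    return too_long, wrapped
```

Passing an open file handle of the contigs file to problem_records() is enough, because a file object iterates over its lines.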

03-11-2013, 06:32 AM #222
seb567

Quote:
Originally Posted by KirillK
Hi guys!

Is there a way to provide a reference genome for Ray?

cheers,
KK
You can provide reference genomes using the -search option.

Code:
       -search searchDirectory
              Provides a directory containing fasta files to be searched in the de Bruijn graph.
              Biological abundances will be written to RayOutput/BiologicalAbundances
              See Documentation/BiologicalAbundances.txt
However, the sequences in that directory are not used to aid the assembly; this option is meant for reporting biological abundances.

See this paper (Genome Biology) for more information.

Last edited by seb567; 03-11-2013 at 06:39 AM. Reason: added Genome Biology reference

03-11-2013, 06:37 AM #223
seb567

Quote:
Originally Posted by yaximik
Hi,
What is the meaning of averageOuterDistance and standardDeviation for paired end files?
The outer distance is the sum of the gap size, the length of the left read and the length of the right read.

This is computed for paired reads and mate pairs.
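To make the definition concrete, the sketch below computes the outer distance both from its three components and from mate positions on a contig. The read lengths and gap are made-up numbers, and representing overlapping mates with a negative gap is an assumption of this sketch.

```python
def outer_distance(left_read_length: int, gap: int, right_read_length: int) -> int:
    """Outer distance = left read + gap + right read.

    In this sketch a negative gap stands for mates that overlap.
    """
    return left_read_length + gap + right_read_length


def outer_distance_from_positions(left_start: int, right_end: int) -> int:
    """Same quantity from positions on a contig.

    left_start is the 0-based first base of the left read and right_end the
    0-based position just past the last base of the right read.
    """
    return right_end - left_start


# Hypothetical pair: two 150 bp reads with 180 bp of unsequenced DNA between them.
print(outer_distance(150, 180, 150))              # 480
print(outer_distance_from_positions(1000, 1480))  # 480: left read 1000-1149, right read 1330-1479
```

Both views give the same number; the second is closer to what a tool observes when both mates of a pair land on the same stretch of assembled sequence.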

Quote:

Is it just average read length in the dataset?
No.

Quote:
If so, why is it not required for single-read files?
It only applies to pairs.

Quote:

If not, is it an average fragment length in the library?
Yes.

Quote:
Such as surmised from BioAnalyzer trace, for example?
Yes, but the Bioanalyzer measures the whole library fragment, including the sequencing adapters, whereas the adapters are usually not part of the sequencing reads.

Quote:

If so, couldn't the default autocalc give a very wrong estimate? For example, one of my paired-read runs was done with a library of 600 bp +/- 15%, but during assembly the autocalc estimate was about 150 bp. How can it be so far off?
The 600 bp +/- 15% presumably includes adapters that are not in the sequencing reads.

You can run another application (such as ABySS) on your data and you'll see that Ray's estimate is right.

03-11-2013, 07:11 AM #224
yaximik
Senior Member
Location: Oregon
Join Date: Apr 2011
Posts: 205

Quote:
The maximum read length is 65536 nucleotides.
Got to be another reason. The assembly file by minia includes max contig of 16091 nt. Without this dataset, Ray produced assembly with max contig/scaffold of 46428 nt.

Quote:
The 600 bp +/- 15% presumably includes adapters that are not in sequencing reads.
That is puzzling. The combined adapter length (both sides) is a standard 120 bp, so autocalc is way off (600 - 120 = 480, but the estimate is ~150). Obviously, a much smaller library size should affect scaffolding. Would it be better to provide the real numbers? Also, I guess a narrower distribution should be better, correct? This can be done by re-fractionating the library and collecting a narrow distribution, say +/-5%.
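The arithmetic in this post can be written out as below. This is a minimal sketch: it assumes the "+/- 15%" is an absolute spread around the mean fragment size and that the adapters (120 bp in total here) are the only difference between the Bioanalyzer fragment and the outer distance of the reads.

```python
from typing import Tuple


def insert_range(fragment_bp: int, tolerance_percent: int, adapters_bp: int) -> Tuple[float, int, float]:
    """Estimate (low, mean, high) outer distance from a Bioanalyzer fragment size.

    fragment_bp:       mean fragment size read off the trace (adapters included)
    tolerance_percent: the "+/- N%" spread, applied to the fragment size
    adapters_bp:       combined adapter length on both ends
    """
    spread = fragment_bp * tolerance_percent / 100
    mean = fragment_bp - adapters_bp
    return mean - spread, mean, mean + spread


print(insert_range(600, 15, 120))  # (390.0, 480, 570.0)
```

Under these assumptions even the lower bound (390 bp) is far above the ~150 bp estimate, which is why the discrepancy is puzzling; a +/-5% library would narrow the range to 450-510 bp.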

03-11-2013, 07:17 AM #225
seb567

Quote:
Originally Posted by yaximik
Got to be another reason. The assembly file by minia includes max contig of 16091 nt. Without this dataset, Ray produced assembly with max contig/scaffold of 46428 nt.
Then the problem is presumably caused by the lack of support for multi-line FASTA files for reads in Ray.

Please do submit a ticket if you feel this should be fixed.
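Until multi-line records are supported, one workaround is to rewrite the contigs so that each sequence sits on a single line before passing them to Ray. A minimal Python sketch, assuming plain uncompressed FASTA input:

```python
from typing import Iterable, Iterator


def linearize_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA text with each record's sequence joined onto one line."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header + "\n" + "".join(chunks) + "\n"
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header + "\n" + "".join(chunks) + "\n"
```

Writing the output of linearize_fasta() over an open input file to a new file (for example with writelines()) produces a one-line-per-sequence copy; headers are kept unchanged.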

Quote:

That is puzzling. The combined adapter length (both sides) is a standard 120 bp, so autocalc is way off (600 - 120 = 480, but the estimate is ~150). Obviously, a much smaller library size should affect scaffolding. Would it be better to provide the real numbers? Also, I guess a narrower distribution should be better, correct? This can be done by re-fractionating the library and collecting a narrow distribution, say +/-5%.
You can plot your distributions.

LibraryStatistics.txt contains only the averages; the full signal is in Library0.txt, Library1.txt, and so on. If you are using the git version of Ray, this information is now in LibraryData.xml.
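If you want numbers as well as a plot, a small script can summarize one of those files. The sketch below assumes each data line of LibraryN.txt holds a distance and a count separated by whitespace, with header lines starting with '#'; that layout is an assumption, so check the header your Ray version actually writes. It computes the count-weighted mean and standard deviation and draws a crude text histogram.

```python
import math
from typing import Dict, Iterable, List, Tuple


def load_histogram(lines: Iterable[str]) -> Dict[int, int]:
    """Parse 'distance count' lines; lines starting with '#' are skipped.

    The two-column layout is an assumption about LibraryN.txt.
    """
    hist: Dict[int, int] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        distance, count = int(fields[0]), int(fields[1])
        hist[distance] = hist.get(distance, 0) + count
    return hist


def mean_and_sd(hist: Dict[int, int]) -> Tuple[float, float]:
    """Count-weighted mean and (population) standard deviation."""
    total = sum(hist.values())
    mean = sum(d * c for d, c in hist.items()) / total
    variance = sum(c * (d - mean) ** 2 for d, c in hist.items()) / total
    return mean, math.sqrt(variance)


def text_plot(hist: Dict[int, int], width: int = 40) -> List[str]:
    """One row per distance; the bar for the most frequent distance is `width` long."""
    peak = max(hist.values())
    return [f"{d:>7} {'#' * round(width * c / peak)}" for d, c in sorted(hist.items())]
```

Comparing the mean and standard deviation from this summary with the Bioanalyzer numbers is one way to see how far apart the two estimates really are.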

03-11-2013, 01:09 PM #226
yaximik

I tried to view the AMOS.afg file (37.1 GB) with a couple of programs. Tablet is painfully slow and eventually quit, reporting an error at some line. Hawkeye (AMOS package) successfully imported the assembly into a bank and even opened a graphics window showing contig 1, but then hung forever and had to be killed.
Code:
[yaximik@G5NNJN1 ~]$ hawkeye
START DATE: Mon Mar 11 11:06:54 2013
Bank is: /home/yaximik/AssRefMap/SC/Ray/RayOutput/AMOS.afg.bnk
    0%                                            100%
AFG ..................................................
Messages read: 175403161
Objects added: 175403161
Objects deleted: 0
Objects replaced: 0
END DATE:   Mon Mar 11 12:13:09 2013
Opening /home/yaximik/AssRefMap/SC/Ray/RayOutput/AMOS.afg.bnk... [160.12s]
Indexing Contigs   .......... [83.11s] 107326772 reads in 1409913 contigs
Scaffold information not available
Mates not available:WHAT: Could not open bank file, /home/yaximik/AssRefMap/SC/Ray/RayOutput/AMOS.afg.bnk/FRG.ifo, No such file or directory
LINE: 1264
FILE: Bank_AMOS.cc

Features not available
Initialize Display .Loading AssemblyStats...[8.95s]
.Loading Features...      [0.01s]
.Loading Libraries...     [0.00s]
.Loading Scaffolds....Loading Contigs...       [186.21s]
....Loading NCharts...       [21.83s]
. [217.01s]
Loading Contig 1... [0.05s] 109076 reads
Loading reads...         [343.52s]
Total Load Time: [803.92s]
Loading mates ..................................................
inserts: 108933 mated: 0 matelisted: 0 unmated: 108933 happy: 0 unhappy: 0
Paint: coverage contigs insetcovfeat readcovfeat features inserts
width: 12457 swidth: 778 height: 26357..
Killed
[yaximik@G5NNJN1 ~]$
What viewer can be used to view the assembly?

03-12-2013, 08:41 AM #227
yaximik

Quote:
Quote:
If not, is it an average fragment length in the library?

Yes.

Quote:
Such as surmised from BioAnalyzer trace, for example?

Yes, but the BioAnalyzer will also include sequencing adapters in the evaluation whereas these are not included in sequencing reads usually.
How is the average length calculated? I guess after the reads are aligned to the assembly, correct? But I thought that the assembly depends on paired-end information, so unless I am wrong there is a logical short circuit here: paired reads are distanced based on the assembly, which depends on the distance between paired reads.
It is not that I am maliciously questioning how the algorithm was designed. I am trying to understand where such a discrepancy between the Bioanalyzer and the assembler comes from. Could it be that Bioanalyzer traces for libraries are so misleading that I really have no idea about the size of the libraries I am sequencing? Or is autocalc somehow misled in estimating the library size?

03-14-2013, 06:50 AM #228
seb567

Quote:
Originally Posted by yaximik
Tried to view AMOS.afg file (37.1 GB) using a couple of programs. [...]
What viewer can be used to view assembly?
When the AMOS file format was implemented, I tested Hawkeye, Tablet, and Bank-transact.

You can submit a ticket and I will eventually look at that, but this feature has not really changed since it was implemented.

What is the hardware (memory, processor, video card) on which you are running Hawkeye?

For visualization, I am working on Ray Cloud Browser.

03-14-2013, 06:59 AM #229
seb567

Quote:
Originally Posted by yaximik
How is the average length calculated?
I guess after the reads are aligned to the assembly, correct?
Yes, but all of this happens in the de Bruijn graph -- there is no aligner in the process.

Quote:

But I thought that the assembly depends on paired-end information, so unless I am wrong there is a logical short circuit here: paired reads are distanced based on the assembly, which depends on the distance between paired reads.
Yes, it's like a bootstrapping process: distances are sampled from seeds (similar to unitigs), and then the empirical distribution is used to extend longer contigs by matching paired reads to the distribution.
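The bootstrapping idea can be illustrated with a toy model: take read pairs whose two mates fall on the same seed, measure their outer distances, summarize the empirical distribution, and then accept a candidate mate placement during extension only if the implied distance is plausible. This is a simplified sketch of the concept described above, not Ray's implementation; the placement tuples and the 3-standard-deviation cutoff are assumptions.

```python
import statistics
from typing import List, Tuple

# Hypothetical placement of one mate: (seed id, 0-based start, end exclusive).
Placement = Tuple[str, int, int]


def sample_distances(pairs: List[Tuple[Placement, Placement]]) -> List[int]:
    """Step 1: outer distances of pairs whose two mates sit on the same seed."""
    distances = []
    for (seed_a, start_a, end_a), (seed_b, start_b, end_b) in pairs:
        if seed_a == seed_b:
            distances.append(max(end_a, end_b) - min(start_a, start_b))
    return distances


def fit(distances: List[int]) -> Tuple[float, float]:
    """Step 2: summarize the empirical distribution by its mean and SD."""
    return statistics.mean(distances), statistics.pstdev(distances)


def compatible(distance: int, mean: float, sd: float, k: float = 3.0) -> bool:
    """Step 3: during extension, accept a mate placement whose implied
    outer distance lies within k standard deviations of the mean."""
    return abs(distance - mean) <= k * sd
```

The circularity noted in the question is broken because step 1 only needs short, reliable seeds, not the final assembly.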

I like your short circuit.

Quote:

It is not that I am maliciously questioning how the algorithm was designed.
On the contrary, science advances when curious people step in.

Quote:
I am trying to understand where such a discrepancy between the Bioanalyzer and the assembler comes from. Could it be that Bioanalyzer traces for libraries are so misleading that I really have no idea about the size of the libraries I am sequencing?
One hypothesis is that the population of molecules analyzed by the Bioanalyzer is a superset of the molecules that are present on the sequencing flow cell after library preparation.

Quote:
Or is autocalc somehow misled in estimating the library size?
That may be the case too, but I would be surprised by that.

03-14-2013, 09:12 AM #230
yaximik

Quote:
When the AMOS file format was implemented, I tested Hawkeye, Tablet, and Bank-transact.
You can submit a ticket and I will eventually look at that, but this feature has not really changed since it was implemented.
What is the hardware (memory, processor, video card) on which you are running Hawkeye?
It's two quad-core Xeon E5620 processors with 96 GB of memory and an nVIDIA NV 300 in dual-display mode.

03-14-2013, 10:36 AM #231
seb567

Quote:
Originally Posted by yaximik
It's two quad-core Xeon E5620 processors with 96 GB of memory and an nVIDIA NV 300 in dual-display mode.
Therefore, hardware should not be a problem with such a nice computer.

Do you have this problem with Hawkeye or Tablet only with AMOS files generated by Ray, or does it also occur with AMOS files generated by other tools?

03-19-2013, 11:18 AM #232
yifangt
Member
Location: Canada
Join Date: Feb 2011
Posts: 61
maximum kmer length?

Hello Sebastien:

Can I ask what the maximum value of MAXKMERLENGTH is for Ray 2.1? So far the largest value I have seen set is 128, and for Velvet the largest I have seen is 151. Somewhere I read "There is an arbitrary number that can be set for MAXKMERLENGTH", but I could not find the link anymore. Can you confirm the maximum value I can set for Ray? Thanks a lot!

YT

03-19-2013, 02:03 PM #233
seb567

Quote:
Originally Posted by yifangt
Hello Sebastien:

Can I ask what the maximum value of MAXKMERLENGTH is for Ray 2.1? So far the largest value I have seen set is 128, and for Velvet the largest I have seen is 151. Somewhere I read "There is an arbitrary number that can be set for MAXKMERLENGTH", but I could not find the link anymore. Can you confirm the maximum value I can set for Ray? Thanks a lot!

YT
A message in Ray has a maximum size of 4000 bytes and 2 bits are necessary per nucleotide. The maximum is therefore 4000 / 2 = 2000.

However, read lengths and sequencing errors will be limiting factors here.
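To illustrate the 2-bits-per-nucleotide figure, the sketch below packs a k-mer into an integer and reports how much storage one k-mer needs. The A=0, C=1, G=2, T=3 code assignment is an assumption and need not match Ray's internal encoding. By the same arithmetic, 32 is the largest k whose 2-bit encoding fits in a single 64-bit word.

```python
import math
from typing import Tuple

# Assumed 2-bit code; Ray's internal assignment may differ.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # inverse of CODE


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer, 2 bits per nucleotide, first base highest."""
    value = 0
    for base in kmer.upper():
        value = (value << 2) | CODE[base]
    return value


def decode_kmer(value: int, k: int) -> str:
    """Inverse of encode_kmer for a k-mer of length k."""
    out = []
    for _ in range(k):
        out.append(BASES[value & 0b11])
        value >>= 2
    return "".join(reversed(out))


def storage(k: int) -> Tuple[int, int, int]:
    """Bits, bytes and 64-bit words needed for one 2-bit-encoded k-mer."""
    bits = 2 * k
    return bits, math.ceil(bits / 8), math.ceil(bits / 64)
```

For example, storage(255) returns (510, 64, 8): a k of 255 needs eight 64-bit words per k-mer under this encoding.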

03-27-2013, 12:57 PM #234
yifangt
assembly output format for visualisation

Another question!
I want to view my assembly with other software, e.g. Tablet, Mauve or OSlay. Is there any way in Ray to convert the output files to ACE, MAQ, SAM or BAM format for those post-assembly programs?
In the FAQ section of your site there is a question about AMOS output, but I did not use that option. Do I have to run the assembly again with the -amos option on? Unfortunately, the AMOS format is not readable by all other programs.
I was trying to figure out what the output files contain, but I am not sure which one I should use with those visualization programs, or which one I should feed to a Perl/shell script for the format conversion.
I would appreciate any clue you could give me.

Thanks!

YT

03-27-2013, 01:09 PM #235
seb567

Quote:
Originally Posted by yifangt
Another question!
I want to view my assembly with other software, e.g. Tablet, Mauve or OSlay. Is there any way in Ray to convert the output files to ACE, MAQ, SAM or BAM format for those post-assembly programs?
The two current options are:

1. use -amos, then use an AMOS-compatible viewer

2. use -write-kmers, then use Ray Cloud Browser

Quote:
Originally Posted by yifangt

In the FAQ section of your site there is a question about AMOS output, but I did not use that option. Do I have to run the assembly again with the -amos option on?

Yes, you need to run it again.

Quote:
Originally Posted by yifangt

Unfortunately, the AMOS format is not readable by all other programs.

There are two formats for de novo assemblies: amos and fastg. The amos format is supported by far more applications.

Quote:
Originally Posted by yifangt

I was trying to figure out what the output files contain, but I am not sure which one I should use with those visualization programs, or which one I should feed to a Perl/shell script for the format conversion.
The main sequence outputs are Contigs.fasta and Scaffolds.fasta.

One thing you can do is map your FASTQ reads onto the contigs and use, for example, "samtools tview" to visualize the alignments.
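The mapping route can be scripted. The sketch below assumes BWA and samtools 1.x are installed (the reply does not name an aligner; BWA is used here only as an example) and that the reads are in two FASTQ files; the file names are hypothetical. It builds the command lines separately from running them so they can be inspected first.

```python
import subprocess
from typing import List, Optional, Tuple

# (command line, file that receives stdout or None)
Step = Tuple[List[str], Optional[str]]


def mapping_steps(contigs: str, reads_1: str, reads_2: str, prefix: str) -> List[Step]:
    """Index the contigs, map paired reads with BWA-MEM, sort and index the BAM."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        (["bwa", "index", contigs], None),
        (["bwa", "mem", contigs, reads_1, reads_2], sam),  # bwa mem prints SAM on stdout
        (["samtools", "sort", "-o", bam, sam], None),
        (["samtools", "index", bam], None),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step in order, stopping at the first failing command."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
```

After run_steps(mapping_steps("Contigs.fasta", "reads_1.fastq", "reads_2.fastq", "aln")) finishes, the alignments can be browsed interactively with samtools tview aln.sorted.bam Contigs.fasta.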


Another way is to run Ray with -write-kmers and to use Ray Cloud Browser, which is probably the most-interactive web genome viewer you'll find out there.

Quote:
Originally Posted by yifangt

I would appreciate any clue you could give me.

Thanks!

YT

03-30-2013, 07:34 PM #236
yifangt
How to include the mate-pair information in Ray

Hi Sebastien:
What is the option for including the information of the different mate-pair libraries in the assembly?
After searching this forum, I handled this mixture of reads by pretending they are all paired-end libraries:
Code:
mpiexec -n 20 Ray -k 53 -p S01_clean_PE_R1.fasta S01_clean_PE_R2.fasta -p S01_MP1_R1.fasta S01_MP1_R2.fasta -p S01_MP2_R1.fasta S01_MP2_R2.fasta -o S01_53_PE_MP
However, PE is the paired-end library, MP1 is a 3-5 kb mate-pair library and MP2 is an 8-10 kb mate-pair library.
How can I give Ray the mate-pair distances, if there is a way? Thanks!
YT

04-09-2013, 05:44 PM #237
seb567

Quote:
Originally Posted by yifangt
Hi Sebastien:
What is the option for including the information of the different mate-pair libraries in the assembly?
After searching this forum, I handled this mixture of reads by pretending they are all paired-end libraries:
Code:
mpiexec -n 20 Ray -k 53 -p S01_clean_PE_R1.fasta S01_clean_PE_R2.fasta -p S01_MP1_R1.fasta S01_MP1_R2.fasta -p S01_MP2_R1.fasta S01_MP2_R2.fasta -o S01_53_PE_MP
However, PE is the paired-end library, MP1 is a 3-5 kb mate-pair library and MP2 is an 8-10 kb mate-pair library.
How can I give Ray the mate-pair distances, if there is a way? Thanks!
YT
Ray is usually pretty good at estimating your library sizes.

You can provide the information manually should you wish to do so.

Code:
mpiexec -n 99 Ray -p mate_1.fastq mate_2.fastq 8000 800
In the example above, 8000 is the average outer distance (distance between reads + read lengths) and 800 is the standard deviation on that quantity.
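For the libraries in the question, the -p blocks could be assembled as below. This sketch assumes a quoted size range (such as 3-5 kb) describes the outer distance and spans roughly +/-2 standard deviations; both are rules of thumb rather than anything Ray requires, and omitting the two numbers leaves the estimation to Ray.

```python
from typing import List, Optional, Tuple


def library_estimate(min_bp: int, max_bp: int, sds_in_range: float = 4.0) -> Tuple[int, int]:
    """Turn a size range such as 3-5 kb into (mean, standard deviation).

    Assumes the range spans about +/-2 SD (4 SD in total); this is a rule
    of thumb, not something Ray prescribes.
    """
    mean = (min_bp + max_bp) / 2
    sd = (max_bp - min_bp) / sds_in_range
    return round(mean), round(sd)


def paired_args(reads_1: str, reads_2: str,
                mean: Optional[int] = None, sd: Optional[int] = None) -> List[str]:
    """One -p block; without mean and sd, Ray is left to estimate the distance."""
    args = ["-p", reads_1, reads_2]
    if mean is not None and sd is not None:
        args += [str(mean), str(sd)]
    return args


command = (
    ["mpiexec", "-n", "20", "Ray", "-k", "53"]
    + paired_args("S01_clean_PE_R1.fasta", "S01_clean_PE_R2.fasta")
    + paired_args("S01_MP1_R1.fasta", "S01_MP1_R2.fasta", *library_estimate(3000, 5000))
    + paired_args("S01_MP2_R1.fasta", "S01_MP2_R2.fasta", *library_estimate(8000, 10000))
    + ["-o", "S01_53_PE_MP"]
)
print(" ".join(command))
```

With these assumptions the mate-pair blocks become "-p S01_MP1_R1.fasta S01_MP1_R2.fasta 4000 500" and "-p S01_MP2_R1.fasta S01_MP2_R2.fasta 9000 500", while the paired-end library is still estimated automatically.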

04-17-2013, 07:58 AM #238
seb567
Ray v2.2.0 is now available.

Hello,

Ray v2.2.0 is now available worldwide.

The delay between v2.1.0 and v2.2.0 was quite huge.

Ray v2.2.0 brings a lot of bug fixes and some new features.

The tarball is available at:

http://sourceforge.net/projects/deno...v2.2.0.tar.bz2





The most significant changes include:

* SequencesLoader: the Illumina export format is now supported
* add build option for MPI I/O
* avoid infinite loops during read recycling
* messages must not be passed by value
* Fixed a linking error caused by ordering
* FusionTaskCreator: don't lose genomic regions during merging
* new file GraphPartition.txt shows the distribution of objects
* readahead operations are used for reading gz files
* core: fixed a race condition occurring with -route-messages
* SeedingData: fix regression for seed checkpointing
* all the code of Ray was ported to this new GraphPath framework

The GraphPath framework reduces memory usage and avoids some misassembly
errors by enforcing the de Bruijn graph property.

* Scaffolder: don't fetch reads from repeated objects

This fixes running time issues on large genomes with repeats.

* SeedingData: implemented a staggered mean algorithm

* Mock: removed the limit on the number of input files
* Library: implemented checkpointing for paired reads
* removed all calls to fflush(stdout) and cout.flush()
* SeedExtender: reduce the verbosity of graph traversal
* reduced the amount of information in the standard output
* JoinerTaskCreator: reduced the default verbosity
* KmerAcademyBuilder: reduced the verbosity for graph construction
* implemented an adaptive Bloom filter
* store a path as a sequence instead of a vector of vertices for efficiency
* SequencesLoader: add support for short file names





All changes in Ray between v2.1.0 and v2.2.0

Charles Joly Beauparlant (1):
Added an example plugin.

Sébastien Boisvert (160):
Some work around the minirank model.
Ported Ray plugins to the mini-ranks RayPlatform.
Ray plugins were ported to the mini-ranks.
Moved the destruction of allocators in RayPlatform.
I ported Ray to some changes in some classes in RayPlatform.
application_core: the application code was simplified
Social networks were added to the release procedure
Code names of old releases were added
Fixed a linking error caused by ordering
Fixed the scope of options in build system
The build system was simplified
AR and LD are not needed here
Ray must abort if the output directory exists
The RayCommand.txt file was fixed for mini-ranks
Added the name of each rank (or mini-rank) in network test
The subgraph must be built regardless if it will be used
Merge branch 'minirank-model' of git://github.com/sebhtml/ray.git
core: CONFIG_* variables are private
core: The option -mini-rank-per-rank was added
ship: removed 6 files in shipped products
core: don't return parameters by value
Mock: new plugin called that does nothing
SequencesLoader: a regression for .bz2 file support was fixed
messages must not be passed by value
Ordered all headers
Updated copyrights
Documentation: there is only one repository for research tools
reverted a wrong hunk from commit 7c361f1530d084c6f99
FusionTaskCreator: don't lose genomic regions during merging
SeedExtender: properly format extension file name
Scaffolder: only put one new line after scaffold sequence
KmerAcademyBuilder: use vertexRank() to find who owns an object
new file GraphPartition.txt shows the distribution of objects
the line that shows the process identifier was moved
CoverageGatherer: kmers.txt should have 1 header only
recursive make was improved
readahead operations are used for reading gz files
SequencesLoader: added the rank number when loading files
core: the partitioner needs the correct rank number
core: fixed a race condition occurring with -route-messages
SeedExtender: display the number of traversed nucleotide symbols
Seeds: new runtime metrics for seeding algorithms
new header for SeedLengthDistribution.txt
new header for any paired read file LibraryN.txt
SequencesLoader: added a few assertions for read partitions
new header for CoverageDistribution.txt
Merge branch 'master' of github.com:sebhtml/ray
Documentation: added the polytope with 4225 vertices
SeedingData: fix regression for seed checkpointing
added documentation for using the torus
Documentation: added arguments for a 5D torus with 1024 vertices
Documentation: fixed permissions
removed the output file called MessagePassingInterface.txt
renamed the AssemblySeed to GraphPath so it can be reused
all the code of Ray was ported to this new GraphPath framework
Documentation: fixed the degree of the polytope
Scaffolder: don't fetch reads from repeated objects
SeedExtender: added documentation in the code for repeated vertices
fixed a couple of compilation warnings
SeedingData: implemented a staggered mean algorithm
Scaffolder: replaced getMode() by the new GraphPath framework
Mock: removed the limit on the number of input files
remove the limitation regarding the maximum number of files
moved message handlers from MessageProcessor to SequencesLoader
Scaffolder: fixed 2 compilation warnings
Library: implemented checkpointing for paired reads
SeedingData: reduced amount of printed information
removed all calls to fflush(stdout) and cout.flush()
SeedExtender: reduce the verbosity of graph traversal
reduced the amount of information in the standard output
JoinerTaskCreator: reduced the default verbosity
KmerAcademyBuilder: reduced the verbosity for graph construction
SequencesLoader: reduced verbosity
VerticesExtractor: reduced verbosity
reduced verbosity
reduced verbosity
SequencesLoader: the Illumina export format is now supported
added a loader interface for file formats
SequencesLoader: all supported formats use the interface
SequencesLoader: implemented a product factory
Mock: updated documentation for new export format
Mock: output a single file for library data
implemented an adaptive Bloom filter
improved the interface of path objects
add debug symbols by default
store a path as a sequence instead of a vector of vertices for efficiency
Mock: the path storage using blocks is not ready
SeedingData: enforce de Bruijn graph property for path storage
SeedingData: use the GraphPath storage code to compute seeds
SeedingData: refactor code so that m_content is abstracted
SeedingData: use 2-bit encoding for paths
SeedingData: plugin options are parsed by plugins
use constants for symbols
SeedingData: correctly detect dead ends
add more information for coding style
MachineHelper: registerPlugin and resolveSymbols must be last
SeedingData: tips can not be seeds
SequencesLoader: add support for short file names
SeedingData: tips are not valid seeds
move some handlers in the Scaffolder plugin
Scaffolder: implement the handler for packed chunks
fix a race condition during directory probing
reduce verbosity of components
add documentation for building on IBM Blue Gene/Q
add code name for upcoming release
SequencesLoader: fix regression (added in ca979832) for line widths
add plugin PathEvaluator to evaluate paths
PathEvaluator: write ContigPaths checkpoints in parallel
reserve storage capacity for sequence file
perform parallel I/O operations
fix a bug when disabling scaffolding
use MPI I/O to write Contigs.fasta
use a file view for each MPI rank
add build option for MPI I/O
avoid parallel I/O without MPI I/O
avoid infinite loops during read recycling
update polytope documentation
add comments for old class
add a new plugin to process spurious seeds
port some plugins to the simplified RayPlatform API
iterate on seeds to filter them
register seed paths in the distributed graph
hide hash values for Bloom filter
push the workflow in a helper class
fetch ancestors of seed heads
seed lengths must be collected after analysis
write seed statistics after analysis
write seed checkpoints after the quality control analysis
write seed files after analysis (-write-seeds)
skip seed quality analysis if checkpoints exist
add steps for better dead end detection
hide mini-ranks in help if they are disabled
correct a bunch of bugs for adapters in Ray
reuse code paths to obtain sequence information
eliminate seeds that have a dead-end on the left
discard seeds with dead-ends on the right
increase the maximum depth for searches
add a class to fetch the attributes of a DNA sequence
create a class to fetch annotations in a portable way
fetch nearby paths to detect bubbles
fix a bug during the registration of seeds
remove any seed that is a weak part of a bubble
add 4 methods that will be implemented later
fix a regression that prevented the closing of a file
add new reference in the output
disable the seed filter when using short kmers
add a maximum coverage depth for dead end search
adapt the allowed depth in function of the data
add design blueprints for the new plugin
SpuriousSeedAnnihilator: disable debug messages by default
TaxonomyViewer: rename the plugin to TaxonomyViewer
remove plugin_ from all plugin directory names
add new line for publications
application_core: fix buggy message routing
SeedExtender: don't traverse path if it's consumed already
SeedingData: fix a bug for the phix system test
update the CMakeList.txt
use git to store version names
Disable the filtering code during the computation of seeds
This is Ray v2.2.0




All changes in RayPlatform between v1.1.0 and v1.1.1


Sébastien Boisvert (56):
initial work on miniranks with VirtualMachine and Minirank
I added some design documentation for mini-ranks.
spinlocks are more suitable for this job
added design documentation for mini-ranks.
First implementation of mini-ranks in RayPlatform
The core must provide the mini-rank number.
Documentation: added description of macros.
Fixed some bugs in the mini-ranks model.
Moved the destruction of allocators in the core.
Mini-rank source and mini-rank destination are required.
The desctructor of the middleware must be called.
A mini-rank must tell the rank that it has messages to send.
The class MessageQueue does the job of receiving messages.
Non-blocking queues will be used for the communication.
The non-blocking message queue for mini-ranks is ready.
MPI_Recv must be called to get the mini-rank numbers.
This is the branch for RayPlatform v7.0.0.
core: The old behavior (no mini-ranks) now works as expected
core: RayPlatform is responsible for creating mini-ranks
The old adapter API documentation was removed
Message reception is now interleaved with send operations.
More buffers are needed for mini-ranks
communication: don't register already registered buffers
The build system is less verbose
New API call to get the number of mini-ranks per rank
Added a method to get the MessagesHandler object
Merge branch 'minirank-model' of github.com:sebhtml/RayPlatform into minirank-model
Merge branch 'minirank-model' of git://github.com/sebhtml/RayPlatform.git
handlers: new option to cache operation codes
communication: messages must be passed with a pointer
Ordered headers in all files
Updated copyrights
The short name was updated in headers
The website was updated in every file
a retry is necessary when a message is pushed into a full ring
Documentation: updated RayPlatform mini-ranks blueprints
communication: moved writeFiles() in a second method
communication: removed a few debugging instructions
Documentation: added gate blueprints
Documentation: improved design for non-linear scheduling
routing: renamed the hypercube to polytope
Documentation: added Torus description
a radix of 2 produces a hypercube
use the Q and ASSERT build arguments in RayPlatform
routing: implemented a new communication graph: the torus
Merge branch 'master' of github.com:sebhtml/RayPlatform
core: use specific code to get memory usage on Blue Gene/Q
the next release will likely be 1.2.0 and not 7.0.0
add option to provide public access to a master mode
add the core in each plugin
add two macros to configure handlers
fixed directives to compile mini-ranks
core: fix buggy message routing
improve the patch for message routing with a configuration
core: fix a regression for registered handle names
This is RayPlatform v1.1.0.
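The routing commits above mention a torus communication graph and note that "a radix of 2 produces a hypercube". The sketch below shows why, under an assumption of our own: ranks 0 .. radix^dimension - 1 are written as `dimension` digits in base `radix`, and two ranks are neighbors when they differ by +1 or -1 (mod radix) in exactly one digit. This is an illustration, not RayPlatform's routing code, and `torusNeighbors` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical torus graph: a rank is a tuple of `dimension` base-`radix`
// digits; neighbors differ by +1 or -1 (mod radix) in exactly one digit.
std::set<std::size_t> torusNeighbors(std::size_t rank, std::size_t radix,
                                     std::size_t dimension) {
    assert(radix >= 2);
    std::set<std::size_t> neighbors;  // a set drops duplicates when +1 == -1
    std::size_t weight = 1;           // radix^position
    for (std::size_t position = 0; position < dimension; ++position) {
        const std::size_t digit = (rank / weight) % radix;
        const std::size_t up = (digit + 1) % radix;
        const std::size_t down = (digit + radix - 1) % radix;
        // Replace the digit at this position, keeping all other digits.
        neighbors.insert(rank - digit * weight + up * weight);
        neighbors.insert(rank - digit * weight + down * weight);
        weight *= radix;
    }
    return neighbors;
}
```

With radix 2, +1 and -1 mod 2 give the same digit, so each rank has exactly `dimension` neighbors, each obtained by flipping one bit: that is the hypercube. For example, rank 5 (binary 101) in a 3-dimensional graph connects to ranks 4, 7 and 1. With radix 3 or more, each rank has 2 × dimension neighbors instead.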
Old 04-23-2013, 09:11 AM   #239
yifangt
Member
 
Location: Canada

Join Date: Feb 2011
Posts: 61
Ray 2.2.0

Hi, Sébastien!
I have several questions about Ray 2.2.0:
1) I tried to optimize the build as suggested in README.md:
Code:
The best way to build Ray is to use whole-program optimization.
With gcc, use this script:
bash ./scripts/Build-Link-Time-Optimization.sh
However, I still could not use k-mer lengths greater than 32, even after changing this line:
Code:
  -D MAXKMERLENGTH=255 \
I want a larger maximum k-mer length because my reads can be up to 250 bp long. Did I miss anything?

2) I now have 2 new mate-pair libraries for scaffolding. Can I combine them with the contigs I already assembled with Ray 2.1.0 to get the better/longer scaffolds that theory would predict?
Code:
mpiexec -n 20 Ray -k 35 -p $INPATH/LAN4_35_clean_PE_R1.fasta $INPATH/LAN4_35_clean_PE_R2.fasta  -p $INPATH/LAN4_80_clean_PE_R1.fasta $INPATH/LAN4_80_clean_PE_R2.fasta -s $INPATH/S01/S01_071/Contigs.fasta  -o $OUTPATH/S01_035
When I ran the command above, I did not get longer scaffolds or contigs at all; instead, the contigs were broken! I expected the new contigs to be longer than those in the single-end file passed with -s $INPATH/S01/S01_071/Contigs.fasta. What does this mean? Or do I need to run all 3 libraries (1 PE, 2 MP) together again from the beginning?

3) Following up on my last post about the mate-pair insert size: I do not know the standard deviation of the insert size. How should I handle that? You mentioned:
Quote:
Ray is usually pretty good at estimating your library sizes.
Does that mean I do not need to provide the insert size to Ray? Thanks a lot!

Last edited by yifangt; 04-23-2013 at 09:17 AM.
Old 04-29-2013, 11:06 AM   #240
seb567
Senior Member
 
Location: Québec, Canada

Join Date: Jul 2008
Posts: 260

Quote:
Originally Posted by yifangt View Post
Hi, Sébastien!
I have several questions about Ray 2.2.0:
1) I tried to optimize the build as suggested in README.md:
Code:
The best way to build Ray is to use whole-program optimization.
With gcc, use this script:
bash ./scripts/Build-Link-Time-Optimization.sh
However, I still could not use k-mer lengths greater than 32, even after changing this line:
Code:
  -D MAXKMERLENGTH=255 \
I want a larger maximum k-mer length because my reads can be up to 250 bp long. Did I miss anything?
I fixed this build script.

https://github.com/sebhtml/ray/commi...e15e5ce6146d45
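Why 32 is the ceiling without a rebuild: a common way to store k-mers, and the one assumed in this sketch, packs each nucleotide into 2 bits, so one 64-bit word holds exactly 32 nucleotides. A compile-time MAXKMERLENGTH then fixes how many words every stored k-mer occupies. This is illustrative only and is not Ray's actual k-mer code; `wordsForKmer`, `packKmer` and the A=0, C=1, G=2, T=3 encoding are hypothetical choices.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Compile-time maximum, overridable with -D MAXKMERLENGTH=... (default 32 here).
#ifndef MAXKMERLENGTH
#define MAXKMERLENGTH 32
#endif

constexpr std::size_t kMaxKmerLength = MAXKMERLENGTH;

// With 2 bits per nucleotide, one 64-bit word holds 32 nucleotides.
constexpr std::size_t kNucleotidesPerWord = 64 / 2;

// Number of 64-bit words needed for a k-mer of length k (ceiling division).
constexpr std::size_t wordsForKmer(std::size_t k) {
    return (k + kNucleotidesPerWord - 1) / kNucleotidesPerWord;
}

// Every k-mer gets the same fixed storage, sized for the maximum length.
constexpr std::size_t kWordsPerKmer = wordsForKmer(kMaxKmerLength);

// Hypothetical helper: packs A/C/G/T as 2-bit codes, first base in the lowest bits.
// Returns an empty vector if the k-mer is too long or contains another symbol.
std::vector<std::uint64_t> packKmer(const std::string& kmer) {
    if (kmer.size() > kMaxKmerLength) return {};
    std::vector<std::uint64_t> words(kWordsPerKmer, 0);
    for (std::size_t i = 0; i < kmer.size(); ++i) {
        std::uint64_t code;
        switch (kmer[i]) {
            case 'A': code = 0; break;
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default: return {};
        }
        const std::size_t word = i / kNucleotidesPerWord;
        const std::size_t shift = 2 * (i % kNucleotidesPerWord);
        words[word] |= code << shift;
    }
    return words;
}
```

In this scheme, a build with -D MAXKMERLENGTH=255 reserves wordsForKmer(255) = 8 words per k-mer instead of 1, so memory per stored k-mer grows even when the assembler is run with a small -k. That trade-off is presumably why the default maximum stays at 32.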

Quote:

2) I now have 2 new mate-pair libraries for scaffolding. Can I combine them with the contigs I already assembled with Ray 2.1.0 to get the better/longer scaffolds that theory would predict?
Code:
mpiexec -n 20 Ray -k 35 -p $INPATH/LAN4_35_clean_PE_R1.fasta $INPATH/LAN4_35_clean_PE_R2.fasta  -p $INPATH/LAN4_80_clean_PE_R1.fasta $INPATH/LAN4_80_clean_PE_R2.fasta -s $INPATH/S01/S01_071/Contigs.fasta  -o $OUTPATH/S01_035
When I ran the command above, I did not get longer scaffolds or contigs at all; instead, the contigs were broken! I expected the new contigs to be longer than those in the single-end file passed with -s $INPATH/S01/S01_071/Contigs.fasta. What does this mean? Or do I need to run all 3 libraries (1 PE, 2 MP) together again from the beginning?
Yes, you need to restart from the beginning.

Quote:

3) Following up on my last post about the mate-pair insert size: I do not know the standard deviation of the insert size. How should I handle that? You mentioned:
Quote:
Ray is usually pretty good at estimating your library sizes.
Does that mean I do not need to provide the insert size to Ray? Thanks a lot!
I think it does. You should check LibraryStatistics.txt regardless.
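Why the standard deviation can be estimated without user input: once both mates of a pair are placed on the same assembled sequence, the distance between them is observed directly, and the library's mean and spread follow from those observations. The sketch below assumes exactly that and computes a plain mean and sample standard deviation; Ray's actual estimator may filter outliers or handle multiple peaks differently, and `estimateLibrary` / `LibraryEstimate` are hypothetical names. Comparing such numbers against LibraryStatistics.txt is the kind of check suggested above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of a library's insert-size (outer distance) distribution.
struct LibraryEstimate {
    double mean;
    double standardDeviation;
};

// Hypothetical estimator: takes outer distances observed for read pairs whose
// two mates landed on the same contig and returns the mean and the sample
// standard deviation (n - 1 denominator). Needs at least two observations.
LibraryEstimate estimateLibrary(const std::vector<double>& outerDistances) {
    assert(outerDistances.size() >= 2);
    double sum = 0.0;
    for (double d : outerDistances) sum += d;
    const double mean = sum / outerDistances.size();

    double squaredDeviations = 0.0;
    for (double d : outerDistances) squaredDeviations += (d - mean) * (d - mean);
    const double variance = squaredDeviations / (outerDistances.size() - 1);

    return {mean, std::sqrt(variance)};
}
```

For example, observed distances of 300, 310 and 290 give a mean of 300 and a sample standard deviation of 10.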

Tags
assembler, genome, illumina, mix





All times are GMT -8.

