
  • I tried to view the AMOS.afg file (37.1 GB) with a couple of programs. Tablet was painfully slow and eventually quit, reporting an error on some line. Hawkeye (from the AMOS package) successfully imported the assembly into a bank and even opened a graphics window showing contig 1, but then hung indefinitely and had to be killed.
    Code:
    [yaximik@G5NNJN1 ~]$ hawkeye
    START DATE: Mon Mar 11 11:06:54 2013
    Bank is: /home/yaximik/AssRefMap/SC/Ray/RayOutput/AMOS.afg.bnk
        0%                                            100%
    AFG ..................................................
    Messages read: 175403161
    Objects added: 175403161
    Objects deleted: 0
    Objects replaced: 0
    END DATE:   Mon Mar 11 12:13:09 2013
    Opening /home/yaximik/AssRefMap/SC/Ray/RayOutput/AMOS.afg.bnk... [160.12s]
    Indexing Contigs   .......... [83.11s] 107326772 reads in 1409913 contigs
    Scaffold information not available
    Mates not available:WHAT: Could not open bank file, /home/yaximik/AssRefMap/SC/Ray/RayOutput/AMOS.afg.bnk/FRG.ifo, No such file or directory
    LINE: 1264
    FILE: Bank_AMOS.cc
    
    Features not available
    Initialize Display .Loading AssemblyStats...[8.95s]
    .Loading Features...      [0.01s]
    .Loading Libraries...     [0.00s]
    .Loading Scaffolds....Loading Contigs...       [186.21s]
    ....Loading NCharts...       [21.83s]
    . [217.01s]
    Loading Contig 1... [0.05s] 109076 reads
    Loading reads...         [343.52s]
    Total Load Time: [803.92s]
    Loading mates ..................................................
    inserts: 108933 mated: 0 matelisted: 0 unmated: 108933 happy: 0 unhappy: 0
    Paint: coverage contigs insetcovfeat readcovfeat features inserts
    width: 12457 swidth: 778 height: 26357..
    Killed
    [yaximik@G5NNJN1 ~]$
    What viewer can be used to view the assembly?



    • Quote:
      If not, is it an average fragment length in the library?

      Yes.

      Quote:
      Such as surmised from BioAnalyzer trace, for example?

      Yes, but the BioAnalyzer will also include sequencing adapters in the evaluation, whereas these are usually not included in sequencing reads.
      How is the average length calculated? I guess after the reads are aligned to the assembly, correct? But I thought that the assembly depends on paired-end information, so unless I am wrong there is a logical short circuit here: paired reads are spaced according to the assembly, which itself depends on the distance between paired reads.
      It is not that I am maliciously probing how the algorithm was designed. I am trying to work out where such a discrepancy between the Bioanalyzer and the assembler comes from. Could it be that the Bioanalyzer traces for libraries are so misleading that I really have no idea about the size of the libraries I am sequencing? Or is autocalc somehow misled in its library size estimation?



      • Originally posted by yaximik
        I tried to view the AMOS.afg file (37.1 GB) with a couple of programs. Tablet was painfully slow and eventually quit, reporting an error on some line. Hawkeye (from the AMOS package) successfully imported the assembly into a bank and even opened a graphics window showing contig 1, but then hung indefinitely and had to be killed.
        What viewer can be used to view the assembly?
        When the AMOS file format was implemented, I tested Hawkeye, Tablet, and Bank-transact.

        You can submit a ticket and I will eventually look at that, but this feature has not really changed since it was implemented.

        What hardware (memory, processor, video card) are you running Hawkeye on?

        For visualization, I am working on Ray Cloud Browser.



          • Originally posted by yaximik
            How is the average length calculated?
            I guess after the reads are aligned to the assembly, correct?
          Yes, but all of this happens in the de Bruijn graph -- there is no aligner in the process.


            But I thought that the assembly depends on paired-end information, so unless I am wrong there is a logical short circuit here: paired reads are spaced according to the assembly, which itself depends on the distance between paired reads.
          Yes, it's like a bootstrapping process: distances are sampled from seeds (similar to unitigs), and then the empirical distribution is used to extend longer contigs by matching paired reads to the distribution.

          I like your short circuit.
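To make the bootstrapping idea a bit more concrete, here is a minimal sketch of the second half only: turning sampled outer distances into an empirical mean and standard deviation. It is not Ray's code. The function name `estimate_library` and the two-column histogram (distance, number of pairs) are assumptions made for illustration, and the numbers are invented.

```shell
# Summarize a histogram of paired-read outer distances.
# Input on stdin, one line per bin: <distance_nt> <number_of_pairs>
# (hypothetical layout, chosen only for this sketch).
estimate_library() {
    awk '
        { n += $2; sum += $1 * $2; sumsq += $1 * $1 * $2 }
        END {
            mean = sum / n
            # population standard deviation of the sampled distances
            printf "mean=%.1f sd=%.1f\n", mean, sqrt(sumsq / n - mean * mean)
        }'
}

# Invented sample: four pairs observed on seeds.
printf '190 1\n200 2\n210 1\n' | estimate_library
# prints: mean=200.0 sd=7.1
```

Once such a distribution is known, pairs whose observed distance falls far outside it can be treated as poor evidence for extending a contig.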

          *

            It is not that I am maliciously probing how the algorithm was designed.
            On the contrary, science advances when curious people step in.

            I am trying to work out where such a discrepancy between the Bioanalyzer and the assembler comes from. Could it be that the Bioanalyzer traces for libraries are so misleading that I really have no idea about the size of the libraries I am sequencing?
            One hypothesis is that the population of molecules analyzed by the Bioanalyzer is a superset of the molecules that are present on the sequencing flow cell after library preparation.

            Or is autocalc somehow misled in its library size estimation?
            That may be the case too, but I would be surprised by that.



          • When the AMOS file format was implemented, I tested Hawkeye, Tablet, and Bank-transact.
            You can submit a ticket and I will eventually look at that, but this feature has not really changed since it was implemented.
            What is the hardware (memory, processor, video card) on which you are running Hawkeye?
            It's two quad-core Xeon E5620 processors with 96 GB of memory and an nVIDIA NV 300 in dual-display mode.



            • Originally posted by yaximik
              It's two quad-core Xeon E5620 processors with 96 GB of memory and an nVIDIA NV 300 in dual-display mode.
              Hardware should therefore not be a problem with such a nice computer.

              Is your experience with Hawkeye or Tablet problematic only with AMOS files generated by Ray, or does the issue also occur with AMOS files generated by other tools?



              • maximum kmer length?

                Hello Sebastien:

                Can I ask what the maximum value of MAXKMERLENGTH is for Ray 2.1? The largest value I have seen set so far is 128, and for Velvet the largest I have seen is 151. Somewhere I read "There is an arbitrary number that can be set for MAXKMERLENGTH", but I can no longer find the link. Can you confirm the maximum value I can set for Ray? Thanks a lot!

                YT



                • Originally posted by yifangt
                  Hello Sebastien:

                  Can I ask what the maximum value of MAXKMERLENGTH is for Ray 2.1? The largest value I have seen set so far is 128, and for Velvet the largest I have seen is 151. Somewhere I read "There is an arbitrary number that can be set for MAXKMERLENGTH", but I can no longer find the link. Can you confirm the maximum value I can set for Ray? Thanks a lot!

                  YT
                  A message in Ray has a maximum size of 4000 bytes and 2 bits are necessary per nucleotide. The maximum is therefore 4000 / 2 = 2000.

                  However, read lengths and sequencing errors will be limiting factors here.
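The "2 bits per nucleotide" figure follows from the four-letter alphabet: A, C, G and T can be coded as 00, 01, 10 and 11. As a rough sketch of what that means for storage, the helper below computes the packed size of a single k-mer. The function name `packed_kmer_bytes` is made up for this example, and the sketch does not model Ray's message layout or any per-message overhead.

```shell
# Bytes needed to store one k-mer at 2 bits per nucleotide, rounded up
# to a whole byte. Illustrative arithmetic only, not Ray's storage code.
packed_kmer_bytes() {
    k=$1
    echo $(( (2 * k + 7) / 8 ))
}

for k in 32 64 128; do
    echo "k=$k -> $(packed_kmer_bytes "$k") bytes"
done
```

For the k values above this reports 8, 16 and 32 bytes respectively.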



                  • assembly output format for visualisation

                    Another question!
                    I want to view my assembly with other software, e.g. Tablet, Mauve or OSlay. Is there any way in Ray to convert the output files to ACE, MAQ, SAM or BAM format for those post-assembly programs?
                    In the FAQ section of your site there is a question about the AMOS output format, but I did not use it. Do I have to run the assembly again with the -amos option on? Unfortunately, the AMOS format is not universally readable by other programs.
                    I was trying to figure out what the output files contain, but I am not sure which one I should use for those visualization programs, or which one should be used for a perl/shell script for the format conversion.
                    I would appreciate any clue you could give me.

                    Thanks!

                    YT



                    • Originally posted by yifangt
                      Another question!
                      I want to view my assembly with other software, e.g. Tablet, Mauve or OSlay. Is there any way in Ray to convert the output files to ACE, MAQ, SAM or BAM format for those post-assembly programs?
                      The two current options are:

                      1. use -amos, then use an AMOS-compatible viewer

                      2. use -write-kmers, then use Ray Cloud Browser

                      Originally posted by yifangt

                      In the FAQ section of your site there is a question about the AMOS output format, but I did not use it. Do I have to run the assembly again with the -amos option on?

                      Yes, you need to run it again.
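As a minimal sketch of such a re-run, assuming one paired-end library: the file names, rank count, k-mer length and output directory below are placeholders, and only the -amos option itself comes from this thread. The v2.2.0 changelog in this thread notes that Ray aborts if the output directory already exists, so the sketch points -o at a fresh directory. The `DRY_RUN` switch is just a convenience of this example.

```shell
# Re-run the assembly with AMOS output enabled.
# Placeholders: reads_R1.fasta, reads_R2.fasta, RayOutput_amos, -n 20, -k 53.
rerun_with_amos() {
    ${DRY_RUN:+echo} mpiexec -n 20 Ray -k 53 \
        -p reads_R1.fasta reads_R2.fasta \
        -o RayOutput_amos \
        -amos
}

# Preview the command line without launching anything:
DRY_RUN=1 rerun_with_amos

# Launch for real (needs MPI and Ray on the PATH):
# rerun_with_amos
```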

                      Originally posted by yifangt

                      Unfortunately, the AMOS format is not universally readable by other programs.

                      There are two formats for de novo assemblies: amos and fastg. The amos format is supported by far more applications.

                      Originally posted by yifangt

                      I was trying to figure out what the output files contain, but I am not sure which one I should use for those visualization programs, or which one should be used for a perl/shell script for the format conversion.
                      There are Contigs.fasta and Scaffolds.fasta.

                      One thing you can do is to map your fastq sequences on the contigs and use, for example, "samtools tview" to visualize that.
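A sketch of that mapping route follows. The choice of bwa as the mapper is an assumption of this example (the post does not name one), the function name and the reads.sorted.bam file name are made up, and the `samtools sort -o` syntax is that of current samtools releases rather than the 2013 one.

```shell
# Map paired reads onto the contigs and browse the alignment in a terminal.
# Needs bwa and samtools on the PATH.
view_reads_on_contigs() {
    contigs=$1   # e.g. Contigs.fasta from the Ray output directory
    reads_1=$2   # first fastq file of the pair
    reads_2=$3   # second fastq file of the pair

    bwa index "$contigs" &&
    bwa mem -t 4 "$contigs" "$reads_1" "$reads_2" |
        samtools sort -o reads.sorted.bam - &&
    samtools index reads.sorted.bam &&
    samtools tview reads.sorted.bam "$contigs"
}

# Example call (paths are placeholders):
# view_reads_on_contigs RayOutput/Contigs.fasta reads_1.fastq reads_2.fastq
```

The sorted and indexed BAM file can also be opened in other SAM/BAM-aware viewers, which covers the SAM/BAM part of the question.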


                      Another way is to run Ray with -write-kmers and to use Ray Cloud Browser, which is probably the most interactive web genome viewer you'll find out there.

                      Originally posted by yifangt

                      Appreciate if you could give me any clue.

                      Thanks!

                      YT



                      • How to include the mate-pair information in Ray

                        Hi Sebastien:
                        What is the option for including the different mate-pair libraries in the assembly?
                        After searching this forum, I handled this mixture of reads by treating all of them as paired-end libraries:
                        Code:
                        mpiexec -n 20 Ray -k 53 -p S01_clean_PE_R1.fasta S01_clean_PE_R2.fasta -p S01_MP1_R1.fasta S01_MP1_R2.fasta -p S01_MP2_R1.fasta S01_MP2_R2.fasta -o S01_53_PE_MP
                        In fact, the PE reads are paired-end, the MP1 reads are from a 3-5 kb mate-pair library, and the MP2 reads are from an 8-10 kb mate-pair library.
                        How can I specify the mate-pair distance for Ray, if there is a way? Thanks!
                        YT



                        • Originally posted by yifangt
                          Hi Sebastien:
                          What is the option for including the different mate-pair libraries in the assembly?
                          After searching this forum, I handled this mixture of reads by treating all of them as paired-end libraries:
                          Code:
                          mpiexec -n 20 Ray -k 53 -p S01_clean_PE_R1.fasta S01_clean_PE_R2.fasta -p S01_MP1_R1.fasta S01_MP1_R2.fasta -p S01_MP2_R1.fasta S01_MP2_R2.fasta -o S01_53_PE_MP
                          In fact, the PE reads are paired-end, the MP1 reads are from a 3-5 kb mate-pair library, and the MP2 reads are from an 8-10 kb mate-pair library.
                          How can I specify the mate-pair distance for Ray, if there is a way? Thanks!
                          YT
                          Ray is usually pretty good at estimating your library sizes.

                          You can provide the information manually should you wish to do so.

                          Code:
                          mpiexec -n 99 Ray -p mate_1.fastq mate_2.fastq 8000 800
                          In the example above, 8000 is the average outer distance (distance between reads + read lengths) and 800 is the standard deviation on that quantity.
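Applied to the three libraries from the question, this could look like the sketch below. It assumes the optional distance and deviation can be given independently for each -p, so the paired-end library is left to automatic estimation. The values 4000 and 9000 are simply midpoints of the stated 3-5 kb and 8-10 kb ranges, and the deviation of 500 is a guess, so replace them with figures from your own library preparation. The `DRY_RUN` switch is a convenience of this example.

```shell
# Assemble one paired-end and two mate-pair libraries, giving manual
# outer distances for the mate pairs only.
# Assumed values: 4000/500 for MP1 and 9000/500 for MP2 (not measured).
assemble_with_library_sizes() {
    ${DRY_RUN:+echo} mpiexec -n 20 Ray -k 53 \
        -p S01_clean_PE_R1.fasta S01_clean_PE_R2.fasta \
        -p S01_MP1_R1.fasta S01_MP1_R2.fasta 4000 500 \
        -p S01_MP2_R1.fasta S01_MP2_R2.fasta 9000 500 \
        -o S01_53_PE_MP
}

# Preview the command line without launching anything:
DRY_RUN=1 assemble_with_library_sizes

# Launch for real (needs MPI and Ray on the PATH):
# assemble_with_library_sizes
```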



                          • Ray v2.2.0 is now available.

                            Hello,

                            Ray v2.2.0 is now available worldwide.

                            The delay between v2.1.0 and v2.2.0 was quite long.

                            Ray v2.2.0 brings a lot of bug fixes and some new features.

                            The tarball is available at:







                            The most significant changes include:

                            * SequencesLoader: the Illumina export format is now supported
                            * add build option for MPI I/O
                            * avoid infinite loops during read recycling
                            * messages must not be passed by value
                            * Fixed a linking error caused by ordering
                            * FusionTaskCreator: don't lose genomic regions during merging
                            * new file GraphPartition.txt shows the distribution of objects
                            * readahead operations are used for reading gz files
                            * core: fixed a race condition occurring with -route-messages
                            * SeedingData: fix regression for seed checkpointing
                            * all the code of Ray was ported to this new GraphPath framework

                            The GraphPath framework reduces memory usage and avoids some misassembly
                            errors by enforcing the de Bruijn graph property.

                            * Scaffolder: don't fetch reads from repeated objects

                            This fixes running time issues on large genomes with repeats.

                            * SeedingData: implemented a staggered mean algorithm

                            * Mock: removed the limit on the number of input files
                            * Library: implemented checkpointing for paired reads
                            * removed all calls to fflush(stdout) and cout.flush()
                            * SeedExtender: reduce the verbosity of graph traversal
                            * reduced the amount of information in the standard output
                            * JoinerTaskCreator: reduced the default verbosity
                            * KmerAcademyBuilder: reduced the verbosity for graph construction
                            * implemented an adaptive Bloom filter
                            * store a path as a sequence instead of a vector of vertices for efficiency
                            * SequencesLoader: add support for short file names





                            All changes in Ray between v2.1.0 and v2.2.0

                            Charles Joly Beauparlant (1):
                            Added an example plugin.

                            Sébastien Boisvert (160):
                            Some work around the minirank model.
                            Ported Ray plugins to the mini-ranks RayPlatform.
                            Ray plugins were ported to the mini-ranks.
                            Moved the destruction of allocators in RayPlatform.
                            I ported Ray to some changes in some classes in RayPlatform.
                            application_core: the application code was simplified
                            Social networks were added to the release procedure
                            Code names of old releases were added
                            Fixed a linking error caused by ordering
                            Fixed the scope of options in build system
                            The build system was simplified
                            AR and LD are not needed here
                            Ray must abort if the output directory exists
                            The RayCommand.txt file was fixed for mini-ranks
                            Added the name of each rank (or mini-rank) in network test
                            The subgraph must be built regardless if it will be used
                            Merge branch 'minirank-model' of git://github.com/sebhtml/ray.git
                            core: CONFIG_* variables are private
                            core: The option -mini-rank-per-rank was added
                            ship: removed 6 files in shipped products
                            core: don't return parameters by value
                            Mock: new plugin called that does nothing
                            SequencesLoader: a regression for .bz2 file support was fixed
                            messages must not be passed by value
                            Ordered all headers
                            Updated copyrights
                            Documentation: there is only one repository for research tools
                            reverted a wrong hunk from commit 7c361f1530d084c6f99
                            FusionTaskCreator: don't lose genomic regions during merging
                            SeedExtender: properly format extension file name
                            Scaffolder: only put one new line after scaffold sequence
                            KmerAcademyBuilder: use vertexRank() to find who owns an object
                            new file GraphPartition.txt shows the distribution of objects
                            the line that shows the process identifier was moved
                            CoverageGatherer: kmers.txt should have 1 header only
                            recursive make was improved
                            readahead operations are used for reading gz files
                            SequencesLoader: added the rank number when loading files
                            core: the partitioner needs the correct rank number
                            core: fixed a race condition occurring with -route-messages
                            SeedExtender: display the number of traversed nucleotide symbols
                            Seeds: new runtime metrics for seeding algorithms
                            new header for SeedLengthDistribution.txt
                            new header for any paired read file LibraryN.txt
                            SequencesLoader: added a few assertions for read partitions
                            new header for CoverageDistribution.txt
                            Merge branch 'master' of github.com:sebhtml/ray
                            Documentation: added the polytope with 4225 vertices
                            SeedingData: fix regression for seed checkpointing
                            added documentation for using the torus
                            Documentation: added arguments for a 5D torus with 1024 vertices
                            Documentation: fixed permissions
                            removed the output file called MessagePassingInterface.txt
                            renamed the AssemblySeed to GraphPath so it can be reused
                            all the code of Ray was ported to this new GraphPath framework
                            Documentation: fixed the degree of the polytope
                            Scaffolder: don't fetch reads from repeated objects
                            SeedExtender: added documentation in the code for repeated vertices
                            fixed a couple of compilation warnings
                            SeedingData: implemented a staggered mean algorithm
                            Scaffolder: replaced getMode() by the new GraphPath framework
                            Mock: removed the limit on the number of input files
                            remove the limitation regarding the maximum number of files
                            moved message handlers from MessageProcessor to SequencesLoader
                            Scaffolder: fixed 2 compilation warnings
                            Library: implemented checkpointing for paired reads
                            SeedingData: reduced amount of printed information
                            removed all calls to fflush(stdout) and cout.flush()
                            SeedExtender: reduce the verbosity of graph traversal
                            reduced the amount of information in the standard output
                            JoinerTaskCreator: reduced the default verbosity
                            KmerAcademyBuilder: reduced the verbosity for graph construction
                            SequencesLoader: reduced verbosity
                            VerticesExtractor: reduced verbosity
                            reduced verbosity
                            reduced verbosity
                            SequencesLoader: the Illumina export format is now supported
                            added a loader interface for file formats
                            SequencesLoader: all supported formats use the interface
                            SequencesLoader: implemented a product factory
                            Mock: updated documentation for new export format
                            Mock: output a single file for library data
                            implemented an adaptive Bloom filter
                            improved the interface of path objects
                            add debug symbols by default
                            store a path as a sequence instead of a vector of vertices for efficiency
                            Mock: the path storage using blocks is not ready
                            SeedingData: enforce de Bruijn graph property for path storage
                            SeedingData: use the GraphPath storage code to compute seeds
                            SeedingData: refactor code so that m_content is abstracted
                            SeedingData: use 2-bit encoding for paths
                            SeedingData: plugin options are parsed by plugins
                            use constants for symbols
                            SeedingData: correctly detect dead ends
                            add more information for coding style
                            MachineHelper: registerPlugin and resolveSymbols must be last
                            SeedingData: tips can not be seeds
                            SequencesLoader: add support for short file names
                            SeedingData: tips are not valid seeds
                            move some handlers in the Scaffolder plugin
                            Scaffolder: implement the handler for packed chunks
                            fix a race condition during directory probing
                            reduce verbosity of components
                            add documentation for building on IBM Blue Gene/Q
                            add code name for upcoming release
                            SequencesLoader: fix regression (added in ca979832) for line widths
                            add plugin PathEvaluator to evaluate paths
                            PathEvaluator: write ContigPaths checkpoints in parallel
                            reserve storage capacity for sequence file
                            perform parallel I/O operations
                            fix a bug when disabling scaffolding
                            use MPI I/O to write Contigs.fasta
                            use a file view for each MPI rank
                            add build option for MPI I/O
                            avoid parallel I/O without MPI I/O
                            avoid infinite loops during read recycling
                            update polytope documentation
                            add comments for old class
                            add a new plugin to process spurious seeds
                            port some plugins to the simplified RayPlatform API
                            iterate on seeds to filter them
                            register seed paths in the distributed graph
                            hide hash values for Bloom filter
                            push the workflow in a helper class
                            fetch ancestors of seed heads
                            seed lengths must be collected after analysis
                            write seed statistics after analysis
                            write seed checkpoints after the quality control analysis
                            write seed files after analysis (-write-seeds)
                            skip seed quality analysis if checkpoints exist
                            add steps for better dead end detection
                            hide mini-ranks in help if they are disabled
                            correct a bunch of bugs for adapters in Ray
                            reuse code paths to obtain sequence information
                            eliminate seeds that have a dead-end on the left
                            discard seeds with dead-ends on the right
                            increase the maximum depth for searches
                            add a class to fetch the attributes of a DNA sequence
                            create a class to fetch annotations in a portable way
                            fetch nearby paths to detect bubbles
                            fix a bug during the registration of seeds
                            remove any seed that is a weak part of a bubble
                            add 4 methods that will be implemented later
                            fix a regression that prevented the closing of a file
                            add new reference in the output
                            disable the seed filter when using short kmers
                            add a maximum coverage depth for dead end search
                            adapt the allowed depth in function of the data
                            add design blueprints for the new plugin
                            SpuriousSeedAnnihilator: disable debug messages by default
                            TaxonomyViewer: rename the plugin to TaxonomyViewer
                            remove plugin_ from all plugin directory names
                            add new line for publications
                            application_core: fix buggy message routing
                            SeedExtender: don't traverse path if it's consumed already
                            SeedingData: fix a bug for the phix system test
                            update the CMakeList.txt
                            use git to store version names
                            Disable the filtering code during the computation of seeds
                            This is Ray v2.2.0




                            All changes in RayPlatform between v1.1.0 and v1.1.1


                            Sébastien Boisvert (56):
                            initial work on miniranks with VirtualMachine and Minirank
                            I added some design documentation for mini-ranks.
                            spinlocks are more suitable for this job
                            added design documentation for mini-ranks.
                            First implementation of mini-ranks in RayPlatform
                            The core must provide the mini-rank number.
                            Documentation: added description of macros.
                            Fixed some bugs in the mini-ranks model.
                            Moved the destruction of allocators in the core.
                            Mini-rank source and mini-rank destination are required.
                            The desctructor of the middleware must be called.
                            A mini-rank must tell the rank that it has messages to send.
                            The class MessageQueue does the job of receiving messages.
                            Non-blocking queues will be used for the communication.
                            The non-blocking message queue for mini-ranks is ready.
                            MPI_Recv must be called to get the mini-rank numbers.
                            This is the branch for RayPlatform v7.0.0.
                            core: The old behavior (no mini-ranks) now works as expected
                            core: RayPlatform is responsible for creating mini-ranks
                            The old adapter API documentation was removed
                            Message reception is now interleaved with send operations.
                            More buffers are needed for mini-ranks
                            communication: don't register already registered buffers
                            The build system is less verbose
                            New API call to get the number of mini-ranks per rank
                            Added a method to get the MessagesHandler object
                            Merge branch 'minirank-model' of github.com:sebhtml/RayPlatform into minirank-model
                            Merge branch 'minirank-model' of git://github.com/sebhtml/RayPlatform.git
                            handlers: new option to cache operation codes
                            communication: messages must be passed with a pointer
                            Ordered headers in all files
                            Updated copyrights
                            The short name was updated in headers
                            The website was updated in every file
                            a retry is necessary when a message is pushed into a full ring
                            Documentation: updated RayPlatform mini-ranks blueprints
                            communication: moved writeFiles() in a second method
                            communication: removed a few debugging instructions
                            Documentation: added gate blueprints
                            Documentation: improved design for non-linear scheduling
                            routing: renamed the hypercube to polytope
                            Documentation: added Torus description
                            a radix of 2 produces a hypercube
                            use the Q and ASSERT build arguments in RayPlatform
                            routing: implemented a new communication graph: the torus
                            Merge branch 'master' of github.com:sebhtml/RayPlatform
                            core: use specific code to get memory usage on Blue Gene/Q
                            the next release will likely be 1.2.0 and not 7.0.0
                            add option to provide public access to a master mode
                            add the core in each plugin
                            add two macros to configure handlers
                            fixed directives to compile mini-ranks
                            core: fix buggy message routing
                            improve the patch for message routing with a configuration
                            core: fix a regression for registered handle names
                            This is RayPlatform v1.1.0.
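The note above that "a radix of 2 produces a hypercube" can be illustrated with a small sketch. It assumes that each rank is labeled by a fixed number of digits in base `radix` and that two ranks are linked when their labels differ in exactly one digit. That labeling, and the function names `to_digits` and `neighbors`, are assumptions made for illustration; this is not RayPlatform's code.

```python
def to_digits(vertex, radix, dimension):
    """Write a vertex number as `dimension` digits in base `radix`,
    least significant digit first."""
    digits = []
    for _ in range(dimension):
        digits.append(vertex % radix)
        vertex //= radix
    return digits


def neighbors(vertex, radix, dimension):
    """Return the vertices whose label differs from `vertex` in exactly one digit."""
    digits = to_digits(vertex, radix, dimension)
    result = []
    for position in range(dimension):
        for value in range(radix):
            if value != digits[position]:
                # Replace one digit and convert back to a vertex number.
                delta = (value - digits[position]) * radix ** position
                result.append(vertex + delta)
    return sorted(result)


# With radix 2 every digit is a bit, so each neighbor is one bit flip away,
# which is the usual definition of a hypercube.
hypercube_neighbors = neighbors(5, radix=2, dimension=3)

# With a larger radix each vertex has dimension * (radix - 1) neighbors.
polytope_neighbors = neighbors(0, radix=3, dimension=2)
```

Under these assumptions, a larger radix gives more direct links per rank in exchange for fewer digits per label.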



                            • Ray2.2.0

                              Hi, Sebastian!
                              Several questions while I am trying Ray2.2.0:
                              1) When I tried to optimize the installation as suggested in the README.md
                              Code:
                              The best way to build Ray is to use whole-program optimization.
                              With gcc, use this script:
                              bash ./scripts/Build-Link-Time-Optimization.sh
                              I could not use a k-mer length above 32, even though I changed this line:
                              Code:
                                -D MAXKMERLENGTH=255 \
                              I want a larger maximum k-mer length because my reads can be 250 bp long. Did I miss anything?

                              2) I now have 2 new mate-pair libraries for scaffolding. Can I combine them with the contigs I already have from Ray2.1.0 to get "better/longer" scaffolds, as theoretically expected?
                              Code:
                              mpiexec -n 20 Ray -k 35 -p $INPATH/LAN4_35_clean_PE_R1.fasta $INPATH/LAN4_35_clean_PE_R2.fasta  -p $INPATH/LAN4_80_clean_PE_R1.fasta $INPATH/LAN4_80_clean_PE_R2.fasta -s $INPATH/S01/S01_071/Contigs.fasta  -o $OUTPATH/S01_035
                              When I ran the command above, I did not get longer scaffolds or contigs at all, and the contigs were broken! I expected the contigs to be longer than those in the file passed with -s, $INPATH/S01/S01_071/Contigs.fasta. What does this mean? Or do I need to run all 3 libraries (1 PE, 2 MP) together again from the beginning?

                              3) To follow up on my last post about the MP size: I do not have the standard deviation of the insert size. How do I handle that? As you mentioned:
                              HTML Code:
                              Ray is usually pretty good at estimating your library sizes.
                              Does that mean I do not need to provide the insert size for Ray? Thanks a lot!
                              Last edited by yifangt; 04-23-2013, 08:17 AM.



                              • Originally posted by yifangt View Post
                                Hi, Sebastian!
                                Several questions while I am trying Ray2.2.0:
                                1) When I tried to optimize the installation as suggested in the README.md
                                Code:
                                The best way to build Ray is to use whole-program optimization.
                                With gcc, use this script:
                                bash ./scripts/Build-Link-Time-Optimization.sh
                                I could not use a k-mer length above 32, even though I changed this line:
                                Code:
                                  -D MAXKMERLENGTH=255 \
                                I want a larger maximum k-mer length because my reads can be 250 bp long. Did I miss anything?
                                I fixed this build script.
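A rough sketch of why the maximum k-mer length is a build-time setting. It assumes k-mers are packed at two bits per nucleotide into 64-bit words, so the compiled-in limit fixes how many words every k-mer occupies. The packing layout and the names `words_needed` and `check_k` are assumptions for illustration, not taken from Ray's source.

```python
import math

BITS_PER_NUCLEOTIDE = 2  # A, C, G and T fit in two bits (assumed encoding)
BITS_PER_WORD = 64       # assumed storage word size


def words_needed(max_kmer_length):
    """Number of 64-bit words needed to store one k-mer of the given maximum length."""
    return math.ceil(max_kmer_length * BITS_PER_NUCLEOTIDE / BITS_PER_WORD)


def check_k(k, max_kmer_length):
    """Raise an error when the requested k exceeds the compiled-in maximum."""
    if k > max_kmer_length:
        raise ValueError(
            f"k={k} exceeds the build-time maximum of {max_kmer_length}; "
            "rebuild with a larger limit"
        )


# A limit of 32 fits in a single word, which is why a default build stops there.
default_words = words_needed(32)
# A limit of 255 needs more words per k-mer, so memory use per k-mer grows.
large_words = words_needed(255)
```

Under this assumption a binary built with a limit of 32 cannot run `-k 35`, and raising the limit only takes effect once the build script actually passes the new value to the compiler.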




                                2) I now have 2 new mate-pair libraries for scaffolding. Can I combine them with the contigs I already have from Ray2.1.0 to get "better/longer" scaffolds, as theoretically expected?
                                Code:
                                mpiexec -n 20 Ray -k 35 -p $INPATH/LAN4_35_clean_PE_R1.fasta $INPATH/LAN4_35_clean_PE_R2.fasta  -p $INPATH/LAN4_80_clean_PE_R1.fasta $INPATH/LAN4_80_clean_PE_R2.fasta -s $INPATH/S01/S01_071/Contigs.fasta  -o $OUTPATH/S01_035
                                When I ran the command above, I did not get longer scaffolds or contigs at all, and the contigs were broken! I expected the contigs to be longer than those in the file passed with -s, $INPATH/S01/S01_071/Contigs.fasta. What does this mean? Or do I need to run all 3 libraries (1 PE, 2 MP) together again from the beginning?
                                Yes, you need to restart from the beginning.


                                3) To follow up on my last post about the MP size: I do not have the standard deviation of the insert size. How do I handle that? As you mentioned:
                                HTML Code:
                                Ray is usually pretty good at estimating your library sizes.
                                Does that mean I do not need to provide the insert size for Ray? Thanks a lot!
                                I think it does. You should check LibraryStatistics.txt regardless.
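To make the idea of an estimated library size concrete, here is a minimal sketch of how an average and a standard deviation can be computed from observed distances between mates. The distances are made-up values, and the function `estimate_library` is hypothetical; it shows the statistic only and does not reproduce Ray's estimation method or the layout of LibraryStatistics.txt.

```python
from statistics import mean, stdev


def estimate_library(distances):
    """Return (average, standard deviation) of observed mate distances."""
    if len(distances) < 2:
        raise ValueError("need at least two observations to estimate a deviation")
    return mean(distances), stdev(distances)


# Hypothetical distances (in bp) for pairs whose two mates land on the same contig.
observed = [2900, 3100, 3000, 2950, 3050]
average, deviation = estimate_library(observed)
```

If the values reported for a mate-pair library look implausible for the protocol that produced it, that is a hint that the estimate should not be trusted as-is.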

