  • RTG 3.8: New QC tools / improved machine learning / free simulation tools

    Real Time Genomics are pleased to announce the availability of new releases of our full analysis suite, RTG Core, and our utility package, RTG Tools. This release includes new features and performance improvements. Some of the highlights of this release:

    * Improvements aimed at preprocessing and QC. In particular, RTG includes two new commands, fastqtrim and petrim, for preprocessing FASTQ files to apply various kinds of trimming before entering the NGS pipeline. These commands greatly expand what was previously available during data formatting.

    * The simulation commands that were previously only available as part of RTG Core are now included in the RTG Tools package. These commands encompass simulation of reference genomes (genomesim), simulation of population-level variants (popsim), simulation of individual sample genomes from population variants (samplesim), simulation of samples as members of a pedigree obeying inheritance rules (childsim), simulation of de novo variants (denovosim), generation of a genome given a VCF of sample variants (samplereplay), and read simulation according to a range of sequencer parameters (readsim/cgsim).

    * Initial support for accepting CRAM files as input to variant calling commands and most other commands that accept alignments as input. Some commands may now require a reference SDF to be specified in order to decode CRAM files.

    * Improvements to the prebuilt AVR models that perform variant scoring. These models have been rebuilt using training data incorporating the latest truth sets produced by the GIAB initiative as well as improvements to the underlying machine learning algorithms.

    * User manual improvements; in particular, the baseline progressions section has been rearranged to better illustrate how to run end-to-end RTG calling pipelines that make best use of RTG features such as sex-aware and pedigree-aware variant calling.

    If you haven't used RTG Core before (or maybe even if you have), we suggest you run the demo-family.sh script that runs through a short end-to-end demonstration of sex-aware and pedigree-aware family variant calling, including de novo variant detection and variant evaluation with vcfeval. (It also makes a nice demo of our comprehensive simulation tools.)

    Commercial users of RTG Core may download the update from our website at http://realtimegenomics.com/products/rtg-core-downloads. Non-commercial users can download the update from our website at http://realtimegenomics.com/products...non-commercial or build from the source on github at https://github.com/RealTimeGenomics/rtg-core.

    Users of RTG Tools, which is made freely available for non-commercial or commercial use alike, can download the new version from our website at http://realtimegenomics.com/products/rtg-tools or build from the source code on github at https://github.com/RealTimeGenomics/rtg-tools.


    Detailed changes are listed below by area. For more information on new features, see the RTG Operations Manual which is included within the distribution as HTML and PDF.

    ### Basic Formatting and Mapping

    * fastqtrim: This new command allows trimming of FASTQ files with much
    more flexibility and control than is available directly from the
    format command. See the user manual for more information and examples.

    * petrim: This new command allows trimming of read bases in paired-end
    data where read-through has occurred, as determined by alignment
    overlap. See the user manual for more information and examples.

    * format: Support for reading interleaved paired-end FASTQ added. This
    is useful for formatting directly from streamed output of the petrim
    command, avoiding additional disk I/O.

    * format/map: The quality encoding for FASTQ input files now defaults to
    the Sanger encoding used by the majority of modern FASTQ files, and so
    the --quality-format flag typically only needs to be specified when
    processing older FASTQ files employing an alternative encoding.

    * many: When outputting FASTA/FASTQ, ensure consistent use of unix line
    endings across the various commands.

    * calibrate: When calibrating multiple BAM files, each is calibrated in
    an independent thread, obeying the --threads flag.

    * sammerge: New flag --subsample that passes only a fraction of the
    alignments through to the output. In addition, the new flag --seed
    lets you control which random seed is used for this subsampling.

    * coverage: Computes the additional QC metrics fold-80 penalty and median
    coverage.

    * coverage: New flag --per-region, which changes how BED/BEDGRAPH
    coverage records are triggered: instead of whenever the coverage level
    changes, a record is emitted only when the region changes.

    * sammerge: Will now create output files in CRAM format if the output
    filename ends with ".cram". This requires the user to specify the
    reference SDF via the new --template flag.

    * index: Now allows creating indexes for CRAM files. These are the
    `.bai` indexes currently supported by htsjdk, rather than `.crai`
    indexes.
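
The notes do not define the fold-80 penalty. As a rough illustration only, the sketch below assumes the commonly used Picard-style definition: mean depth over covered bases divided by the depth that 80% of covered bases reach or exceed. RTG's exact computation may differ (for example in how zero-coverage bases are treated), and the function names are hypothetical.

```python
import statistics


def median_coverage(depths):
    """Median per-base depth over all supplied positions (zeros included)."""
    return statistics.median(depths)


def fold80_penalty(depths):
    """Fold-80 base penalty under an assumed Picard-style definition.

    Only bases with non-zero coverage are considered (an assumption).
    Returns NaN when no base is covered.
    """
    covered = sorted(d for d in depths if d > 0)
    if not covered:
        return float("nan")
    n = len(covered)
    mean = sum(covered) / n
    # k = ceil(0.8 * n), computed with integers to avoid float rounding.
    k = (4 * n + 4) // 5
    # Largest depth that at least 80% of covered bases meet or exceed.
    p20 = covered[n - k]
    return mean / p20
```

A perfectly uniform depth profile gives a penalty of 1.0; the value grows as coverage becomes more uneven, since more extra sequencing would be needed to lift the poorly covered 20% up to the mean.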

    ### Variant Calling

    * snp: Includes INFO.DP annotations in output VCF, for consistency with
    the existing multi-sample caller output.

    * family/population/somatic: New VCF annotations (OCOC/OCOF/DCOC/DCOF)
    that indicate the count/fraction of contrary evidence observed in the
    original (parent) vs derived (child) samples.

    * snp/family/population/somatic: These commands now support SAM/BAM
    files that make use of the '=' character in the SEQ field (such as can
    be created by BamUtil:convert).

    * snp/family/population/somatic: These commands now support CRAM files
    as input.

    * family/population: Improved error reporting for semantically incorrect
    user-supplied pedigree information.

    * snp/family/population/somatic: Improvements to the accuracy of the
    pre-built AVR models. These models have been rebuilt using training
    data incorporating the latest truth sets produced by the GIAB
    initiative as well as improvements to the underlying machine learning
    algorithm.

    * snp/family/population: The default AVR model is now illumina-wgs.avr
    (previously the default was illumina-exome.avr). For exome calling,
    the illumina-exome.avr model provides an advantage over
    illumina-wgs.avr only when the primary interest is maximising the
    scoring of variants called outside of exome target regions.

    * many: For compatibility with non-human species, sex handling of PAR
    regions has been extended to allow a PAR region to have a different
    length in each member of an allosome pair.

    * svprep: Add the ability to run on merged alignment files rather than
    requiring alignment files to be separated into mated vs unmated vs
    unmapped.

    * svprep: New flag --no-augment permits the computation of read
    group statistics files only, for use when collecting statistics from
    third party alignment files.

    * avrpredict: New flag --sample to allow AVR scoring of only the
    specified sample names.

    * avrpredict: New flag --vcf-score-field to allow storing the AVR score
    into a FORMAT field with a different name, useful when comparing
    multiple scoring models.

    * avrbuild: Improvements to the quality of models built in the presence
    of missing annotations.
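
The SAM specification allows `=` in SEQ to mean "same base as the reference". The following sketch, based on that specification rather than on RTG's code, shows what a reader must do to recover the actual bases: walk the CIGAR and substitute the reference base wherever `=` appears. It assumes the contig's reference sequence is available as a Python string; `expand_equals` is a hypothetical name.

```python
import re

# One CIGAR operation: a length followed by an operator character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def expand_equals(seq, ref, pos, cigar):
    """Replace '=' in a SAM SEQ with the matching reference base.

    seq   -- SEQ field, possibly containing '='
    ref   -- reference sequence of the contig (0-based Python string)
    pos   -- 1-based leftmost mapping position (SAM POS)
    cigar -- CIGAR string
    """
    out = list(seq)
    q = 0          # index into the read
    r = pos - 1    # 0-based index into the reference
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned bases: each read base has a reference counterpart.
            for i in range(n):
                if out[q + i] == "=":
                    out[q + i] = ref[r + i]
            q += n
            r += n
        elif op in "IS":
            q += n   # read-only bases: no reference base to substitute
        elif op in "DN":
            r += n   # reference-only: skip over reference bases
        # H and P consume neither read nor reference
    return "".join(out)
```

For example, `expand_equals("==T=G=", "ACGTACGTAC", 2, "2M1I3M")` restores the read as `CGTTGC`; the inserted `T` has no reference counterpart and is kept as written.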

    ### Variant Processing and Analysis

    * vcfmerge: When combining records at the same position, vcfmerge will
    no longer combine records at a site where some records use a VCF padding
    base (as required by the VCF specification to prevent REF or ALT being
    zero-length) and some records do not. This is because a record which
    utilizes a padding base is not making an assertion about the genotype
    of the padding base itself, and merging these records loses this
    semantic distinction. (The old behaviour can be obtained via
    --Xnon-padding-aware.)

    * vcfannotate: New flag --no-header to suppress output of the VCF header.

    * vcfsubset: New flag --remove-ids to allow clearing the ID column.

    * rocplot: New flag --zoom which allows the specification of an initial
    zoom to display. See the user manual for a description of the
    coordinate syntax.

    * rocplot: (GUI) Add ability to remove a curve via per-curve pop-up menu
    in the side-pane.

    * rocplot: (GUI) Prevent loading the same ROC data file multiple times,
    and improve error handling on invalid files.

    * rocplot: (GUI) Improvements to the open file dialog. Now defaults to
    displaying ROC data files only, permits opening multiple ROC data
    files at once via multi-select, and other minor changes.

    * rocplot: (GUI) The "Cmd" button now shows the command in a pop-up
    dialog rather than sending it to the terminal, which eliminates the
    need to search through multiple tmux windows to find where rocplot was
    started from.

    * many: Invalid VCF header contig length specifications are now reported
    gracefully.

    * many: Improved error reporting of general VCF header parsing errors,
    which now includes the problematic line where possible.

    * many: Improved error reporting of malformed GT fields.
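
The vcfmerge padding-base rule can be illustrated with a small check. This is a minimal sketch under a stated assumption, not vcfmerge's logic: a record is treated as padded when all of its alleles share the same first base and at least one allele would become empty if that base were dropped (an indel written with its anchoring base). Records at one position are then only merged when they agree on padding. Both helper names are hypothetical.

```python
def uses_padding_base(ref, alts):
    """Heuristic (assumption): does this record carry a VCF padding base?"""
    alleles = [ref] + list(alts)
    first = ref[0].upper()
    # Every allele must start with the same base...
    shares_first = all(a and a[0].upper() == first for a in alleles)
    # ...and some allele must consist of that base alone.
    return shares_first and any(len(a) == 1 for a in alleles)


def mergeable(records):
    """Records (ref, alts) at one position may merge only if they agree on padding."""
    flags = {uses_padding_base(ref, alts) for ref, alts in records}
    return len(flags) <= 1
```

Under this heuristic a deletion written as `GA -> G` is padded while a SNP `G -> T` is not, so the two would be kept as separate records rather than combined.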

    ### Metagenomics

    * species: Fix the handling of mappings that contain non-unique
    read-names (as could arise when mapping directly from FASTQ files as
    separate mapping runs and passing the resulting alignments to
    species).

    * species: Accuracy improvements when using paired-end data as the
    underlying data source.

    ### Other

    * pedstats: Improved the GraphViz pedigree visualization layout for
    normal pedigree structures. The old layout is available with the new
    `--simple-dot` flag.

    * many: The following simulation commands are now included as part of
    RTG Tools: genomesim, cgsim, readsim, popsim, samplesim, childsim,
    denovosim, samplereplay.

    * readsim: When using --taxonomy-distribution and --distribution, one of
    --abundance or --dna-fraction must be supplied in order to indicate
    the desired interpretation.

    * index: The -f flag is now optional; by default index will attempt to
    determine the file format from the file extension.

    * many: Most commands accept the advanced flag --Xforce that allows them
    to continue in the case of pre-existing output files or
    directories. Be aware that particularly in the case of output
    directories the final directory contents may include files from
    previous runs (or even other commands), so this option should not be
    used in production scenarios.

    * many: Fixed an exception that could occur when performing multiple
    region based querying of SAM/BED/VCF records, where the regions were
    densely packed near the ends of chromosomes.

    * many: Almost all commands that take SAM/BAM as input now support CRAM
    files as input. Some of these commands have a new flag used to supply
    the reference SDF which is required when decoding CRAM.

    * misc: The rtg bash command completion has been improved to be more
    portable and no longer caches completion data on disk.

    * many: Linux and Windows packages have updated the bundled JRE to the
    latest from Oracle.
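
The readsim change above reflects that a species distribution can be read in two ways. Assuming that --abundance means relative organism (genome copy) counts and --dna-fraction means each species' share of the total DNA, the two differ by a genome-length weighting, because a longer genome contributes more DNA per copy. The sketch below shows that conversion with hypothetical names; it is an illustration, not readsim's implementation.

```python
def abundance_to_dna_fraction(abundance, genome_length):
    """Convert relative organism abundance into fraction of total DNA.

    abundance     -- dict species -> relative copy number (assumed meaning)
    genome_length -- dict species -> genome length in bases
    """
    # Each species contributes copies x genome length bases of DNA.
    weights = {sp: abundance[sp] * genome_length[sp] for sp in abundance}
    total = sum(weights.values())
    return {sp: w / total for sp, w in weights.items()}
```

With equal abundance but a genome three times longer, the longer genome ends up with three quarters of the DNA, which is why the intended interpretation must be stated explicitly.
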
    Len Trigg, Ph.D.
    Real Time Genomics
    www.realtimegenomics.com

  • #2
    New stable releases are now available which include minor improvements and bug fixes.

    The first of these is our full analysis suite, RTG Core 3.8.1. The changes in this version are listed below. Commercial users may download the update from our website at http://realtimegenomics.com/products/rtg-core-downloads. Non-commercial users can download the update from our website at http://realtimegenomics.com/products...non-commercial or build from the updated source code on github at https://github.com/RealTimeGenomics/rtg-core.

    We have also produced updated builds of our utilities package, RTG Tools 3.8.1, which is made freely available for non-commercial or commercial use alike. More information and download links are available from our website at http://realtimegenomics.com/products/rtg-tools, or you can build from the updated source on github at https://github.com/RealTimeGenomics/rtg-tools.


    RTG Core 3.8.1 (2017-05-29)
    ---------------------------

    This release primarily includes bugfixes and minor improvements:

    * rocplot: (GUI) The right hand panel now includes a visual indication
    of the color for each curve.

    * rocplot: (GUI) The color for a curve can now be set via color picker
    available from the per-curve context menu.

    * rocplot: (GUI) Reordering the curves is now achieved by drag and drop
    rather than the (now removed) reorder buttons.

    * misc: The RTG Tools release includes a scripts/demo-tools.sh that
    gives a quick end-to-end demonstration of simulation and VCF
    manipulation commands. This is similar in nature to the
    scripts/demo-family.sh script that is included in RTG Core.

    * vcfeval: Fix an exception caused by the skipping of heterozygous
    structural variants being dependent on the GT field allele
    ordering. These variants are now correctly skipped. In previous
    releases the cases that slipped through would enter matching with a
    stub allele representing the SV allele.

    * vcfeval: When running a sample-free comparison via the option
    `--sample ALT`, ignore records/alleles corresponding to structural
    variants. In 3.8 these could produce an exception, and in previous
    releases any SV alleles present were included as a generic token
    during matching.

    * vcfeval: Improve the handling of non-user exceptions encountered
    during VCF loading. Previously these would produce an often
    inscrutable message.

    * version: Update copyright year and include an alternative citation
    more appropriate for those using RTG Tools.

    * popsim: Now includes the random number seed in the VCF header for
    consistency with other simulation commands.
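
Skipping structural variant alleles depends on recognizing them. As a sketch based on the VCF 4.x allele notation (not on vcfeval's source), symbolic alleles such as `<DEL>` and breakend alleles can be identified syntactically; the helper names are hypothetical.

```python
def is_sv_allele(alt):
    """True for VCF ALT alleles that describe structural variants rather
    than explicit sequence."""
    if alt.startswith("<") and alt.endswith(">"):
        return True   # symbolic allele, e.g. <DEL>, <INS:ME>
    if "[" in alt or "]" in alt:
        return True   # mated breakend, e.g. G]17:198982]
    if len(alt) > 1 and (alt.startswith(".") or alt.endswith(".")):
        return True   # single breakend, e.g. .A or G.
    return False


def sequence_alts(alts):
    """Keep only ALT alleles given as explicit sequence (or the '*' allele)."""
    return [a for a in alts if not is_sv_allele(a)]
```

Note that the `*` allele (an overlapping deletion) is not a structural variant in this sense and is kept.
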
    Len Trigg, Ph.D.
    Real Time Genomics
    www.realtimegenomics.com


    • #3
      New stable releases are now available which include minor improvements and bug fixes.

      The first of these is our full analysis suite, RTG Core 3.8.2. The changes in this version are listed below. Commercial users may download the update from our website at http://realtimegenomics.com/products/rtg-core-downloads. Non-commercial users can download the update from our website at http://realtimegenomics.com/products...non-commercial or build from the updated source code on github at https://github.com/RealTimeGenomics/rtg-core.

      We have also produced updated builds of our utilities package, RTG Tools 3.8.2, which is made freely available for non-commercial or commercial use alike. More information and download links are available from our website at http://realtimegenomics.com/products/rtg-tools, or you can build from the updated source on github at https://github.com/RealTimeGenomics/rtg-tools.

      RTG Core 3.8.2 (2017-06-20)
      ---------------------------

      This release primarily includes bugfixes and minor improvements:

      * vcfeval: Records where the REF/ALT contain bases not permitted by the
      VCF specification are now skipped (and reported in the log) rather
      than terminating execution.

      * vcfeval: (`combine` and `ga4gh` output modes only) These modes were
      inserting a redundant VCF header entry containing the command line,
      which has been removed.

      * vcfeval: GA4GH output mode now supports loose positional matching of
      variants (within +/-30bp by default, and adjustable via
      --Xloose-match-distance).

      * many: Prevent number formatting issues in non-English locales. The
      locale is now forced to US.

      * many: Some commands were not appending gzip termination blocks to VCF
      outputs, which could result in subsequent warning messages being
      produced by some third party tools.

      * many: Improve the consistency of exception handling in cases where the
      exception is thrown in a worker thread.

      * many: Attempting to supply file lists via shell process redirection
      would fail in non-obvious ways. File lists from process redirection
      are not currently supported and are now checked for up-front.

      * minor: When setting up rtg bash tab completion, issue a warning if an
      incompatible completion function has already been installed. (This can
      happen on some Linux distributions if you have the system `bash-completion`
      package installed and attempt to tab-complete rtg before installing
      rtg bash completion.)

      * minor: Fix a typo in the example configuration settings in rtg.cfg
      (specifically, RTG_JAVA_OPTS was incorrectly listed as
      RTG_JAVA_OPTIONS).
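
Loose positional matching can be pictured as a window search over sorted baseline positions. This minimal sketch assumes that "within +/-30bp" compares variant start positions; the notes do not say exactly which coordinates vcfeval uses, and `loose_match` is a hypothetical name.

```python
import bisect


def loose_match(baseline_positions, call_pos, max_distance=30):
    """Return the baseline variant positions lying within +/-max_distance
    bases of call_pos. baseline_positions must be sorted ascending."""
    # Binary search for both window edges, inclusive on each side.
    lo = bisect.bisect_left(baseline_positions, call_pos - max_distance)
    hi = bisect.bisect_right(baseline_positions, call_pos + max_distance)
    return baseline_positions[lo:hi]
```

Using binary search keeps each lookup logarithmic in the number of baseline variants, so widening the window (as --Xloose-match-distance allows) only changes the slice returned.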
      Len Trigg, Ph.D.
      Real Time Genomics
      www.realtimegenomics.com


      • #4
        New stable releases are now available which include minor improvements and bug fixes.

        The first of these is our full analysis suite, RTG Core 3.8.3. The changes in this version are listed below. Commercial users may download the update from our website at http://realtimegenomics.com/products/rtg-core-downloads. Non-commercial users can download the update from our website at http://realtimegenomics.com/products...non-commercial or build from the updated source code on github at https://github.com/RealTimeGenomics/rtg-core.

        We have also produced updated builds of our utilities package, RTG Tools 3.8.3, which is made freely available for non-commercial or commercial use alike. More information and download links are available from our website at http://realtimegenomics.com/products/rtg-tools, or you can build from the updated source on github at https://github.com/RealTimeGenomics/rtg-tools.

        RTG Core 3.8.3 (2017-08-02)
        ---------------------------

        This release primarily includes bugfixes and minor improvements:

        * rocplot: (GUI) Improvements to graph zooming, to allow stepping back
        to previous zoom levels as well as fully un-zooming.

        * rocplot: Improve the automatic curve naming heuristic to ignore
        directory name suffixes like "-eval", ".vcfeval" etc, and similar
        prefixes.

        * rocplot: Enable text antialiasing in GUI and PNG output.

        * vcfeval: More graceful handling of input VCFs containing REF values
        that are not valid according to VCF specifications.

        * vcfmerge/vcfeval: Normalize the casing of nucleotides in REF/ALT,
        which permits merging records where the REF/ALT differ in casing.

        * vcffilter: Graceful error handling of a new category of invalid
        JavaScript expression.

        * vcfsubset: Don't complain when using --keep-filter/--remove-filter
        flags with "PASS" and the VCF header doesn't contain a declaration for
        that filter.

        * misc: Prevent a unit test failure when running on newer versions of
        Ubuntu.
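
Case normalization of REF/ALT can be sketched as building a comparison key in which nucleotide case is ignored. This is an illustrative assumption about the idea, not RTG's code: only plain nucleotide alleles are upper-cased, so symbolic and breakend alleles (which may contain case-sensitive contig names) are left untouched. The helper names are hypothetical.

```python
import re

# Plain nucleotide alleles only; anything else is left as written.
_BASES = re.compile(r"^[ACGTNacgtn]+$")


def normalize_allele(allele):
    """Upper-case plain nucleotide alleles; leave symbolic/breakend alleles alone."""
    return allele.upper() if _BASES.match(allele) else allele


def merge_key(chrom, pos, ref, alts):
    """Key under which records describe the same site and alleles."""
    return (chrom, pos, normalize_allele(ref), tuple(normalize_allele(a) for a in alts))
```

With this key, a record written as `acG -> t` and one written as `ACG -> T` at the same position compare equal and can be merged.
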
        Len Trigg, Ph.D.
        Real Time Genomics
        www.realtimegenomics.com
