  • RTG 3.6: Somatic improvements / CGI v2 read support / variant evaluation

    Real Time Genomics are pleased to announce the availability of new releases of our full analysis suite, RTG Core (commercial / free for non-commercial use), and our utility package, RTG Tools (BSD licensed). This release includes new features and performance improvements. Some of the highlights of this release:

    * Further improvements to somatic variant calling which reduce the number of false positive calls while retaining somatic calling sensitivity. These improvements are achieved by incorporating the presence of somatic-allele-supporting evidence in the normal into the Bayesian computation. Additional VCF annotations quantifying these "contrary observations" are included in the output.

    * De novo variant detection in families and pedigrees now incorporates similar techniques for a reduction in false positives.

    * Support for aligning and variant calling with reads produced by Complete Genomics Inc has been extended to their newer 29 base-pair read structure (these reads consisting of 10-9-10 sub-reads are often represented as 30 base-pairs with a redundant N).

    * Many improvements to variant comparison with vcfeval, including the improved handling of call sets containing overlapping variants, identification of variants which do not constitute a diploid match but which share a common allele (e.g. zygosity errors), and the ability to select alternative output modes depending on the desired analysis workflow.

    * Many other minor improvements (full release notes for this version are detailed below.)

    Special thanks to the members of the GA4GH benchmarking data working group (in particular Justin Zook, Rebecca Truty, Peter Krusche, and Kevin Jacobs) for valuable feedback and suggestions for improvements to vcfeval that are available in this release.

    If you haven't used RTG Core before (or maybe even if you have), we suggest you run the demo-family.sh script that runs through a short end-to-end demonstration of sex-aware and pedigree-aware family variant calling, including de novo variant detection and variant evaluation with vcfeval. (It also makes a nice demo of our comprehensive simulation tools.)

    Commercial users of RTG Core may download the update from our website at http://realtimegenomics.com/products/rtg-core-downloads. Non-commercial users can download the update from our website at http://realtimegenomics.com/products...non-commercial or build from the source on github at https://github.com/RealTimeGenomics/rtg-core.

    Users of RTG Tools, which is made freely available for non-commercial or commercial use alike, can download the new version from our website at http://realtimegenomics.com/products/rtg-tools or build from the source code on github at https://github.com/RealTimeGenomics/rtg-tools.

    Detailed changes are listed below by area. Please read these through fully, as some command-line flags have changed, so updates to your pipeline scripts may be required. For more information on new features, see the RTG Operations Manual.


    RTG Core 3.6 (2015-12-07)
    -------------------------

    ### Basic Formatting and Mapping

    * cg2sdf: Add support for formatting CGI TSV reads files containing
    their version 2 reads. These reads are typically represented as 30
    base-pair arms (10-10-10 subread structure containing a redundant N
    which is removed during formatting), although 29 base-pair arm
    representation (10-9-10 subread structure) is also supported.

    * sdf2cg: This new command allows exporting SDF formatted Complete
    Genomics read data to their TSV reads file format.

    * cgmap: Now supports aligning the version 2 read structure. When
    aligning CGI reads, an indexing mask appropriate for the type of
    reads being mapped must be selected, so --mask is now a required
    flag.

    * cgmap: Mask names have been changed to more clearly indicate which
    version of CGI reads they are applicable to. Available masks are
    now "cg1" (formerly named "cgmaska15b1"), "cg1-fast" (formerly named
    "cgmaska1b1"), and "cg2" (a new mask for use with version 2 reads
    which is roughly equivalent in sensitivity to "cg1-fast"). Additional
    masks may be available in future.
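
    For pipeline scripts that still pass the old mask names, the renaming described in the cgmap notes can be captured in a small lookup table. This is only an illustrative Python sketch: the helper name `migrate_mask_name` is hypothetical and not part of RTG, and only the old and new mask names themselves come from the release notes.

```python
# Old-to-new cgmap mask names, as listed in the RTG Core 3.6 release notes.
RENAMED_MASKS = {
    "cgmaska15b1": "cg1",
    "cgmaska1b1": "cg1-fast",
}

# Mask names available after the rename ("cg2" is new in this release).
CURRENT_MASKS = {"cg1", "cg1-fast", "cg2"}


def migrate_mask_name(name: str) -> str:
    """Return the 3.6-style mask name for an old or current mask name.

    Hypothetical helper for updating pipeline configuration; not an RTG API.
    """
    if name in CURRENT_MASKS:
        return name
    if name in RENAMED_MASKS:
        return RENAMED_MASKS[name]
    raise ValueError(f"unknown cgmap mask name: {name!r}")


if __name__ == "__main__":
    for old in ("cgmaska15b1", "cgmaska1b1", "cg2"):
        print(old, "->", migrate_mask_name(old))
```

    Because --mask is now required, a script that previously relied on a default mask would need an explicit value as well as any rename.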

    ### Variant Calling

    * somatic: Features an improvement to the Bayesian calculation to
    better account for the presence of contrary evidence. This has
    resulted in a large reduction in false positives while maintaining
    sensitivity.

    * population/family: These pedigree-based callers now contain similar
    adjustments to the Bayesian calculation to better account for
    contrary evidence of de novo variants. This has resulted in a large
    reduction in false positive de novos while maintaining sensitivity.

    * somatic/family/population: These callers produce additional
    annotations in their output VCF that indicate the degree of contrary
    observations for the novel allele. The COC annotation contains a
    simple count of the number of contrary observations and the COF
    annotation contains the contrary observations as a fraction of total
    observations. Users who wish to adjust the sensitivity/precision
    tradeoff of their de novo call sets may wish to use these attributes
    for filtering.

    * family/population: Fixed an issue where the marking of equivalent
    complex calls was not functioning for sex-aware calling on the Y
    chromosome when both males and females were present, resulting in
    occasional additional equivalent but differently represented
    variants in the output.

    * population: Better error handling when the user supplies a
    pedigree that contains cycles.

    * avrbuild: The new COC and COF annotations are now available as
    derived annotations that can be used in model building. One
    interesting use of these attributes may be to build AVR models
    specifically for predicting the correctness of de novo predictions.

    * snp/family/population/somatic: These variant callers all now include
    support for CGI 29 base-pair read structure.

    * snp/family/population/somatic: The pre-built AVR models distributed
    with RTG have all been rebuilt using current annotations and updated
    training data.
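
    As a rough illustration of filtering on the contrary-observation annotations, the Python sketch below keeps only records whose COF value is at or below a cutoff. It rests on several assumptions that should be checked against your own VCF header: that COF appears as a per-sample FORMAT field, that the sample of interest is at the given column index, and that 0.05 is a sensible cutoff (it is an arbitrary placeholder). The function names and the example records are made up for the sketch.

```python
from typing import Iterable, Iterator, Optional


def sample_field(record: str, sample_index: int, key: str) -> Optional[str]:
    """Return the value of FORMAT field `key` for one sample, or None."""
    cols = record.rstrip("\n").split("\t")
    sample_col = 9 + sample_index  # samples start at the 10th VCF column
    if len(cols) <= sample_col:
        return None
    keys = cols[8].split(":")
    values = cols[sample_col].split(":")
    if key not in keys:
        return None
    pos = keys.index(key)
    # VCF permits trailing fields to be dropped, and "." means missing.
    if pos >= len(values) or values[pos] == ".":
        return None
    return values[pos]


def filter_by_cof(lines: Iterable[str], sample_index: int = 0,
                  max_cof: float = 0.05) -> Iterator[str]:
    """Yield header lines and records whose COF is missing or <= max_cof.

    Assumption: COF is a per-sample FORMAT annotation; the default
    threshold is a placeholder, not a recommended value.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cof = sample_field(line, sample_index, "COF")
        if cof is None or float(cof) <= max_cof:
            yield line


if __name__ == "__main__":
    # Synthetic records for demonstration only.
    demo = [
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:COC:COF\t0/1:0:0.000",
        "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:COC:COF\t0/1:4:0.121",
        "chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT\t0/1",
    ]
    for kept in filter_by_cof(demo):
        print(kept)
```

    Records that carry no COF value are passed through unchanged here; whether that is the right policy depends on the call set.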

    ### Variant Processing and Analysis

    * vcfannotate: New option --relabel allows sample names in a VCF to be
    changed.

    * vcfsubset: New flag --remove-qual to reset the QUAL field to '.'

    * vcfsubset: Fixed a bug where encountering a VCF record that did not
    contain any FORMAT field specified in --keep-format would cause all
    subsequent records to be dropped.

    * vcffilter: For convenience, the existing flags --keep-format,
    --remove-format, --keep-samples, etc. now support comma-separated
    lists. For example: --keep-format GQ,AVR.

    * vcffilter: New flag --remove-hom to exclude records where a sample
    was called as homozygous.

    * vcfeval: New additional output modes that allow the selection of
    output files that best suit the desired workflow. These are
    controlled via the --output-mode flag, and there are currently three
    options available: split (the default, equivalent to previous
    behaviour), annotate (outputs baseline and calls files augmented
    with match status annotations), and combine (provides a simple
    side-by-side two-column VCF). For more information, see the user
    manual.

    * vcfeval: Removed option --baseline-tp, as the baseline version of
    true positive variants is now always output. When using the default
    (split) output mode, these are output to tp-baseline.vcf as before.

    * vcfeval: Added the ability to detect those FP and FN which have
    common alleles (e.g. zygosity errors). Previously this could be
    done manually by running vcfeval a second time using --squash-ploidy
    on the fp.vcf and fn.vcf of an initial comparison, but now it is
    automatically performed when running the new annotate or combine
    output modes.

    * vcfeval: New flag --ref-overlap to allow matching variants where the
    alleles would overlap as long as the overlap bases are the same as
    ref. Unambiguous VCFs should not need this option, but such cases
    can arise when using unsophisticated callers or VCF merging tools.

    * vcfeval: Weighted ROC files now include a final data row that
    includes the statistics corresponding to no threshold application
    (and this includes any variants that were processed during path
    finding but which do not contain any ROC score field). In an ROC
    plot, this final point may be visible as a "tick" at the end of the
    curve.

    * vcfeval: The set of ROC data files that are produced are now for the
    following three subsets of calls: all calls, snps only, and non-snps
    only (e.g. indels, MNPs). Some users were doing separate runs of
    vcfeval on input sets filtered by category in order to get separate
    statistics for snps vs indels, an approach which is prone to
    misclassification of complex variants.

    * vcfeval: When processing multi-sample VCF files, it is now possible
    to specify different sample names for baseline vs callset, via the
    form: --sample baseline_sample,calls_sample.

    * vcfeval: Fixed a rare bug where if the input VCFs contained multiple
    variants with the same reference position and length, the output
    VCFs could contain the incorrect variant.

    * vcfeval: Fixed a crash that could occur when the input set contained
    a variant that extended off the end of the reference sequence.

    * rocplot: (GUI) Fix several minor issues: initial paint was not laid
    out correctly; very small ROC files would not display status info;
    some UI layout improvements; and add a small amount of display
    padding.

    * rocplot: (GUI) Malformed ROC data files now show an error dialog.
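
    To make the "final data row" in the vcfeval ROC notes more concrete, here is a simplified Python sketch of how cumulative ROC rows can be built from scored calls, with one extra row at the end for the no-threshold case that also counts calls lacking a score. It is not vcfeval's implementation: it ignores the weighting that vcfeval applies, the function name `roc_rows` and the tuple layout are invented for the sketch, and the input data is synthetic.

```python
from typing import List, Optional, Tuple

Call = Tuple[Optional[float], bool]     # (score or None, is true positive)
Row = Tuple[Optional[float], int, int]  # (threshold, cumulative TP, cumulative FP)


def roc_rows(calls: List[Call]) -> List[Row]:
    """Build cumulative ROC rows, highest score first.

    The last row has threshold None and also includes calls with no score,
    mirroring the no-threshold row described in the release notes.
    """
    scored = sorted((c for c in calls if c[0] is not None),
                    key=lambda c: c[0], reverse=True)
    rows: List[Row] = []
    tp = fp = 0
    for i, (score, is_tp) in enumerate(scored):
        if is_tp:
            tp += 1
        else:
            fp += 1
        # Emit one row per distinct score, after all calls with that score.
        last_of_score = i + 1 == len(scored) or scored[i + 1][0] != score
        if last_of_score:
            rows.append((score, tp, fp))
    # Final row: no threshold applied, so unscored calls are counted too.
    for score, is_tp in calls:
        if score is None:
            if is_tp:
                tp += 1
            else:
                fp += 1
    rows.append((None, tp, fp))
    return rows


if __name__ == "__main__":
    demo = [(30.0, True), (30.0, False), (20.0, True),
            (10.0, False), (None, True), (None, False)]
    for row in roc_rows(demo):
        print(row)
```

    When unscored calls are present, the jump from the last scored row to the final row is what can show up as the "tick" at the end of a plotted curve.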

    ### Metagenomics

    * similarity: This tool will now make use of available taxonomy
    information in the case of a single supplied SDF, in order to allow
    the easy computation of a neighbour joining tree from a reference
    species database (or subset thereof).

    ### Other

    * sdf2fasta/sdf2fastq: New flag --interleave to permit output of
    paired end data to a single output in interleaved fashion
    (i.e. alternating left and right arms). This allows piping paired
    end data for simple command-line processing (although there is also
    sdf2sam which may be more applicable depending on the processing
    desired).

    * cgsim: Added support for simulating reads with the CGI version 2
    read structure, controlled via a new flag, --cg-read-version.

    * readsim: Add support for both versions of CGI read structures. Use
    --machine complete_genomics (the original 35 base pair read
    structure) or --machine complete_genomics_2 (the newer 29 base pair
    structure).

    * aview: New flag --unflatten to display unflattened CGI reads when
    present. At present only version 1 reads can be displayed in
    unflattened form.

    * misc: bash completion for RTG commands and options now works on Mac
    OS X (see scripts/rtg-bash-completion for instructions).

    * misc: The underlying htsjdk library used for SAM/BAM support has
    been updated to version 1.141.

    * many: The JRE bundled with Linux/Windows builds is now 1.8.
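
    For readers unfamiliar with interleaved paired-end output, the following Python sketch shows the layout that the --interleave note describes: left and right arm records alternate in a single stream. It is an illustration of the format only, not of how sdf2fastq is implemented. It assumes simple four-line FASTQ records, and the function names and read names are made up.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group FASTQ lines into 4-line records (assumes no line wrapping)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(left: Iterable[str], right: Iterable[str]) -> Iterator[str]:
    """Yield lines alternating one left-arm record, then its right-arm mate."""
    right_records = fastq_records(right)
    for left_record in fastq_records(left):
        right_record = next(right_records, None)
        if right_record is None:
            raise ValueError("right arm has fewer records than left arm")
        yield from left_record
        yield from right_record
    if next(right_records, None) is not None:
        raise ValueError("right arm has more records than left arm")


if __name__ == "__main__":
    # Synthetic reads for demonstration only.
    left = ["@pair1/1", "ACGT", "+", "IIII", "@pair2/1", "GGCC", "+", "IIII"]
    right = ["@pair1/2", "TTAA", "+", "IIII", "@pair2/2", "CATG", "+", "IIII"]
    print("\n".join(interleave(left, right)))
```

    A downstream tool reading such a stream can recover the pairs by taking records two at a time.
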
    Len Trigg, Ph.D.
    Real Time Genomics
    www.realtimegenomics.com

  • #2
    New stable releases are now available which include minor improvements and bug fixes.

    The first of these is our full analysis suite, RTG Core 3.6.1. The changes in this version are listed below. Commercial users may download the update from our website at http://realtimegenomics.com/products/rtg-core-downloads. Non-commercial users can download the update from our website at http://realtimegenomics.com/products...non-commercial or build from the updated source code on github at https://github.com/RealTimeGenomics/rtg-core.

    We have also produced updated builds of our utilities package, RTG Tools 3.6.1, which is made freely available for non-commercial or commercial use alike. More information and download links are available from our website at http://realtimegenomics.com/products/rtg-tools, or you can build from the updated source on github at https://github.com/RealTimeGenomics/rtg-tools.


    RTG Core 3.6.1 (2016-01-25)
    ---------------------------

    This release primarily includes bugfixes and minor improvements:

    * coverage: Fixed an exception that could occur when supplying a
    reference SDF that did not contain all the sequences present in the
    alignments.

    * family: Fixed an exception that could occur when supplying a family
    pedigree involving members not present in the input mappings.

    * population: The COF/COC annotations for de novo calls that were
    recently added to the family caller are now also produced by the
    population command when appropriate.

    * map/cgmap: When mapping pre-formatted reads containing SAM read
    group information embedded in the SDF and the input format was
    explicitly specified as SDF via -F sdf, the read group info wasn't
    being picked up. This is now fixed.

    * vcfmerge: Speed improvement when merging VCF files containing a
    large number of contig header declarations.

    * many: Speed improvement when accessing indexed datafiles
    (e.g. BED/BAM/VCF) that were being filtered by very large sets of
    regions.

    * rocplot: Better error handling when trying to run the GUI on a
    machine where a graphics environment is unavailable.

    * rocplot: (GUI) Update frame title when graph title changes.
    Len Trigg, Ph.D.
    Real Time Genomics
    www.realtimegenomics.com


    • #3
      New stable releases are now available which include minor improvements and bug fixes.

      The first of these is our full analysis suite, RTG Core 3.6.2. The changes in this version are listed below. Commercial users may download the update from our website at http://realtimegenomics.com/products/rtg-core-downloads. Non-commercial users can download the update from our website at http://realtimegenomics.com/products...non-commercial or build from the updated source code on github at https://github.com/RealTimeGenomics/rtg-core.

      We have also produced updated builds of our utilities package, RTG Tools 3.6.2, which is made freely available for non-commercial or commercial use alike. More information and download links are available from our website at http://realtimegenomics.com/products/rtg-tools, or you can build from the updated source on github at https://github.com/RealTimeGenomics/rtg-tools.


      RTG Core 3.6.2 (2016-03-10)
      ---------------------------

      This release primarily includes bugfixes and minor improvements:

      * map: Mapping very large numbers of reads in a single chunk or with
      low step size settings could exceed the capacity of some internal
      data structures, giving unpredictable results. An explicit check
      for these conditions has been added.

      * map: Reduction in peak memory use when mapping paired-end data.

      * vcfeval: Better error handling for variants which have triploid or
      higher GT (ploidy higher than 2 is not supported).

      * extract: Fixed an exception that could occur when extracting
      multiple regions from SAM/BAM across different chromosomes.

      * rocplot: Improved error handling for yet more ways in which
      attempting to open a GUI from a headless server can fail.

      * rocplot: (GUI) Minor improvement to crosshair handling.
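
      Since vcfeval does not support ploidy higher than 2, it can be handy to scan a call set for such genotypes before running a comparison. The Python sketch below counts the alleles in a VCF GT string; the function names and the error message are hypothetical and do not reflect vcfeval's own checks or wording.

```python
import re


def gt_ploidy(gt: str) -> int:
    """Number of alleles in a VCF GT string such as '0/1' or '1|2|1'."""
    return len(re.split(r"[/|]", gt))


def require_supported_ploidy(gt: str, max_ploidy: int = 2) -> None:
    """Raise ValueError if the genotype has more alleles than max_ploidy."""
    ploidy = gt_ploidy(gt)
    if ploidy > max_ploidy:
        raise ValueError(
            f"genotype {gt!r} has ploidy {ploidy}; maximum supported is {max_ploidy}"
        )


if __name__ == "__main__":
    for gt in ("0/1", "1", "0|1", "0/1/1"):
        try:
            require_supported_ploidy(gt)
            print(gt, "ok")
        except ValueError as err:
            print("rejected:", err)
```
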
      Len Trigg, Ph.D.
      Real Time Genomics
      www.realtimegenomics.com
