  • BEDOPS v2.1 released

    BEDOPS is a suite of tools to address common questions raised in genomic studies — mostly with regard to overlap and proximity relationships between data sets. It aims to be fast and flexible, facilitating the efficient and accurate analysis and management of large-scale genomic data.

    The second major release of BEDOPS includes several new features which focus on improving how we handle arbitrarily large datasets, namely through compression and parallelization.

    We are excited to announce the following new features in the just-released v2.1 follow-up:

    * bedops

    -- New --partition operator

    This operator will efficiently split overlapping inputs and report disjoint segments that partition the shared genomic space.

    To demonstrate, say you have a few input BED files (sorted with BEDOPS sort-bed) or equivalent Starch archives. Together they have coordinate segments on chrN that look like:

            ------------------------------------
               ---------------
                   ------------------
                   -------------------------------------
                                         ----

    The output from --partition on these inputs would be:

            ---
               ----
                   -----------
                              -------
                                     ----
                                         ----
                                             ---
                                                -------

    One example of where this is useful is in finding intersections of elements within a single BED file, which was not possible with BEDOPS tools until now. Consider the following usage, where input.bed is a sorted BED file that we want to "self-intersect":

    $ bedops --partition input.bed \
    | bedmap --count --echo - input.bed \
    | awk -F"|" '($1 > 1) { print $2; }'
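
    To make the semantics concrete, here is a minimal pure-Python sketch of what partitioning and "self-intersection" compute on a single chromosome. It is an illustration only, not how BEDOPS implements --partition. The names partition and self_intersections are hypothetical, intervals are assumed to be half-open [start, end), and an input counts as overlapping a segment when it shares at least one base with it.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def partition(intervals: List[Interval]) -> List[Interval]:
    """Split overlapping intervals into disjoint segments at every boundary."""
    # Every start or end coordinate is a potential cut point.
    cuts = sorted({pos for interval in intervals for pos in interval})
    segments = []
    for left, right in zip(cuts, cuts[1:]):
        # Keep a segment only if at least one input interval covers it,
        # so gaps between inputs do not appear in the output.
        if any(s <= left and right <= e for s, e in intervals):
            segments.append((left, right))
    return segments


def self_intersections(intervals: List[Interval]) -> List[Interval]:
    """Return the partition segments covered by more than one input."""
    shared = []
    for left, right in partition(intervals):
        count = sum(1 for s, e in intervals if s < right and left < e)
        if count > 1:
            shared.append((left, right))
    return shared


if __name__ == "__main__":
    example = [(0, 10), (5, 15), (20, 25)]
    print(partition(example))
    print(self_intersections(example))
```

    The count-and-filter step in self_intersections plays the role of the bedmap --count and awk stages in the pipeline above. The sketch is quadratic in the number of intervals and is meant for reading, not for large inputs.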


    A "real-world" application of this feature is in comparing paired-end reads, where the goal is to facilitate a quick search for abnormal insertions (or, conversely, deletions) between two sequencing experiments.

    * starch

    -- Improved error checking for interleaved records

    * Conversion scripts

    -- All scripts now use BEDOPS sort-bed behind the scenes to produce sorted BED output, ready for consumption by BEDOPS utilities like bedextract, bedmap, bedops and closest-features.

    In other words, it is no longer necessary to pipe converted output to sort-bed before piping to other BEDOPS utilities.

    -- New psl2bed conversion script, converting PSL-formatted UCSC BLAT output to BED.

    -- New wig2bed conversion script written in Python.

    -- New *2starch convenience scripts offered for all *2bed scripts, which convert data and output Starch v2 archives.

    * Improved Mac OS X support

    -- New installer package makes installation of BEDOPS binaries and scripts much easier for OS X 10.6 - 10.8 hosts.

    -- Installer resolves fatal library errors seen by some end users of older OS X BEDOPS releases.

    This release also includes major BEDOPS v2 features, such as:

    * Support for BEDOPS Starch archives in the main toolkit

    -- The bedextract, bedmap, bedops and closest-features tools now all accept Starch-formatted files as inputs, as well as UCSC BED files, as before. (In other words, it is no longer necessary to extract Starch data to intermediate files before applying set or statistical operations.)

    * Very efficient single-chromosome operations

    -- New --chrom operator applies set, statistical or ID operations to a specified chromosome with bedmap, bedops and closest-features, without needing to stream through the entire BED file. This is highly useful for parallelization tasks on very large BED data.
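
    As a rough sketch of the parallelization idea, the Python below builds one bedops --chrom command per chromosome and can run them concurrently. The choice of --merge as the set operation, the helper names per_chromosome_commands and run_all, and the worker count are assumptions made for illustration. run_all requires the bedops binary on the PATH.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List


def per_chromosome_commands(chromosomes: List[str], bed_path: str) -> List[List[str]]:
    """Build one `bedops --chrom <chr> --merge <file>` command per chromosome."""
    # --merge is used only as an example operation; any bedops set
    # operation could take its place.
    return [["bedops", "--chrom", chrom, "--merge", bed_path] for chrom in chromosomes]


def run_all(commands: List[List[str]], max_workers: int = 4) -> List[str]:
    """Run the commands concurrently and return each command's stdout."""
    def run(command: List[str]) -> str:
        done = subprocess.run(command, check=True, capture_output=True, text=True)
        return done.stdout

    # Threads are sufficient here: the work happens in child processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    for command in per_chromosome_commands(["chr1", "chr2", "chr3"], "input.bed"):
        print(" ".join(command))
```

    Because pool.map preserves input order, the per-chromosome outputs come back in the same order as the chromosome list and can be concatenated directly.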

    * bedmap

    -- New --echo-map-id-uniq operator lists unique ID values from mapped elements.

    -- New --max-element and --min-element operators return the highest or lowest scoring overlapping map element.
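
    The following pure-Python sketch illustrates what these three operators report for one reference interval. It is an assumption-laden illustration, not bedmap's implementation: the Element type and function names are hypothetical, intervals are half-open, overlap means sharing at least one base, IDs are returned in sorted order by choice, and ties on score go to the first element encountered.

```python
from typing import List, NamedTuple, Optional


class Element(NamedTuple):
    start: int
    end: int
    id: str
    score: float


def overlapping(ref_start: int, ref_end: int, elements: List[Element]) -> List[Element]:
    """Return the map elements sharing at least one base with the reference."""
    return [m for m in elements if m.start < ref_end and ref_start < m.end]


def unique_ids(ref_start: int, ref_end: int, elements: List[Element]) -> List[str]:
    """Unique IDs of overlapping map elements (sorted here for determinism)."""
    return sorted({m.id for m in overlapping(ref_start, ref_end, elements)})


def max_element(ref_start: int, ref_end: int, elements: List[Element]) -> Optional[Element]:
    """Highest-scoring overlapping map element, or None when nothing overlaps."""
    hits = overlapping(ref_start, ref_end, elements)
    return max(hits, key=lambda m: m.score) if hits else None


def min_element(ref_start: int, ref_end: int, elements: List[Element]) -> Optional[Element]:
    """Lowest-scoring overlapping map element, or None when nothing overlaps."""
    hits = overlapping(ref_start, ref_end, elements)
    return min(hits, key=lambda m: m.score) if hits else None
```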

    * sort-bed

    -- New --max-mem option limits sorting to a specified amount of memory, which is useful for sorting BED inputs larger than system memory.
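
    One common way to sort data that does not fit in memory is an external merge sort: sort fixed-size chunks, write each to a temporary file, then merge the sorted chunks. The Python sketch below shows that general technique. It is not a description of how sort-bed implements --max-mem. The function names are hypothetical, the memory limit is expressed as a line count rather than bytes for simplicity, and the sort key (chromosome as text, then numeric start and end) is an assumption about the desired order.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List, Tuple


def bed_key(line: str) -> Tuple[str, int, int]:
    """Sort key: chromosome as text, then numeric start and end."""
    chrom, start, end = line.split("\t")[:3]
    return chrom, int(start), int(end)


def external_sort(lines: Iterable[str], max_lines_in_memory: int) -> Iterator[str]:
    """Yield BED lines in sorted order, buffering at most one chunk at a time."""
    chunk_paths: List[str] = []
    chunk: List[str] = []

    def flush() -> None:
        # Sort the buffered chunk and spill it to a temporary file.
        chunk.sort(key=bed_key)
        fd, path = tempfile.mkstemp(suffix=".bed")
        with os.fdopen(fd, "w") as handle:
            handle.writelines(chunk)
        chunk_paths.append(path)
        chunk.clear()

    for line in lines:
        chunk.append(line if line.endswith("\n") else line + "\n")
        if len(chunk) >= max_lines_in_memory:
            flush()
    if chunk:
        flush()

    handles = [open(path) for path in chunk_paths]
    try:
        # heapq.merge reads the sorted chunk files lazily, line by line.
        yield from heapq.merge(*handles, key=bed_key)
    finally:
        for handle in handles:
            handle.close()
        for path in chunk_paths:
            os.remove(path)
```

    For example, list(external_sort(open("input.bed"), 1_000_000)) would sort a file in chunks of one million lines, where input.bed is a placeholder path.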

    * starch, unstarch and starchcat

    -- BEDOPS Starch v2 archives contain useful, precomputed metadata that can improve the efficiency of scripts.

    For instance, calling unstarch --elements on a Starch v2 archive shows the total number of records in the entire file or for any individual chromosome, while unstarch --bases and unstarch --bases-uniq give the number of total and unique bases covered by elements in the whole archive or over elements of the specified chromosome. These latter two options are analogous to those already available in bedmap.

    As an example, using the --elements operator on a Starch v2 archive made from DNaseI-seq or RNA-seq tag data would return the total number of reads over the entire BED file. Using --elements chr3 would return the total number of tags on chromosome chr3.

    Values are precomputed and stored in the archive's metadata, allowing practically instantaneous retrieval. Going back to --elements again, this option is much, much faster than extracting data and piping it to wc -l.
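
    To clarify what the three values mean, here is a small pure-Python sketch that computes them per chromosome from a list of records. It only illustrates the definitions (element count, total bases, and bases counted once where elements overlap) and does not read Starch archives. The Record type, the summarize function and the dictionary keys are hypothetical, and intervals are assumed to be half-open.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int, int]  # (chromosome, start, end), half-open


def summarize(records: List[Record]) -> Dict[str, Dict[str, int]]:
    """Compute element count, total bases and unique bases per chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in records:
        by_chrom.setdefault(chrom, []).append((start, end))

    summary: Dict[str, Dict[str, int]] = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        # Total bases: overlapping regions are counted once per element.
        bases = sum(end - start for start, end in intervals)
        # Unique bases: merge overlapping runs and count each base once.
        unique = 0
        run_start, run_end = intervals[0]
        for start, end in intervals[1:]:
            if start > run_end:
                unique += run_end - run_start
                run_start, run_end = start, end
            else:
                run_end = max(run_end, end)
        unique += run_end - run_start
        summary[chrom] = {
            "elements": len(intervals),
            "bases": bases,
            "bases_uniq": unique,
        }
    return summary
```

    Computing such a summary once and storing it alongside the data is the same idea as the archive metadata described above: later lookups read a stored number instead of rescanning every record.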

    -- New checksum data help validate the integrity of the archive and its metadata.

    -- Other metadata enhancements to Starch-format archival and extraction, including: --note, --list-chromosomes, --archive-timestamp, --archive-type and --archive-version.

    -- Creating Starch archives with the starch utility is now 20-35% faster.

    -- New documentation with technical overview of the Starch format specification.

    * Conversion scripts

    -- New gtf2bed conversion script, converting GTF (v2.2) to BED.

    * Overall improvements in 64-bit type handling and error checking

    -- Consistency across the codebase helps ensure that all BEDOPS applications can scale to arbitrarily large genomes.
    Last edited by AlexReynolds; 05-01-2013, 12:55 AM.
