Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • pybedtools - a Python wrapper for BEDTools

    [cross-posted to BEDTools mailing list]

    I use both BEDTools and Python a lot. So I've posted a new Python package, pybedtools, which wraps BEDTools programs in an easy-to-use Pythonic interface. The GitHub repo is here:

    Python wrapper -- and more -- for BEDTools (bioinformatics tools for "genome arithmetic") - daler/pybedtools


    and you can view the docs here:



    Note this is different from the recently-released "bedtools-python", which is an API for low-level access to interval files. In contrast, "pybedtools" tries to give you all the functionality of BEDTools from your Python code.

    This is still very much in development so not all parts of BEDTools are fully supported yet. I've tried to focus on documentation and doctests for this release; future releases will be more code-centric.

    For bug reports/feature requests, please use either this thread or the BEDTools mailing list.

    -Ryan

  • #2
    Version 0.5 of pybedtools has now been released. There have been many, many improvements since the last version, thanks to collaboration with Brent Pedersen and Aaron Quinlan.

    In addition to wrapping all the BEDTools programs (including the latest multiBamCov, tagBam, and nucBed programs) and making them accessible from within Python, pybedtools now extends BEDTools by allowing feature-by-feature manipulation of BED/GFF/GTF/BAM/SAM/VCF files.

    There's lots more that pybedtools provides . . . as a brief example, here's the complete code that identifies genes that are <5kb from intergenic SNPs, given a file of genes and a file of SNPs:

    Code:
    from pybedtools import BedTool
    snps = BedTool('snps.bed.gz')
    genes = BedTool('hg19.gff')
    intergenic_snps = (snps - genes)
    nearby = genes.closest(intergenic_snps, d=True, stream=True)
    for gene in nearby:
        if int(gene[-1]) < 5000:
            print gene.name
    In contrast, here's how to do the same analysis in a shell script, which requires knowledge of bash, awk, and Perl:

    Code:
    snps=../test/data/snps.bed.gz
    genes=../test/data/hg19.gff
    intergenic_snps=/tmp/intergenic_snps
    
    snp_fields=`zcat $snps | awk '(NR == 2){print NF; exit;}'`
    gene_fields=9
    distance_field=$(($gene_fields + $snp_fields + 1))
    
    intersectBed -a $snps -b $genes -v > $intergenic_snps
    
    closestBed -a $genes -b $intergenic_snps -d \
    | awk '($'$distance_field' < 5000){print $9;}' \
    | perl -ne 'm/[ID|Name|gene_id]=(.*?);/; print "$1\n"'
    
    rm $intergenic_snps
    (You can read notes on this comparison at http://packages.python.org/pybedtool...omparison.html)


    Full documentation: http://packages.python.org/pybedtools/
    Source (contributions welcome): https://github.com/daler/pybedtools

    Or you can get a brief overview of pybedtools in:

    Pybedtools: a flexible Python library for manipulating genomic datasets and annotations
    Ryan K. Dale, Brent S. Pedersen, and Aaron R. Quinlan
    Bioinformatics (2011) first published online September 23, 2011
    doi:10.1093/bioinformatics/btr539

    Comment


    • #3
      ftp

      Thanks for pybedtools. It has been very helpful. Do you know if I can use remote bam files (FTP) with pybedtools yet? I know this feature was just added to Bedtools, but I can't seem to figure out how to use it.

      Thanks,
      Mary

      Comment


      • #4
        Thanks for pointing this out. I've just added support for remote BAMs in the development version (https://github.com/daler/pybedtools).

        When creating a new BedTool object, you can now use the remote=True keyword argument. Here's an example that grabs the first 10 lines of a remote BAM and converts to BED:

        Code:
        import pybedtools
        url = 'ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/HG00096/exome_alignment/HG00096.chrom11.ILLUMINA.bwa.GBR.exome.20120522.bam'
        x = pybedtools.BedTool(url, remote=True)
        y = x.bam_to_bed(stream=True)
        for i, feature in enumerate(y):
            if i == 9:
                break
            print feature,

        this prints:

        Code:
         11	60636	60736	SRR081241.13799221/1	0	+
         11	60674	60774	SRR077487.5548889/1	0	+
         11	60684	60784	SRR077487.12853301/1	0	+
         11	60789	60889	SRR077487.5548889/2	0	-
         11	60950	61050	SRR077487.13826494/1	0	+
         11	60959	61059	SRR081241.13799221/2	0	-
         11	61052	61152	SRR077487.12853301/2	0	-
         11	61548	61648	SRR081241.16743804/2	0	+
         11	61665	61765	SRR081241.16743804/1	0	-

        Keep in mind that doing more extensive work that may need to read large parts of the remote file (like intersections) may take a long time.

        Comment


        • #5
          implementation question

          UPDATE: Please ignore! I had a problem with my bedtools installation, fixed that and now this is no longer an issue! -_-

          Hello, pybedtools looks absolutely perfect for what I need to do, but I have a newbie implementation question.

          I'm trying to get the simple examples working, but I'm having trouble with the subtract function.
          I've just installed pybedtools-0.6.7 on a mac and i'm running:

          Code:
          import pybedtools
          
          # set up 3 different bedtools
          a = pybedtools.BedTool(pybedtools.example_filename('a.bed'))
          b = pybedtools.BedTool(pybedtools.example_filename('a.bed'))
          c = pybedtools.BedTool(pybedtools.example_filename('a.bed'))
          print a.count()
          print a
          print a.subtract(b)
          the result is:
          Code:
          4
          chr1	1	100	feature1	0	+
          chr1	100	200	feature2	0	+
          chr1	150	500	feature3	0	-
          chr1	900	950	feature4	0	+
          
          Traceback (most recent call last):
            File "test.py", line 9, in <module>
              print a.subtract(b)
            File "/Library/Python/2.7/site-packages/pybedtools-0.6.7-py2.7-macosx-10.8-intel.egg/pybedtools/bedtool.py", line 664, in decorated
              result = method(self, *args, **kwargs)
            File "/Library/Python/2.7/site-packages/pybedtools-0.6.7-py2.7-macosx-10.8-intel.egg/pybedtools/bedtool.py", line 146, in not_implemented_func
              raise NotImplementedError(help_str)
          NotImplementedError: "subtractBed" does not appear to be installed or on the path, so this method is disabled.  Please install a more recent version of BEDTools and re-import to use this method.
          I have the same issue when I try version 0.6.6 on a mac or 0.6.7 on linux. Have I missed something basic? Neither (a-b-c).count() nor (a-b).count() work for me either

          Thanks!!
          Last edited by agroff11; 09-30-2014, 11:04 AM. Reason: answered own question

          Comment


          • #6
            pybedtools wraps and extends BEDTools, so you need to make sure BEDTools is available on your path.

            What version of BEDTools are you running? The easiest way to find out is by typing

            Code:
            bedtools --version
            in the terminal.

            Comment


            • #7
              thanks

              Thank you! I had not updated my path post installation

              Comment


              • #8
                OK, good -- glad it works.

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Advancing Precision Medicine for Rare Diseases in Children
                  by seqadmin




                  Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
                  12-16-2024, 07:57 AM
                • seqadmin
                  Recent Advances in Sequencing Technologies
                  by seqadmin



                  Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

                  Long-Read Sequencing
                  Long-read sequencing has seen remarkable advancements,...
                  12-02-2024, 01:49 PM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, 12-17-2024, 10:28 AM
                0 responses
                33 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 12-13-2024, 08:24 AM
                0 responses
                49 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 12-12-2024, 07:41 AM
                0 responses
                34 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 12-11-2024, 07:45 AM
                0 responses
                46 views
                0 likes
                Last Post seqadmin  
                Working...
                X