SEQanswers

Old 02-01-2011, 06:58 AM   #1
daler
Junior Member
 
Location: DC Metro area

Join Date: Feb 2011
Posts: 8
pybedtools - a Python wrapper for BEDTools

[cross-posted to BEDTools mailing list]

I use both BEDTools and Python a lot. So I've posted a new Python package, pybedtools, which wraps BEDTools programs in an easy-to-use Pythonic interface. The GitHub repo is here:

https://github.com/daler/pybedtools

and you can view the docs here:

http://daler.github.com/pybedtools/

Note this is different from the recently-released "bedtools-python", which is an API for low-level access to interval files. In contrast, "pybedtools" tries to give you all the functionality of BEDTools from your Python code.

This is still very much in development so not all parts of BEDTools are fully supported yet. I've tried to focus on documentation and doctests for this release; future releases will be more code-centric.

For bug reports/feature requests, please use either this thread or the BEDTools mailing list.

-Ryan
Old 09-30-2011, 12:16 PM   #2
daler
Junior Member
 
Location: DC Metro area

Join Date: Feb 2011
Posts: 8

Version 0.5 of pybedtools has now been released. There have been many, many improvements since the last version, thanks to collaboration with Brent Pedersen and Aaron Quinlan.

In addition to wrapping all the BEDTools programs (including the latest multiBamCov, tagBam, and nucBed programs) and making them accessible from within Python, pybedtools now extends BEDTools by allowing feature-by-feature manipulation of BED/GFF/GTF/BAM/SAM/VCF files.

pybedtools provides a lot more than that. As a brief example, here's the complete code that identifies genes that are less than 5 kb from intergenic SNPs, given a file of genes and a file of SNPs:

Code:
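from pybedtools import BedTool

snps = BedTool('snps.bed.gz')
genes = BedTool('hg19.gff')

# SNPs that do not overlap any gene
intergenic_snps = snps - genes

# For each gene, find the closest intergenic SNP; d=True appends the distance as the last field
nearby = genes.closest(intergenic_snps, d=True, stream=True)

for gene in nearby:
    if int(gene[-1]) < 5000:
        print(gene.name)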
In contrast, here's how to do the same analysis in a shell script, which requires knowledge of bash, awk, and Perl:

Code:
snps=../test/data/snps.bed.gz
genes=../test/data/hg19.gff
intergenic_snps=/tmp/intergenic_snps

snp_fields=`zcat $snps | awk '(NR == 2){print NF; exit;}'`
gene_fields=9
distance_field=$(($gene_fields + $snp_fields + 1))

intersectBed -a $snps -b $genes -v > $intergenic_snps

closestBed -a $genes -b $intergenic_snps -d \
| awk '($'$distance_field' < 5000){print $9;}' \
| perl -ne 'm/(?:ID|Name|gene_id)=(.*?);/; print "$1\n"'

rm $intergenic_snps
(You can read notes on this comparison at http://packages.python.org/pybedtool...omparison.html)
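
To make the logic of both versions concrete, here is a rough pure-Python sketch of what the analysis computes, run on made-up intervals. It is not how BEDTools or pybedtools implement it; the `Interval` type, the helper names, and the toy coordinates are all hypothetical, and the gap calculation may differ by one base from the distance closestBed reports.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def gap(a: Interval, b: Interval) -> int:
    """Bases separating two intervals on the same chromosome (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def genes_near_intergenic_snps(genes: List[Interval],
                               snps: List[Interval],
                               max_dist: int = 5000) -> List[str]:
    # Step 1: keep only SNPs that overlap no gene (the "intergenic" ones).
    intergenic = [s for s in snps if not any(overlaps(s, g) for g in genes)]
    # Step 2: for each gene, take the distance to its closest intergenic SNP
    # on the same chromosome and keep the gene if that distance is small enough.
    hits = []
    for g in genes:
        dists = [gap(g, s) for s in intergenic if s.chrom == g.chrom]
        if dists and min(dists) < max_dist:
            hits.append(g.name)
    return hits


# Toy data, invented for illustration only.
GENES = [
    Interval("chr1", 1000, 2000, "geneA"),
    Interval("chr1", 20000, 21000, "geneB"),
    Interval("chr2", 500, 900, "geneC"),
]
SNPS = [
    Interval("chr1", 1500, 1501, "snp1"),    # inside geneA, so not intergenic
    Interval("chr1", 4000, 4001, "snp2"),    # intergenic, 2000 bases from geneA
    Interval("chr2", 50000, 50001, "snp3"),  # intergenic, far from geneC
]

print(genes_near_intergenic_snps(GENES, SNPS))  # → ['geneA']
```

This brute-force version compares every gene with every SNP, so it is only meant for reading, not for genome-scale files.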


Full documentation: http://packages.python.org/pybedtools/
Source (contributions welcome): https://github.com/daler/pybedtools

Or you can get a brief overview of pybedtools in:

Pybedtools: a flexible Python library for manipulating genomic datasets and annotations
Ryan K. Dale, Brent S. Pedersen, and Aaron R. Quinlan
Bioinformatics (2011) first published online September 23, 2011
doi:10.1093/bioinformatics/btr539
Old 11-15-2013, 10:17 AM   #3
allenma
Junior Member
 
Location: Boulder, Co

Join Date: Mar 2011
Posts: 5
ftp

Thanks for pybedtools. It has been very helpful. Do you know if I can use remote BAM files (FTP) with pybedtools yet? I know this feature was just added to BEDTools, but I can't seem to figure out how to use it.

Thanks,
Mary
Old 11-15-2013, 11:13 AM   #4
daler
Junior Member
 
Location: DC Metro area

Join Date: Feb 2011
Posts: 8

Thanks for pointing this out. I've just added support for remote BAMs in the development version (https://github.com/daler/pybedtools).

When creating a new BedTool object, you can now use the remote=True keyword argument. Here's an example that grabs the first 9 features of a remote BAM and converts them to BED:

Code:
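import pybedtools

url = 'ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/HG00096/exome_alignment/HG00096.chrom11.ILLUMINA.bwa.GBR.exome.20120522.bam'
x = pybedtools.BedTool(url, remote=True)  # remote=True: the BAM is read over FTP
y = x.bam_to_bed(stream=True)             # stream=True: iterate over results as they are produced
for i, feature in enumerate(y):
    if i == 9:  # stop once 9 features have been printed
        break
    print(feature, end='')  # str(feature) already ends with a newline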

This prints:

Code:
 11	60636	60736	SRR081241.13799221/1	0	+
 11	60674	60774	SRR077487.5548889/1	0	+
 11	60684	60784	SRR077487.12853301/1	0	+
 11	60789	60889	SRR077487.5548889/2	0	-
 11	60950	61050	SRR077487.13826494/1	0	+
 11	60959	61059	SRR081241.13799221/2	0	-
 11	61052	61152	SRR077487.12853301/2	0	-
 11	61548	61648	SRR081241.16743804/2	0	+
 11	61665	61765	SRR081241.16743804/1	0	-

Keep in mind that more extensive operations that need to read large parts of the remote file (such as intersections) may take a long time.
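
The reason the example above stays fast is that it stops pulling from the stream early. As a minimal sketch of that pattern, here is the same idea with itertools.islice from the standard library. The generator `fake_feature_stream` is a hypothetical stand-in for a streaming BedTool (it just makes up BED-like lines), and `head` is an illustrative helper name, not a pybedtools function.

```python
from itertools import islice


def fake_feature_stream(counter):
    """Hypothetical stand-in for a streamed result: yields BED-like lines forever.

    `counter` records how many items were actually pulled from the stream.
    """
    i = 0
    while True:
        counter["pulled"] += 1
        start = 60000 + 100 * i
        yield "11\t%d\t%d\tread%d\t0\t+\n" % (start, start + 100, i)
        i += 1


def head(features, n):
    """Return the first n items of any iterator without consuming the rest."""
    return list(islice(features, n))


counter = {"pulled": 0}
for line in head(fake_feature_stream(counter), 9):
    print(line, end="")

# Only the 9 requested items were ever generated.
print(counter["pulled"])  # → 9
```

Anything that has to see every record (sorting, intersecting, counting) cannot stop early like this, which is why those operations are slow on remote files.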
Old 09-30-2014, 11:48 AM   #5
agroff11
Junior Member
 
Location: Boston, MA

Join Date: Sep 2014
Posts: 2
implementation question

UPDATE: Please ignore! I had a problem with my bedtools installation, fixed that and now this is no longer an issue! -_-

Hello, pybedtools looks absolutely perfect for what I need to do, but I have a newbie implementation question.

I'm trying to get the simple examples working, but I'm having trouble with the subtract function.
I've just installed pybedtools-0.6.7 on a Mac and I'm running:

Code:
import pybedtools

# set up 3 different bedtools
a = pybedtools.BedTool(pybedtools.example_filename('a.bed'))
b = pybedtools.BedTool(pybedtools.example_filename('a.bed'))
c = pybedtools.BedTool(pybedtools.example_filename('a.bed'))
print a.count()
print a
print a.subtract(b)
The result is:
Code:
4
chr1	1	100	feature1	0	+
chr1	100	200	feature2	0	+
chr1	150	500	feature3	0	-
chr1	900	950	feature4	0	+

Traceback (most recent call last):
  File "test.py", line 9, in <module>
    print a.subtract(b)
  File "/Library/Python/2.7/site-packages/pybedtools-0.6.7-py2.7-macosx-10.8-intel.egg/pybedtools/bedtool.py", line 664, in decorated
    result = method(self, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/pybedtools-0.6.7-py2.7-macosx-10.8-intel.egg/pybedtools/bedtool.py", line 146, in not_implemented_func
    raise NotImplementedError(help_str)
NotImplementedError: "subtractBed" does not appear to be installed or on the path, so this method is disabled.  Please install a more recent version of BEDTools and re-import to use this method.
I have the same issue when I try version 0.6.6 on a Mac or 0.6.7 on Linux. Have I missed something basic? Neither (a-b-c).count() nor (a-b).count() works for me either.

Thanks!!

Last edited by agroff11; 09-30-2014 at 12:04 PM. Reason: answered own question
Old 09-30-2014, 12:05 PM   #6
daler
Junior Member
 
Location: DC Metro area

Join Date: Feb 2011
Posts: 8

pybedtools wraps and extends BEDTools, so you need to make sure BEDTools is available on your path.

What version of BEDTools are you running? The easiest way to find out is by typing

Code:
bedtools --version
in the terminal.
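
The same check can be made from Python, for example before importing pybedtools in a script. This is only a sketch using the standard library; `find_program` and `bedtools_version` are hypothetical helper names and are not part of pybedtools.

```python
import shutil
import subprocess


def find_program(name):
    """Return the full path of `name` if it is on the PATH, else None."""
    return shutil.which(name)


def bedtools_version():
    """Return the output of `bedtools --version`, or None if bedtools is not on the PATH."""
    path = find_program("bedtools")
    if path is None:
        return None
    result = subprocess.run([path, "--version"],
                            capture_output=True, text=True, check=False)
    # Use stdout if present, otherwise fall back to stderr.
    return result.stdout.strip() or result.stderr.strip()


version = bedtools_version()
if version is None:
    print("bedtools was not found on the PATH; install BEDTools or fix the PATH first")
else:
    print("found:", version)
```

If this reports that bedtools was not found even though it is installed, the directory containing the BEDTools binaries is probably missing from the PATH of the shell that launches Python.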
Old 09-30-2014, 12:08 PM   #7
agroff11
Junior Member
 
Location: Boston, MA

Join Date: Sep 2014
Posts: 2
thanks

Thank you! I had not updated my PATH after installation.
agroff11 is offline   Reply With Quote
Old 09-30-2014, 12:10 PM   #8
daler
Junior Member
 
Location: DC Metro area

Join Date: Feb 2011
Posts: 8

OK, good -- glad it works.