SEQanswers

11-18-2009, 06:34 PM   #1
quinlana
Senior Member
 
Location: Charlottesville

Join Date: Sep 2008
Posts: 119
BEDTools: new tools / support for paired-end features.

Hello all,
I just posted version 2.3.0 of BEDTools (http://code.google.com/p/bedtools/), which includes several new and useful updates.

(1) I added four new tools:

(a) shuffleBed. Randomly permutes the locations of the features in a BED file throughout a genome. Useful for testing whether, say, an experimental observation is significantly enriched with respect to a genomic feature. It also allows one to define a separate BED file of genomic regions that should be _excluded_ from random placement (e.g. genome gaps).
(b) slopBed. Adds a requested number of base pairs to each end of a BED feature. Smarter than running awk over a BED file, because the result is constrained by the size of each chromosome.
(c) maskFastaFromBed. Masks a FASTA file based on BED coordinates. Useful for making custom genome files for, e.g., targeted capture experiments.
(d) pairToPair. Returns overlaps between two paired-end BED files. This is great for finding structural variants that are private to or shared among samples. Specifically, pairToPair finds paired-end alignments or variants that have the same orientation on both ends and overlapping alignments on both ends. I've found this very useful for classifying structural variation detected by paired-end mapping.
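
To illustrate the idea behind shuffleBed, here is a minimal Python sketch of random placement with an exclusion list. It is not BEDTools' implementation: the function names are hypothetical, each feature is kept on its original chromosome (an assumption made for simplicity), and a placement that hits an excluded region is simply re-drawn (rejection sampling).

```python
import random
from typing import Dict, List, Tuple

# (chrom, start, end) with 0-based, half-open BED coordinates
Interval = Tuple[str, int, int]


def overlaps_any(chrom: str, start: int, end: int, regions: List[Interval]) -> bool:
    """True if [start, end) on chrom overlaps any region in `regions`."""
    return any(c == chrom and start < e and s < end for c, s, e in regions)


def shuffle_features(features: List[Interval], chrom_sizes: Dict[str, int],
                     exclude: List[Interval], rng: random.Random,
                     max_tries: int = 1000) -> List[Interval]:
    """Move each feature to a random position on its own chromosome,
    keeping its length and avoiding excluded regions (e.g. genome gaps)."""
    shuffled = []
    for chrom, start, end in features:
        length = end - start
        for _ in range(max_tries):
            # randint is inclusive, so new_end never exceeds the chromosome size
            new_start = rng.randint(0, chrom_sizes[chrom] - length)
            new_end = new_start + length
            if not overlaps_any(chrom, new_start, new_end, exclude):
                shuffled.append((chrom, new_start, new_end))
                break
        else:
            # for/else: reached only if no acceptable placement was found
            raise RuntimeError(f"could not place {chrom}:{start}-{end}")
    return shuffled


rng = random.Random(42)  # fixed seed so the example is reproducible
print(shuffle_features([("chr1", 100, 150), ("chr1", 500, 520)],
                       {"chr1": 1000}, [("chr1", 0, 400)], rng))
```

Repeating the shuffle many times and counting overlaps with a feature of interest gives a null distribution to compare the observed overlap against.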
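
The chromosome-size constraint that distinguishes slopBed from a plain awk one-liner amounts to clamping the new coordinates. A minimal sketch, assuming 0-based, half-open BED coordinates and a dictionary of chromosome sizes; slop_bed_line is a hypothetical name, not part of BEDTools.

```python
from typing import Dict


def slop_bed_line(line: str, chrom_sizes: Dict[str, int], bp: int) -> str:
    """Add `bp` base pairs to both ends of one BED line, clamped to the chromosome."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Valid BED coordinates lie in [0, chrom_size]; never extend past either end.
    fields[1] = str(max(0, start - bp))
    fields[2] = str(min(chrom_sizes[chrom], end + bp))
    return "\t".join(fields)


sizes = {"chr1": 1000}
print(slop_bed_line("chr1\t50\t100\tfeat1", sizes, 100))   # start clamped to 0
print(slop_bed_line("chr1\t950\t990\tfeat2", sizes, 100))  # end clamped to 1000
```

A naive awk `$2-100` would produce a negative start for the first feature and run off the chromosome for the second.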
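
Masking boils down to overwriting the bases covered by each BED interval. A minimal sketch, assuming the FASTA records have already been parsed into a dictionary mapping sequence names to sequences (parsing is omitted) and that masked bases become "N"; mask_sequences is a hypothetical helper, not the tool's code.

```python
from typing import Dict, List, Tuple


def mask_sequences(seqs: Dict[str, str], intervals: List[Tuple[str, int, int]],
                   mask_char: str = "N") -> Dict[str, str]:
    """Replace the bases covered by each 0-based, half-open BED interval with mask_char."""
    buffers = {name: list(seq) for name, seq in seqs.items()}
    for chrom, start, end in intervals:
        if chrom not in buffers:
            continue  # interval on a sequence that is absent from the FASTA
        buf = buffers[chrom]
        start, end = max(0, start), min(len(buf), end)  # clip to the sequence
        buf[start:end] = mask_char * (end - start)
    return {name: "".join(buf) for name, buf in buffers.items()}


print(mask_sequences({"chr1": "ACGTACGTAC"}, [("chr1", 2, 5)]))
```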
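
The overlap rule described for pairToPair (same orientation on both ends, overlapping alignments on both ends) can be sketched as follows. The Pair record and the function names are hypothetical, the all-against-all loop is quadratic (a real tool would use an index), and the sketch only compares end 1 with end 1 and end 2 with end 2, not swapped ends.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pair:
    """One paired-end record: two ends, each with 0-based half-open coordinates and a strand."""
    chrom1: str
    start1: int
    end1: int
    strand1: str
    chrom2: str
    start2: int
    end2: int
    strand2: str


def _overlap(c1: str, s1: int, e1: int, c2: str, s2: int, e2: int) -> bool:
    # Half-open intervals overlap if each one starts before the other ends.
    return c1 == c2 and s1 < e2 and s2 < e1


def pairs_overlap(a: Pair, b: Pair) -> bool:
    same_orientation = a.strand1 == b.strand1 and a.strand2 == b.strand2
    return (same_orientation
            and _overlap(a.chrom1, a.start1, a.end1, b.chrom1, b.start1, b.end1)
            and _overlap(a.chrom2, a.start2, a.end2, b.chrom2, b.start2, b.end2))


def pair_to_pair(a_pairs: List[Pair], b_pairs: List[Pair]) -> List[Tuple[Pair, Pair]]:
    """Report every (A, B) combination whose ends overlap pairwise with matching strands."""
    return [(a, b) for a in a_pairs for b in b_pairs if pairs_overlap(a, b)]
```

Requiring both ends to match is what lets a shared deletion or inversion call in two samples be recognized as the same variant, while a call that matches on only one end is not.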

(2) I increased the speed of intersectBed by nearly 50%.
(3) I improved / corrected some of the help messages.
(4) I improved sanity checking for BED entries.

(5) I added two new scripts. The first, samToBed, converts alignments in SAM format to BED format. It also reads from standard input, so it plays nicely with the "samtools view" command. The second, gffToBed, converts GFF annotations to BED.
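
The key step in a SAM-to-BED conversion is deriving each alignment's end coordinate from its CIGAR string, since SAM stores only the 1-based leftmost position. A minimal sketch; the function names and the use of MAPQ as the BED score column are assumptions, not necessarily what the samToBed script does.

```python
import re
from typing import Iterable, Iterator, Optional

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def sam_line_to_bed(line: str) -> Optional[str]:
    """Convert one SAM alignment line to a 6-column BED line, or None to skip it."""
    if line.startswith("@"):
        return None  # header line
    fields = line.rstrip("\n").split("\t")
    qname, flag, rname = fields[0], int(fields[1]), fields[2]
    pos, mapq, cigar = int(fields[3]), fields[4], fields[5]
    if flag & 0x4 or cigar == "*":
        return None  # unmapped read (or no CIGAR to size the alignment)
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    start = pos - 1  # SAM POS is 1-based; BED start is 0-based
    end = start + ref_len
    strand = "-" if flag & 0x10 else "+"  # 0x10: read mapped to the reverse strand
    return "\t".join([rname, str(start), str(end), qname, mapq, strand])


def sam_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Yield BED lines for every mapped alignment in an iterable of SAM lines."""
    for line in lines:
        bed = sam_line_to_bed(line)
        if bed is not None:
            yield bed
```

Because sam_to_bed accepts any iterable of lines, a script that loops over `sam_to_bed(sys.stdin)` and prints each result could sit at the end of a `samtools view` pipe.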
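
Converting GFF to BED is mostly a coordinate shift: GFF is 1-based with inclusive ends, while BED is 0-based and half-open, so only the start changes. A minimal sketch; using the feature type (column 3) as the BED name and 0 for a missing score are assumptions, not necessarily what gffToBed does.

```python
from typing import Optional


def gff_line_to_bed(line: str) -> Optional[str]:
    """Convert one GFF feature line to a 6-column BED line, or None for comments/blanks."""
    if not line.strip() or line.startswith("#"):
        return None
    f = line.rstrip("\n").split("\t")
    seqid, ftype, start, end, score, strand = f[0], f[2], int(f[3]), int(f[4]), f[5], f[6]
    # GFF [start, end] (1-based, closed) equals BED [start - 1, end) (0-based, half-open).
    bed_score = "0" if score == "." else score
    return "\t".join([seqid, str(start - 1), str(end), ftype, bed_score, strand])


print(gff_line_to_bed("chr1\tsrc\texon\t1\t100\t.\t+\t.\tID=e1"))
```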

I hope you find these useful.
Aaron
11-19-2009, 12:22 AM   #2
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258

Hi Aaron,
I'll take advantage of this post to ask you how closestBed works... I really don't understand what a tie is.
For example:

Code:
$ closestBed   -a mysplit/merged_IRR1.bed -b mm9.refseq.tss.bed6 | head
chr1	4172972	4173006	1	+	chr1	4334223	4350473	NM_011283	0	-
chr1	4557081	4557115	1	+	chr1	4334223	4350473	NM_011283	0	-
chr1	4557081	4557115	1	+	chr1	4481008	4486494	NM_011441	0	-
chr1	4562824	4562858	1	+	chr1	4334223	4350473	NM_011283	0	-
chr1	4562824	4562858	1	+	chr1	4481008	4486494	NM_011441	0	-
chr1	5120005	5120039	1	-	chr1	5073253	5152630	NM_133826	0	+
chr1	5493224	5493258	1	+	chr1	4334223	4350473	NM_011283	0	-
chr1	5493224	5493258	1	+	chr1	4481008	4486494	NM_011441	0	-
chr1	5493224	5493258	1	+	chr1	4764014	4775768	NM_025300	0	-
chr1	5493224	5493258	1	+	chr1	4797973	4836816	NM_008866	0	+
I expected closestBed to report the closest feature up- or downstream; instead I get a list of features ordered from the farthest to the closest (in abs(dist)). I'm a bit puzzled :-)
11-19-2009, 05:33 AM   #3
quinlana

Hi Dawe,
You are rightfully puzzled... I was too. Your expectation of how it should behave is correct. Unfortunately, I introduced a typo while modifying an unrelated piece of code prior to this release. A new version (2.3.1) has been posted that behaves as you would expect. I tested it with your sample data and all appears well.

As for ties, these occur in two ways:

1) When two or more features in B _overlap_ the same fraction of a feature in A, all of them are reported by default. Using -t first or -t last, you can choose to report just one.

2) When two or more features in B do not overlap a feature in A but are exactly the same distance from it (say, 1 Mb), all of them are reported.
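
The two kinds of tie can be made concrete with a small sketch of a closest-feature search. It is not closestBed's implementation: closest() and its ties argument are hypothetical stand-ins for the -t option, overlapping features are ranked by overlap length (for a fixed feature A, equal overlap in bp means equal overlap fraction of A), non-overlapping ones by the gap in bp between half-open intervals, and the same first/last choice is applied to both kinds of tie.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def _rank(a: Interval, b: Interval) -> Tuple[int, int]:
    """Smaller is closer: overlaps rank first (more overlap is better), then by gap size."""
    _, a_s, a_e = a
    _, b_s, b_e = b
    overlap = min(a_e, b_e) - max(a_s, b_s)
    if overlap > 0:
        return (0, -overlap)
    gap = a_s - b_e if b_e <= a_s else b_s - a_e  # B lies upstream or downstream of A
    return (1, gap)


def closest(a: Interval, bs: List[Interval], ties: str = "all") -> List[Interval]:
    """Return the B feature(s) closest to A; ties are 'all', 'first' or 'last' (input order)."""
    candidates = [b for b in bs if b[0] == a[0]]
    if not candidates:
        return []
    best = min(_rank(a, b) for b in candidates)
    tied = [b for b in candidates if _rank(a, b) == best]
    if ties == "first":
        return tied[:1]
    if ties == "last":
        return tied[-1:]
    return tied
```

Applied to the chr1:5493224-5493258 feature and the four candidate records from dawe's output, this sketch returns only chr1:4797973-4836816 (NM_008866), the smallest of the four gaps, rather than all four.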

Sorry for the confusion.
Aaron
11-19-2009, 06:30 AM   #4
dawe

I've got it (both the tie definition and the new version tarball!).
As you said, it works!
Thanks

d

All times are GMT -8.