SEQanswers

08-16-2010, 01:45 PM   #1
quinlana
Senior Member
 
Location: Charlottesville

Join Date: Sep 2008
Posts: 119
BEDTools v2.9 - new tools/features

Hi all,
I just posted Version 2.9.0. The details of the release are below. Highlights include a new unionBedGraphs tool, a "per-base" coverage option for coverageBed, a "distance" option for closestBed, and multi-column operations for groupBy.

http://bedtools.googlecode.com/files....v2.9.0.tar.gz

Best,
Aaron

=== New tools ===
1. unionBedGraphs. This is a powerful new tool contributed by Assaf Gordon from CSHL. It will combine/merge multiple BEDGRAPH files into a single file, thus allowing comparisons of coverage (or any text-value) across multiple samples. The example below illustrates how to compare coverage across three different BEDGRAPH files.
Code:
 $ cat 1.bg
 chr1	1000	1500	10
 chr1	2000	2100	20

 $ cat 2.bg
 chr1	900	1600	60
 chr1	1700	2050	50

 $ cat 3.bg
 chr1	1980	2070	80
 chr1	2090	2100	20

 $ unionBedGraphs -header -i 1.bg 2.bg 3.bg -names WT-1 WT-2 KO-1
 chrom	start	end	WT-1	WT-2	KO-1
 chr1	900	1000	0	60	0
 chr1	1000	1500	10	60	0
 chr1	1500	1600	0	60	0
 chr1	1700	1980	0	50	0
 chr1	1980	2000	0	50	80
 chr1	2000	2050	20	50	80
 chr1	2050	2070	20	0	80
 chr1	2070	2090	20	0	0
 chr1	2090	2100	20	0	20
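
Once the samples share one set of intervals, comparing them is a matter of filtering columns. The following is a minimal sketch, assuming the unionBedGraphs output above has been saved to a file (here a temporary file rebuilt with printf so the example is self-contained); it uses plain awk, not a BEDTools option, to keep only the intervals that have non-zero coverage in all three samples.

```shell
# Rebuild the unionBedGraphs output shown above as a tab-delimited temp file.
union_file=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    chrom start end WT-1 WT-2 KO-1 \
    chr1 900 1000 0 60 0 \
    chr1 1000 1500 10 60 0 \
    chr1 1500 1600 0 60 0 \
    chr1 1700 1980 0 50 0 \
    chr1 1980 2000 0 50 80 \
    chr1 2000 2050 20 50 80 \
    chr1 2050 2070 20 0 80 \
    chr1 2070 2090 20 0 0 \
    chr1 2090 2100 20 0 20 > "$union_file"

# Skip the header (NR > 1) and keep rows where columns 4-6 are all non-zero.
covered_in_all=$(awk -F'\t' 'NR > 1 && $4 > 0 && $5 > 0 && $6 > 0' "$union_file")
printf '%s\n' "$covered_in_all"

rm -f "$union_file"
```

With the data above, only the chr1 2000-2050 interval is covered in WT-1, WT-2 and KO-1 at once.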

=== New features ===

1. The "groupBy" tool now allows one to operate on multiple columns for each group. For example:
Code:
$ cat ex1.out
chr1	10	20	A	chr1	15	25	B.1	1000
chr1	10	20	A	chr1	25	35	B.2	10000

$ groupBy -i ex1.out -grp 1,2,3,4 -opCols 8,9 -ops collapse,mean
chr1	10	20	A	B.1,B.2,	5500

2. New "distance feature" (-d) added to closestBed by Erik Arner. In addition to finding the closest feature to each feature in A, the -d option will report the distance to the closest feature in B. Overlapping features have a distance of 0.
3. New "per base depth feature" (-d) added to coverageBed. This reports the depth at each base position (positions are 1-based) of every feature in file B, computed from the features in file A that cover it. For example, this could report the per-base depth of sequencing reads (-a) across each capture target (-b).
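
To make the multi-column groupBy example in item 1 concrete, here is a rough awk equivalent of "-grp 1,2,3,4 -opCols 8,9 -ops collapse,mean". It is only a sketch of what the two operations compute, not how groupBy is implemented, and the ex1.out contents are rebuilt in a temporary file so the example is self-contained.

```shell
# Recreate ex1.out from the example above as a tab-delimited temp file.
ex1=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    chr1 10 20 A chr1 15 25 B.1 1000 \
    chr1 10 20 A chr1 25 35 B.2 10000 > "$ex1"

# Group on columns 1-4, collapse column 8 into a comma-separated list,
# and average column 9 within each group.
grouped=$(awk -F'\t' -v OFS='\t' '
{
    key = $1 OFS $2 OFS $3 OFS $4
    if (!(key in n)) order[++groups] = key        # remember first-seen order
    names[key] = (key in n) ? names[key] "," $8 : $8
    sum[key] += $9
    n[key]++
}
END {
    for (i = 1; i <= groups; i++) {
        k = order[i]
        print k, names[k], sum[k] / n[k]
    }
}' "$ex1")
printf '%s\n' "$grouped"

rm -f "$ex1"
```

The mean of 1000 and 10000 is 5500. Unlike the groupBy output shown above, this sketch does not leave a trailing comma after the collapsed list.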

Best,
Aaron

Last edited by quinlana; 08-16-2010 at 01:46 PM. Reason: superfluous
08-17-2010, 08:25 AM   #2
quinlana
Senior Member
 
Location: Charlottesville

Join Date: Sep 2008
Posts: 119

Hi all,
I've received a few emails expressing confusion over the utility and limitations of the new "groupBy" tool. First, it is not limited to processing output from BEDTools: it will work on any tab-delimited file or stream. To illustrate this and to fulfill requests for additional examples, the command below computes the mean and standard deviation of the insert size (ISIZE) for each sequencing library in a BAM file containing multiple libraries. In the interest of clarity, this example assumes that each read records the library from which it came in the read group (RG) tag and that this tag is the 12th column in the SAM output.

I hope this helps and I apologize that things weren't expressed with more clarity earlier.

Aaron
Code:
##########################################################################
# Goal: Compute the mean and stdev of the insert size (ISIZE) for each
#       sequencing library (RG tag)
# Steps:
# Line 1 (samtools) : extract all properly-paired reads
# Line 2 (awk):       print the RG/library and ISIZE (positive ISIZE only)
# Line 3 (sort):      sort the output by RG/library
# Line 4 (groupBy):   compute the mean & stdev for each library
###########################################################################
$ samtools view -f 0x2 aln.multipleLibraries.bam | \
    awk '{if ($9>0) {print $12"\t"$9}}' | \
    sort -k1,1 | \
    groupBy -i stdin -grp 1 -opCols 2,2 -ops mean,stdev

# library	mean		stdev
RG:Z:libA	319.5959	32.86841
RG:Z:libB	389.8465	32.60053
RG:Z:libC	329.1906	32.86142
RG:Z:libD	318.8107	33.33372
RG:Z:libE	359.0431	33.34611
RG:Z:libF	320.4461	32.79852
RG:Z:libG	399.0043	32.98773
RG:Z:libH	329.6738	33.15160
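
For readers who want to check what the final groupBy step computes, the per-library statistics can be sketched in awk alone. This is an illustration under stated assumptions, not a replacement for groupBy: the library/ISIZE pairs below are made-up toy values, and the sketch uses the population standard deviation (dividing by n), since the thread does not say which definition groupBy's stdev uses.

```shell
# Toy "library<TAB>ISIZE" pairs standing in for the awk output of the
# pipeline above (values are invented for illustration).
isize_stats=$(printf '%s\t%s\n' \
    RG:Z:libA 300 RG:Z:libA 320 RG:Z:libA 340 \
    RG:Z:libB 400 RG:Z:libB 400 |
  awk -F'\t' '
  { n[$1]++; sum[$1] += $2; sumsq[$1] += $2 * $2 }
  END {
      for (lib in n) {
          mean = sum[lib] / n[lib]
          var = sumsq[lib] / n[lib] - mean * mean   # population variance
          if (var < 0) var = 0                      # guard against rounding
          printf "%s\t%.2f\t%.2f\n", lib, mean, sqrt(var)
      }
  }' | sort -k1,1)   # awk array order is unspecified, so sort the result
printf '%s\n' "$isize_stats"
```

Because awk accumulates into associative arrays, the input does not need to be sorted first, whereas the pipeline above sorts by library before handing the stream to groupBy.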
04-14-2011, 09:50 AM   #3
Yilong Li
Member
 
Location: WTSI

Join Date: Dec 2010
Posts: 41

Thanks for the amazing program, I've been looking for such a program for a very long time!

One question: will bedtools run faster (esp. coverageBed or intersectBed) if the input BED or BAM files are sorted, or does it not matter?
04-14-2011, 10:41 AM   #4
quinlana
Senior Member
 
Location: Charlottesville

Join Date: Sep 2008
Posts: 119

Quote:
Originally Posted by Yilong Li
Thanks for the amazing program, I've been looking for such a program for a very long time!

One question: will bedtools run faster (esp. coverageBed or intersectBed) if the input BED or BAM files are sorted, or does it not matter?

Currently, sorting makes no difference for intersect or coverage.
06-22-2011, 04:42 PM   #5
Adriano
Junior Member
 
Location: Brazil

Join Date: Oct 2010
Posts: 1

Hi Aaron,

Thank you very much for your program. I am starting to use it, and to me it seems very well documented, with features that are quick to get used to.

I have one major observation. When you download a GFF file from NCBI Genome, for example, the first feature is called "source" and spans the whole chromosome, in a line like:

NC_008405.1 RefSeq source 1 27566993

and this causes intersectBed to intersect every short read with this feature. But actually, the intersections should be only with features like "gene", "exon", etc.

To avoid this problem, I need to edit the GFF file and remove this "source" feature.

I hope you can have a look at this issue and improve your fantastic BEDTools even more.

Cheers,

Adriano
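
The manual edit described in the post above can be scripted. The following is a minimal sketch, assuming a standard tab-delimited, 9-column GFF in which the feature type is column 3; the three records and their ID attributes are toy values made up for illustration, and the filter is plain awk rather than a BEDTools feature.

```shell
# Toy GFF with one whole-chromosome "source" record and two real features.
gff_in=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    NC_008405.1 RefSeq source 1 27566993 . + . ID=toy0 \
    NC_008405.1 RefSeq gene 100 900 . + . ID=toy1 \
    NC_008405.1 RefSeq exon 100 400 . + . ID=toy2 > "$gff_in"

# Keep comment/header lines and every record whose type is not "source".
filtered=$(awk -F'\t' '/^#/ || $3 != "source"' "$gff_in")
printf '%s\n' "$filtered"

rm -f "$gff_in"
```

The filtered records can then be written to a new GFF file and passed to intersectBed in place of the original annotation.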