SEQanswers

Old 12-05-2012, 05:09 AM   #1
a_mt
Member
 
Location: C:/Program files/Google/Chrome

Join Date: Jul 2012
Posts: 34
finding common genomic regions from multiple (>2) BED files

Hi all,

I have 6 BED files and I am looking for the genomic regions common to all 6.
Is there a tool that does this? Bedtools only takes 2 files at a time. Is there a way to do it in one go? I am guessing Bioconductor's GRanges can achieve this, but I am not sure.

At present I am doing it pairwise using bedtools, which is really tedious. To begin with there will be 10 comparisons.

Any suggestions?

Thanks all.

Last edited by a_mt; 12-05-2012 at 05:24 AM. Reason: Solved : just found multiIntersectBed option :)
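
For later readers, the multiIntersectBed route mentioned in the edit note might look roughly like the sketch below. It assumes a bedtools release that provides `bedtools multiinter` (the newer name for multiIntersectBed), position-sorted inputs, and three small hypothetical files (a.bed, b.bed, c.bed) standing in for the six. Column 4 of the output is the number of files covering each sub-interval, so keeping the rows where it equals the file count leaves the regions common to all files.

```shell
# Toy inputs (hypothetical), tab-separated and already sorted.
printf 'chr1\t100\t200\n' > a.bed
printf 'chr1\t150\t250\n' > b.bed
printf 'chr1\t120\t180\n' > c.bed

n=3  # number of input files

if command -v bedtools >/dev/null 2>&1; then
  # Column 4 = how many files cover the sub-interval; keep those covered by all n.
  bedtools multiinter -i a.bed b.bed c.bed \
    | awk -v n="$n" '$4 == n' \
    | cut -f1-3 > common.bed
  cat common.bed
else
  echo "bedtools not found; skipping" >&2
fi
```

For the original question, n would be 6 and all six BED files would be listed after -i.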
Old 12-05-2012, 05:35 AM   #2
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224

You can do it using piping.

intersectBed -a 1.bed -b 2.bed | intersectBed -a stdin -b 3.bed | ... and so on.
Old 12-05-2012, 07:52 AM   #3
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140

How long are the files?
How are they separated?
How much memory does the computer have?
Can I just count common 15-substrings?
Old 12-06-2012, 02:03 AM   #4
pallevillesen
Member
 
Location: Bioinformatics Research Center, Aarhus University, Denmark

Join Date: May 2012
Posts: 19

If you're only looking for the intersection of all 6 - then you just go

intersectBed -a 1.bed -b 2.bed | intersectBed -a stdin -b 3.bed | intersectBed -a stdin -b 4.bed | intersectBed -a stdin -b 5.bed | intersectBed -a stdin -b 6.bed > out.bed

can't really get easier (or faster).

If you want a "venn diagram" of all 6 - then you have a lot of comparisons to do

It is probably easier to combine everything and remove duplicates, then annotate each position with the files it is present in. You can then query that in R, awk or something else:

cat *.bed | sort -k1,1 -k2,2n |uniq >all.bed

Then run bedtools intersect -loj -a all.bed -b file1.bed for each of the 6 files and keep the output (-loj = left outer join): if a region in all.bed overlaps the file, the overlapping record is appended; otherwise null fields (-1) are added.

Then you must remove some unwanted columns etc. - but it's a start.
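
The combine-and-annotate idea can be sketched with plain sort and awk. This is only an illustration under a strong assumption: regions count as shared only when chrom, start and end match exactly (which is what the cat | sort | uniq step deduplicates), and no file lists the same region twice. Partially overlapping regions would still need the bedtools intersect step. The file names are hypothetical.

```shell
# Toy inputs (hypothetical), tab-separated.
printf 'chr1\t100\t200\nchr1\t300\t400\n' > f1.bed
printf 'chr1\t100\t200\nchr2\t50\t80\n'   > f2.bed
printf 'chr1\t100\t200\nchr1\t300\t400\n' > f3.bed

# For every distinct region, count the files containing it and list their names.
# Assumes exact coordinate matches and no duplicate records within a file.
awk 'BEGIN { OFS = "\t" }
     { key = $1 OFS $2 OFS $3
       if (key in files) files[key] = files[key] "," FILENAME
       else              files[key] = FILENAME
       count[key]++ }
     END { for (key in files) print key, count[key], files[key] }' \
    f1.bed f2.bed f3.bed \
  | sort -k1,1 -k2,2n > membership.tsv

cat membership.tsv
```

Column 4 is the number of files containing the region and column 5 names them, so `awk '$4 == 3' membership.tsv` would pull out the regions present in all three toy files.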
Old 01-31-2013, 04:00 PM   #5
sjneph
Junior Member
 
Location: seattle, wa

Join Date: Jan 2013
Posts: 2
BEDOPS works directly with any number of files

bedops --intersect f1.bed f2.bed f3.bed f4.bed f5.bed f6.bed > answer.bed

(or even more concisely: bedops --intersect *.bed > final-answer)

As you can see, this program usage is more concise than anything else you could do. It turns out to be more efficient than any other approach out there too (both in time and memory).

You can pass any number of files to the bedops program directly. It doesn't read everything into memory, unlike other tool suites (those other suites actually require 2x their usual memory overhead too once you start using pipes as suggested above). Memory overhead is almost nothing for bedops (say < 20 MB), no matter how many or how big your input files get. And the program will run significantly faster than anything else out there right now.

The only requirement is that each of your files is pre-sorted. Yet, every output result produced by bedops is guaranteed to be sorted for you, so any results can be used in the future and you never need to sort them.

Pre-built binaries and source for the BEDOPS suite are available at http://code.google.com/p/bedops/ .

To sort files, run them through the sort-bed program:
sort-bed file1.bed > f1.bed
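
If BEDOPS is not at hand, a rough stand-in is the standard sort utility with a chromosome key and numeric start and end keys, plus sort -c to test whether a file is already in that order. This sketch assumes tab-separated BED input without header lines and that this ordering is acceptable to whatever tool consumes the file; sort-bed itself remains the safer choice before calling bedops. The file names are hypothetical.

```shell
# Toy input (hypothetical), deliberately out of order.
printf 'chr2\t50\t80\nchr1\t300\t400\nchr1\t100\t200\n' > unsorted.bed

# Byte-wise chromosome order, then numeric start and end.
LC_ALL=C sort -k1,1 -k2,2n -k3,3n unsorted.bed > sorted.bed

# sort -c exits non-zero if the file is not in the requested order.
if LC_ALL=C sort -c -k1,1 -k2,2n -k3,3n sorted.bed 2>/dev/null; then
  echo "sorted.bed is sorted"
fi
```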

You'll find that sort-bed happens to sort files faster than any other BED sorting program out there, as well. Our motto is simple: sort (at most) one time and run efficiently forever afterwards. Alternative suites do the equivalent of sorting every BED file every single time you call a program.

As a final remark, doing the intersection between various sets is pretty easy, and you can do it in a pairwise fashion with pipes as shown in other posts above, which seems kind of cute. While that approach is not as efficient in memory or time as a simple bedops call, it still seems nice on the surface.

No such cute solution exists with pipes if you change the problem very slightly: instead, give me all regions specific to exactly 1 file. Try to build up a solution with pairwise set-difference operations with no (or few) intermediate files or fifos. See what happens when you go from 2 BED files to 3. Now, go to 4 and beyond (hint, it ain't good).

However, this symmetric difference problem is easy for bedops. It's 1 command, regardless of the number of input files, just as in the intersection case.

bedops --symmdiff f1.bed f2.bed f3.bed f4.bed f5.bed f6.bed > symmdiff-answer.bed

This is concise and just as efficient as the intersection case. The bedops program was built from the ground up to work efficiently, both in time and memory, with any number of sorted input files at once.
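
To make the "specific to exactly 1 file" idea concrete, here is a small sweep-line sketch in sort and awk. It is not how bedops is implemented, only an illustration. It assumes half-open BED coordinates, hypothetical toy files, and that intervals within a single file do not overlap each other, so that coverage depth equals the number of files covering a position. Adjacent output pieces are not merged.

```shell
# Toy inputs (hypothetical), tab-separated.
printf 'chr1\t100\t200\n' > x.bed
printf 'chr1\t150\t250\n' > y.bed
printf 'chr1\t240\t300\n' > z.bed

want=1  # report stretches covered by exactly this many files

# Turn each interval into a +1 event at its start and a -1 event at its end,
# sort the events by chromosome and position, then track the coverage depth.
awk 'BEGIN { OFS = "\t" } { print $1, $2, 1; print $1, $3, -1 }' x.bed y.bed z.bed \
  | LC_ALL=C sort -k1,1 -k2,2n -k3,3n \
  | awk -v want="$want" 'BEGIN { OFS = "\t" }
      { # Emit the stretch since the previous event if its depth matched.
        if (depth == want && $1 == chrom && $2 > pos) print chrom, pos, $2
        depth += $3; chrom = $1; pos = $2 }' > exclusive.bed

cat exclusive.bed
```

Setting want to the number of input files would report the stretches covered by every file instead, which is the intersection case.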

Last edited by sjneph; 01-31-2013 at 11:59 PM.

Tags
bedtools, bioconductor
