  • finding common genomic regions from multiple (>2) BED files

    Hi all,

    I have 6 BED files and I am looking for the genomic regions common to all 6 files.
    Is there a tool that does this? Bedtools only takes 2 files at a time. Is there any way to do this in one go? I am guessing Bioconductor's GRanges can achieve this, but I am not sure.

    At present I am doing it pairwise using bedtools, which is really tedious. To begin with, there will be 10 comparisons.

    Any suggestions?

    Thanks all.
    Last edited by a_mt; 12-05-2012, 05:24 AM. Reason: Solved : just found multiIntersectBed option :)
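
For readers landing here, the multiIntersectBed tool mentioned in the edit note (called bedtools multiinter in newer bedtools releases) splits the genome into sub-intervals and reports in the fourth column how many input files cover each one. Below is a minimal sketch of keeping only the sub-intervals present in every file. It assumes that column layout (chrom, start, end, count, ...) and position-sorted inputs; the helper name common_to_all is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read multiIntersectBed output on stdin and keep the
# rows whose 4th column (number of files covering the sub-interval) equals
# the total number of input files given as the first argument.
common_to_all() {
    n_files=$1
    awk -v n="$n_files" 'BEGIN { FS = OFS = "\t" } $4 == n { print $1, $2, $3 }'
}

# Intended use (needs bedtools installed and sorted BED files):
#   multiIntersectBed -i 1.bed 2.bed 3.bed 4.bed 5.bed 6.bed | common_to_all 6 > common.bed
```

The same filter with a smaller number in place of 6 would give regions shared by exactly that many files.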

  • #2
    You can do it using piping.

    intersectBed -a 1.bed -b 2.bed | intersectBed -a stdin -b 3.bed | ... and so on.


    • #3
      How long are the files?
      How are they separated?
      How much memory does the computer have?
      Can I just count common 15-substrings?


      • #4
        If you're only looking for the intersection of all 6 - then you just go

        intersectBed -a 1.bed -b 2.bed | intersectBed -a stdin -b 3.bed | intersectBed -a stdin -b 4.bed | intersectBed -a stdin -b 5.bed | intersectBed -a stdin -b 6.bed > out.bed

        can't really get easier (or faster).

        If you want a "venn diagram" of all 6 - then you have a lot of comparisons to do

        It is probably easier to combine and deduplicate all of it, and add to each position the information about which files it is present in. Then you can query it in R or awk or something else.

        cat *.bed | sort -k1,1 -k2,2n |uniq >all.bed

        Then run bedtools intersect -loj -a all.bed -b file1.bed, once for each of the 6 files, and keep that information (-loj = left outer join). With all.bed as the -a file, every position is reported: if there is an overlap it will add that info, otherwise it will add -1.

        Then you must remove some unwanted columns etc. - but it's a start.


        • #5
          BEDOPS works directly with any number of files

          bedops --intersect f1.bed f2.bed f3.bed f4.bed f5.bed f6.bed > answer.bed

          (or even more concisely: bedops --intersect *.bed > final-answer)

          As you can see, this program usage is more concise than anything else you could do. It turns out to be more efficient than any other approach out there too (both in time and memory).

          You can pass any number of files to the bedops program directly. It doesn't read everything into memory, unlike other tool suites (those other suites actually require 2x their usual memory overhead too once you start using pipes as suggested above). Memory overhead is almost nothing for bedops (say < 20 MB), no matter how many or how big your input files get. And the program will run significantly faster than anything else out there right now.

          The only requirement is that each of your files is pre-sorted. Yet, every output result produced by bedops is guaranteed to be sorted for you, so any results can be used in the future and you never need to sort them.

          Pre-built binaries and source for the BEDOPS suite are available at http://code.google.com/p/bedops/ .

          To sort files, run them through the sort-bed program:
          sort-bed file1.bed > f1.bed
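
Because bedops expects sorted input, it can be handy to check a file before passing it in. The sketch below uses only the standard sort utility and assumes that the required order corresponds to a byte-wise (LC_ALL=C) sort on chromosome, then numeric start, then numeric end. That equivalence is an assumption made for illustration, so sort-bed itself remains the authority. The function name is_bed_sorted is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the BED file is already in
# chromosome / start / end order, fail otherwise.
# Lines with identical coordinates are additionally compared as whole lines.
is_bed_sorted() {
    LC_ALL=C sort -c -k1,1 -k2,2n -k3,3n "$1" 2>/dev/null
}

# Example: only re-sort when needed (sort-bed comes from the BEDOPS suite):
#   is_bed_sorted file1.bed || sort-bed file1.bed > f1.bed
```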

          You'll find that sort-bed happens to sort files faster than any other BED sorting program out there, as well. Our motto is simple: sort (at most) one time and run efficiently forever afterwards. Alternative suites do the equivalent of sorting every BED file every single time you call a program.

          As a final remark, doing the intersection between various sets is pretty easy, and you can do it in a pairwise fashion with pipes as shown in other posts above, which seems kind of cute. While that approach is not as efficient in memory or time as a simple bedops call, it still seems nice on the surface.

          No such cute solution exists with pipes if you change the problem very slightly: instead, give me all regions specific to exactly 1 file. Try to build up a solution with pairwise set-difference operations with no (or few) intermediate files or FIFOs. See what happens when you go from 2 BED files to 3. Now, go to 4 and beyond (hint: it ain't good).

          However, this symmetric difference problem is easy for bedops. It's 1 command, regardless of the number of input files, just as in the intersection case.

          bedops --symmdiff f1.bed f2.bed f3.bed f4.bed f5.bed f6.bed > symmdiff-answer.bed

          This is concise and just as efficient as the intersection case. The bedops program was built from the ground up to work efficiently, both in time and memory, with any number of sorted input files at once.
          Last edited by sjneph; 01-31-2013, 11:59 PM.
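
To make the two set operations in this thread concrete, here is a small sweep over interval endpoints that reports regions covered by exactly K files: K equal to the number of files gives the regions common to all of them, and K = 1 gives the regions specific to exactly one file. This is an illustrative sketch using sort and awk, not how bedops or bedtools are implemented. It assumes tab-separated files whose intervals do not overlap within any single file, and the function name regions_with_depth is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the regions covered by exactly K of the given
# BED files. Assumes intervals never overlap within a single file, so the
# coverage depth at a position equals the number of files covering it.
regions_with_depth() {
    k=$1
    shift
    # Step 1: turn each interval into a +1 event (start) and a -1 event (end).
    # Step 2: order events by chromosome and position, ends before starts.
    # Step 3: walk the events, tracking depth and merging adjacent spans.
    awk 'BEGIN { FS = OFS = "\t" } { print $1, $2, 1; print $1, $3, -1 }' "$@" |
    LC_ALL=C sort -k1,1 -k2,2n -k3,3n |
    awk -v k="$k" 'BEGIN { FS = OFS = "\t" }
    {
        pos = $2 + 0
        if ($1 != chrom) {
            chrom = $1; depth = 0
        } else if (depth == k && pos > prev) {
            # The span [prev, pos) had depth k: extend or start a region.
            if (have && out_chrom == chrom && out_end == prev) {
                out_end = pos
            } else {
                if (have) print out_chrom, out_start, out_end
                out_chrom = chrom; out_start = prev; out_end = pos; have = 1
            }
        }
        depth += $3
        prev = pos
    }
    END { if (have) print out_chrom, out_start, out_end }'
}

# Example: regions_with_depth 6 f1.bed f2.bed f3.bed f4.bed f5.bed f6.bed
```

Unlike a streaming tool, this sketch sorts all events at once, so it is only meant to show what the answers look like on small inputs.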
