
  • How does one find upstream regions of a gene from BED files

    Hi,

    I obtained a list of RefSeq genes from UCSC in BED format. For my analysis, I would also like to check whether a particular region falls within a few bp (say, 1000 bp) upstream of a gene.

    Is it as simple as just adding 1000 bp to the gene's start position (e.g., for the record below, 8378298 + 1000 = 8379298)? I'm assuming there is a catch here. Can anyone tell me the right way to do it?
    e.g.: chrI 8378298 8390022 NM_001129046 0 - 8378298 8390022 0 8 123,103,110,116,65,69,124,113, 0,832,1401,2025,9723,9836,10481,11611,
    Last edited by anibhax; 06-03-2013, 11:53 AM. Reason: Proper formatting

  • #2
    You could use the bedops program, part of the BEDOPS suite. It has an asymmetric --range operand that you can use to search for set overlaps with adjusted coordinate ranges:

    http://code.google.com/p/bedops/wiki...ange_(--range)

    For example:

    $ bedops --range -1000:0 --element-of -1 genes.bed regions.bed > answer.bed

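    The file answer.bed will contain elements of genes.bed that overlap the coordinates of elements of regions.bed. In this example, overlap is defined as any base from 1000 bases before the start coordinate of the gene element to its stop coordinate. Note that --range adjusts coordinates without looking at strand, so this window is upstream only for forward-strand genes; for a reverse-strand gene such as the NM_001129046 record above, upstream lies past the stop coordinate.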
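
To make the strand point concrete, here is a minimal sketch (not part of BEDOPS) that computes the upstream window for each gene with awk, assuming the strand is in column 6 as in the BED record from the question. The function name upstream_windows and its flank argument are made up for illustration.

```shell
# upstream_windows [FLANK]
# Reads BED records (at least 6 columns, strand in column 6) on stdin and
# prints the FLANK-base window immediately upstream of each gene.
# Hypothetical helper for illustration; FLANK defaults to 1000.
upstream_windows() {
    flank="${1:-1000}"
    awk -v flank="$flank" 'BEGIN { OFS = "\t" }
    $6 == "+" {
        # Forward strand: upstream lies before the start coordinate.
        s = $2 - flank
        if (s < 0) s = 0    # do not run off the beginning of the chromosome
        print $1, s, $2, $4, $5, $6
    }
    $6 == "-" {
        # Reverse strand: transcription begins at the stop coordinate,
        # so upstream lies after it.
        print $1, $3, $3 + flank, $4, $5, $6
    }'
}
```

For the record in the question, which is on the minus strand, `upstream_windows 1000` gives chrI 8390022 8391022, not 8378298 + 1000. If genes on both strands are mixed, the output should be re-sorted before it is passed to any BEDOPS tool.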

    The main requirement for input to BEDOPS tools is that it is sorted. Use BEDOPS sort-bed for this purpose, e.g.:

    $ sort-bed unsorted-genes.bed > genes.bed
    $ sort-bed unsorted-regions.bed > regions.bed


    Sorting is quick and allows BEDOPS tools to work fast and with a low memory profile. You only have to sort inputs once. Any output from BEDOPS tools is sorted, in case you want to do further downstream processing of it.
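
If sort-bed is not at hand, the ordering can be approximated with the standard sort utility. This is a sketch under an assumption: that sort-bed orders records by chromosome name (lexicographically), then by start and stop (numerically). The function name sort_bed_like is made up, and this is not a drop-in replacement for sort-bed.

```shell
# sort_bed_like: sort tab-delimited BED records from stdin.
# Assumed ordering: chromosome name as bytes, then start, then stop as numbers.
sort_bed_like() {
    tab="$(printf '\t')"
    # LC_ALL=C makes the chromosome comparison byte-wise and locale-independent.
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2n -k3,3n
}
```

Usage would look like `sort_bed_like < unsorted-genes.bed > genes.bed`.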

    For more details, see: http://code.google.com/p/bedops/wiki/sortBed
    Last edited by AlexReynolds; 06-03-2013, 02:42 PM.



    • #3
      Hi Alex,

      Thanks for the reply. I have used bedops and find it really good for this kind of analysis. But I'm not sure whether that approach is conceptually right for what I want to do.



      • #4
        It depends on your experiment. If you're looking to investigate genes that are proximal to a set of regions, for example (where proximal means within 1000 bases upstream of the gene), then this is one way to quickly generate that set result.



        • #5
          I'm looking for regulatory regions like promoters or TF binding sites. Does this still hold?



          • #6
            If you're looking for promoters or TF binding sites that are proximal to your gene, where you define proximal as some sensible value (say, between 1000 and 2500 bases; this depends on your work), then this would work fine.

            Unique distal elements (elements further away) could be located with three bedops set operations: one to locate the proximal elements, and a second to locate distal elements with a wider upstream range.

            As this second set includes the first set's proximal elements, a third operation calculates the set difference between the distal and proximal sets to get elements unique to the distal set, perhaps using the --not-element-of operand between the first two sets. Make sense?
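
The logic of those three steps can be sketched without BEDOPS, for small inputs, in a single awk pass. This is only an illustration under stated assumptions: genes are treated as forward-strand (as with --range -N:0), the difference is taken by element identity rather than by overlap, and the function name distal_only with its near and far defaults is made up here. It compares every region against every gene, so it will be slow on large files.

```shell
# distal_only GENES REGIONS [NEAR] [FAR]
# Prints regions that overlap some gene padded by FAR bases before its start,
# but no gene padded by only NEAR bases. Hypothetical helper for illustration.
distal_only() {
    awk -v near="${3:-1000}" -v far="${4:-2500}" '
    BEGIN { n = 0 }
    # First file: remember each gene (chromosome, start, stop).
    NR == FNR { chrom[n] = $1; start[n] = $2 + 0; stop[n] = $3 + 0; n++; next }
    # Second file: test each region against every gene.
    {
        in_near = 0; in_far = 0
        for (i = 0; i < n; i++) {
            if ($1 != chrom[i]) continue
            # Half-open intervals overlap when each starts before the other ends.
            if ($2 < stop[i] && $3 > start[i] - near) in_near = 1
            if ($2 < stop[i] && $3 > start[i] - far)  in_far = 1
        }
        # Set difference: in the wider (distal) set but not the proximal one.
        if (in_far && !in_near) print
    }' "$1" "$2"
}
```

For example, `distal_only genes.bed regions.bed 1000 2500 > distal-only.bed` would keep regions that sit between 1000 and 2500 bases before a gene's start coordinate and are not proximal to any other gene.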

            If you need information about which promoters and TF binding sites associate with a specific gene, then another way to approach this is with the bedmap tool, which does things a little differently:

            $ bedops --range -1000:0 --everything genes.bed \
            | bedmap --echo --echo-map-id-uniq --delim '\t' - promoters.bed \
            | bedops --range 1000:0 --everything - \
            > answer.bed


            (Note that we first adjust the ranges of the gene elements, then map them against promoters, and finally pass that result back to bedops to reverse the range adjustment and recover each gene's original coordinates.)

            This file answer.bed contains both the gene and a list of unique IDs for overlapping proximal promoters that associate with that gene. This could also be used to return a list of IDs of motif binding sites. And if you want the entire mapped element, not just the ID, use --echo-map in place of --echo-map-id-uniq.

            This result is different from what comes out of bedops, which just does set calculations. The bedmap tool will show associations between a reference element (here, a gene) and anything it can be mapped to (here, one or more proximal promoters).
            Last edited by AlexReynolds; 06-03-2013, 03:02 PM.



            • #7
              I think it does. Thanks for the help!
