  • intersect (actually: filter) a gtf file with coordinates from a bed-file

    -- SOLVED -- bedtools intersect

    hello,

    does someone know a script which filters a gtf file with coordinates from a bed file?

    i have a bed file with regions and want to filter out all features from a gtf file which overlap (completely or largely) with these regions. is there a script or program for this? does bedtools intersect not work here?

    i know, an awk-script would not be very difficult, but why re-invent the wheel...

    thank you,

    dietmar
    Last edited by dietmar13; 04-25-2013, 09:37 PM.

  • #2
    Originally posted by dietmar13
    bedtools intersect does not work?
    Why do you say this? Seems like bedtools intersect with the '-v' option is exactly what you are looking for.
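
To make the behaviour of '-v' concrete, here is a minimal Python sketch of the same idea: keep only the GTF features that share no base with any BED region. It is an illustration under stated assumptions, not how bedtools is implemented. The function names and the example records are hypothetical, and the scan is a simple nested loop rather than an indexed search. The one detail worth noticing is the coordinate conversion: GTF is 1-based and closed, BED is 0-based and half-open.

```python
def parse_bed(lines):
    """Return {chrom: [(start, end), ...]} in BED's 0-based, half-open coordinates."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def gtf_features_without_overlap(gtf_lines, regions):
    """Yield GTF lines that share no base with any region (the idea behind -v)."""
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom = fields[0]
        # GTF start/end are 1-based and inclusive; convert to 0-based, half-open.
        start, end = int(fields[3]) - 1, int(fields[4])
        hit = any(start < r_end and r_start < end
                  for r_start, r_end in regions.get(chrom, []))
        if not hit:
            yield line.rstrip("\n")


# Hypothetical example data (not from the thread).
bed = ["chr1\t100\t200\n"]
gtf = [
    'chr1\tsrc\texon\t150\t250\t.\t+\t.\tgene_id "g1";\n',  # overlaps the region
    'chr1\tsrc\texon\t201\t300\t.\t+\t.\tgene_id "g2";\n',  # starts just past it
    'chr2\tsrc\texon\t150\t250\t.\t+\t.\tgene_id "g3";\n',  # different chromosome
]
kept = list(gtf_features_without_overlap(gtf, parse_bed(bed)))
for record in kept:
    print(record)
```

With these inputs only the g2 and g3 records are kept, because g1 shares bases with the single BED region.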


    • #3
      @kmcarr

      thank you - you are right. i was misled by all the examples where only bed and bam files were used for bedtools intersect examples...

      dietmar


      • #4
        Hi,

        I am comparing 2 different files. The first file has 113 entries and the second has 88 entries.
        I use either of the following commands to get the differences:
        intersectBed -v -a 1.bed -b 2.bed
        intersectBed -v -wa -wb -a 1.bed -b 2.bed

        But in both cases it reports only 3 entries that don't match, which cannot be right.
        Does anyone have an idea why?

        Thanks


        • #5
          Originally posted by vishal.rossi
          I am comparing 2 different files. [...] I use the following command to get the differences [...] only 3 entries don't match
          What are you looking for exactly?
          intersectBed with the "-v" parameter will show you the intervals from "1.bed" that have nothing in common with the ones in "2.bed". Entries from "2.bed" are not supposed to be reported.
          Also, by default a single shared nucleotide is enough to count as an overlap between two intervals. For a more stringent criterion you might want to consider "-f" (minimum overlap as a fraction of the "-a" interval) and "-r" (require that fraction to hold reciprocally for both intervals).
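
The difference between the default rule and a fraction-based rule can be sketched in a few lines of Python. This is only an illustration of the logic described above, assuming 0-based, half-open (start, end) tuples. The function names and example intervals are hypothetical, and the code is not taken from bedtools.

```python
def overlap_bases(a, b):
    """Number of bases shared by two 0-based, half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def overlaps(a, b, min_fraction=None, reciprocal=False):
    """Default rule: at least one shared base.

    With min_fraction set, the shared bases must cover at least that fraction
    of `a` (the idea behind -f); with reciprocal=True the same must also hold
    for `b` (the idea behind -r).
    """
    shared = overlap_bases(a, b)
    if min_fraction is None:
        return shared >= 1
    ok = shared > 0 and shared >= min_fraction * (a[1] - a[0])
    if reciprocal:
        ok = ok and shared >= min_fraction * (b[1] - b[0])
    return ok


a = (100, 200)      # 100 bases
tail = (190, 1000)  # shares only 10 bases with a
big = (0, 1000)     # fully contains a

print(overlaps(a, tail))                      # True: one shared base is enough
print(overlaps(a, tail, min_fraction=0.5))    # False: 10 of 100 bases is below 50%
print(overlaps(a, big, 0.5))                  # True: all of a is covered
print(overlaps(a, big, 0.5, reciprocal=True)) # False: only 10% of big is covered
```

Under the default rule almost any two nearby interval sets will "match", which is one plausible reason why so few entries are reported by "-v".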


          • #6
            Another option is the BEDOPS toolkit: bedops performs set operations on BED data, and gtf2bed does a lossless conversion of GTF data into BED format, which can then be used with the other BEDOPS tools.

            Let's assume that your regions-of-interest are in a file called myRegions.bed and your GTF-formatted annotations are in a file called myAnnotations.gtf.

            First, we sort myRegions.bed:

            $ sort-bed myRegions.bed > mySortedRegions.bed

            Next, we convert the annotations to BED format:

            $ gtf2bed < myAnnotations.gtf > myAnnotations.bed

            Finally, we apply the --not-element-of set operation to report the elements of the annotations file that do not overlap mySortedRegions.bed, where one or more bases of overlap (i.e., any overlap at all) counts as overlapping:

            $ bedops --not-element-of -1 myAnnotations.bed mySortedRegions.bed > myAnswer.bed

            As the gtf2bed conversion step was lossless, it is easy to convert myAnswer.bed back to GTF:

            $ awk '{print $1"\t"$7"\t"$8"\t"($2+1)"\t"$3"\t"$5"\t"$6"\t"$9"\t"(substr($0, index($0,$10)))}' myAnswer.bed > myAnswer.gtf
            Last edited by AlexReynolds; 05-21-2013, 12:05 PM.
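
For readers who prefer not to decode the awk one-liner, the same column shuffle can be sketched in Python. The column order used here (chromosome, start, end, ID, score, strand, source, feature, frame, attributes) is inferred from the awk command above and should be treated as an assumption. The function name and example record are hypothetical. Unlike the awk version, this sketch assumes strictly tab-separated input.

```python
def bed_to_gtf(bed_line):
    """Rearrange one converted BED line back into GTF column order.

    Assumed BED columns (inferred from the awk command in this post):
    chrom, start, end, id, score, strand, source, feature, frame, attributes.
    """
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end, _id, score, strand, source, feature, frame = fields[:9]
    # Everything after the ninth column is the attribute string.
    attributes = "\t".join(fields[9:])
    # BED starts are 0-based; GTF starts are 1-based, so add one.
    gtf_start = str(int(start) + 1)
    return "\t".join(
        [chrom, source, feature, gtf_start, end, score, strand, frame, attributes]
    )


# Hypothetical example record (not from the thread).
bed_line = 'chr1\t99\t200\tg1\t.\t+\tsrc\texon\t.\tgene_id "g1"; transcript_id "t1";'
gtf_line = bed_to_gtf(bed_line)
print(gtf_line)
```

The only value that changes is the start coordinate (99 becomes 100); every other field is simply moved to its GTF position.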


            • #7
              To intersect coordinates

              Just use GFF-Intersector

              An R program capable of intersecting .GFF files and large files containing genomic co-ordinates and visualising the genome wide data. - PriceJon/GFF_Intersector


              it can intersect GFF files with multiple other coordinates!! have you got R? if so just 2 commands and you don't have to worry about the visualisation issue
