
  • Intersect (actually: filter) a GTF file with coordinates from a BED file

    -- SOLVED -- bedtools intersect

    Hello,

    does anyone know of a script that filters a GTF file using coordinates from a BED file?

    I have a BED file with regions and want to filter out all features in a GTF file that overlap these regions (completely or largely). Is there a script or program for this? bedtools intersect does not work?

    I know an awk script would not be very difficult, but why reinvent the wheel...

    Thank you,

    dietmar
    Last edited by dietmar13; 04-25-2013, 09:37 PM.

  • #2
    Originally posted by dietmar13:
    bedtools intersect does not work?
    Why do you say this? It seems like bedtools intersect with the -v option is exactly what you are looking for.
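In practice the suggested command would look something like bedtools intersect -v -a annotations.gtf -b regions.bed > filtered.gtf (the file names are placeholders). bedtools accepts GTF/GFF as the -a file, and -v prints the -a lines that have no overlap with -b. Adding -f 0.5 would drop only features that are at least half covered by a region, which is closer to the "completely or largely" requirement. For readers without bedtools, or curious what that filter does, here is a minimal awk sketch of the default any-overlap rule, i.e. the awk script the question mentions. gtf_minus_bed is a hypothetical name; it assumes tab-separated files and compares every feature against every region on its chromosome, so it is only suited to small inputs.

```shell
# gtf_minus_bed GTF BED
# Hypothetical helper (not part of bedtools or BEDOPS): print the GTF features
# that share no base with any BED region, i.e. the same rule as
# "bedtools intersect -v" with its default 1 bp overlap.
gtf_minus_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         # First file on the command line (the BED): store regions per chromosome.
         FILENAME == ARGV[1] {
             if ($0 ~ /^(#|track|browser)/ || NF < 3) next
             n = ++count[$1]
             rstart[$1, n] = $2 + 0      # 0-based start
             rend[$1, n] = $3 + 0        # end is exclusive
             next
         }
         # Second file (the GTF): skip comment lines, test each feature.
         /^#/ { next }
         {
             s = $4 - 1                  # 1-based GTF start -> 0-based
             e = $5 + 0                  # closed GTF end == half-open end
             for (i = 1; i <= count[$1]; i++)
                 if (s < rend[$1, i] && rstart[$1, i] < e) next   # overlap: drop
             print                       # no overlap: keep the line unchanged
         }' "$2" "$1"
}
```

Usage: gtf_minus_bed myAnnotations.gtf myRegions.bed > filtered.gtf. Note the 1 bp shift on the GTF start: a feature ending at GTF position 100 does not touch a BED region starting at 100, because BED 100 is the 101st base.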



    • #3
      @kmcarr

      Thank you, you are right. I was misled by the bedtools intersect examples, which only ever used BED and BAM files...

      dietmar



      • #4
        Hi,

        I am comparing 2 different files. The first file has 113 entries and the second has 88.
        I use the following commands to get the differences:
        intersectBed -v -a 1.bed -b 2.bed
        or
        intersectBed -v -wa -wb -a 1.bed -b 2.bed

        But in both cases only 3 entries are reported as not matching, which is wrong.
        Does anyone have an idea why?

        Thanks



        • #5
          Originally posted by vishal.rossi:
          I am comparing 2 different files. [...] I use the following command to get the differences [...] only 3 entries don't match
          What are you looking for exactly?
          IntersectBed with the "-v" parameter will show you the intervals from "1.bed" that have nothing in common with the ones in "2.bed". Entries from "2.bed" are not supposed to be reported.
          Also, by default a single shared nucleotide is enough to define an overlap between two intervals. For a more stringent criterion you might want to consider "-f" and "-r".
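The overlap rules described above can be sketched for a single pair of intervals. overlap_ok is a hypothetical shell/awk helper for illustration, not part of bedtools: it takes BED-style (0-based, half-open) coordinates on the same chromosome and applies the default 1 bp rule, an optional minimum fraction of A (like -f), and optionally the same fraction of B (like -r). Behaviour exactly at the threshold may differ from bedtools. Relatedly, because -v only ever reports entries from the -a file, running the command a second time with -a 2.bed -b 1.bed lists the entries of 2.bed that have no counterpart in 1.bed.

```shell
# overlap_ok A_START A_END B_START B_END [F] [R]
# Hypothetical helper: BED-style (0-based, half-open) intervals on the same
# chromosome. Returns 0 if A and B count as overlapping, 1 otherwise.
#   F: minimum overlap as a fraction of the length of A (like -f); default 0,
#      i.e. any single shared base is enough.
#   R: if non-zero, B must also be covered by at least F (like -r).
overlap_ok() {
    awk -v as="$1" -v ae="$2" -v bs="$3" -v be="$4" \
        -v f="${5:-0}" -v r="${6:-0}" 'BEGIN {
        as += 0; ae += 0; bs += 0; be += 0; f += 0; r += 0   # force numeric
        lo = (as > bs) ? as : bs
        hi = (ae < be) ? ae : be
        ov = hi - lo                          # shared bases; <= 0 means none
        if (ov < 1) exit 1                    # default rule: >= 1 shared base
        if (ov < f * (ae - as)) exit 1        # fraction of A too small
        if (r && ov < f * (be - bs)) exit 1   # reciprocal: fraction of B too small
        exit 0
    }'
}
```

For example, overlap_ok 100 200 150 250 0.5 succeeds (50 of the 100 bases of A are shared), while overlap_ok 100 200 150 1150 0.5 1 fails: the same 50 bases are enough for A but far less than half of the 1,000 bp B interval.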



          • #6
            Another option is BEDOPS: bedops performs set operations on BED data, and gtf2bed does a lossless conversion of GTF data into BED format, which can then be used with the other BEDOPS tools.

            Let's assume that your regions-of-interest are in a file called myRegions.bed and your GTF-formatted annotations are in a file called myAnnotations.gtf.

            First, we sort myRegions.bed:

            $ sort-bed myRegions.bed > mySortedRegions.bed

            Next, we convert the annotations to BED format:

            $ gtf2bed < myAnnotations.gtf > myAnnotations.bed
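The only coordinate change this conversion has to make is to the start: GTF uses 1-based, closed intervals and BED uses 0-based, half-open ones, so the start becomes start - 1 while the end stays the same (the awk command at the end of this post undoes it with $2+1). As a rough illustration of the column layout that command expects (columns 7 to 9 holding the GTF source, feature and frame, column 10 the attributes), here is a hypothetical gtf_to_bed_sketch in awk. It is not gtf2bed: taking the gene_id value as the BED name column is an assumption, comment lines are simply dropped, and plain LC_ALL=C sort stands in for sort-bed. Use gtf2bed itself for real data.

```shell
# gtf_to_bed_sketch < in.gtf > out.bed
# Hypothetical stand-in for gtf2bed, for illustration only.
gtf_to_bed_sketch() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }                   # drop comment lines
         {
             id = "."
             # Assumption: use the gene_id value as the BED name column.
             if (match($9, /gene_id "[^"]*"/))
                 id = substr($9, RSTART + 9, RLENGTH - 10)
             # GTF start is 1-based; BED start is 0-based. The end is unchanged.
             print $1, $4 - 1, $5, id, $6, $7, $2, $3, $8, $9
         }' | LC_ALL=C sort -k1,1 -k2,2n -k3,3n   # chrom, then numeric start, end
}
```

A line such as chr1 src exon 11 20 . + 0 gene_id "g1"; comes out as chr1 10 20 g1 . + src exon 0 gene_id "g1"; (tab-separated), and piping that through the awk command at the end of this post restores the original GTF line.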

            Finally, we apply a --not-element-of set operation to report the elements of the annotations file that do not overlap mySortedRegions.bed; the -1 argument means that one or more bases of overlap (i.e., any overlap at all) counts as overlapping:

            $ bedops --not-element-of -1 myAnnotations.bed mySortedRegions.bed > myAnswer.bed

            As the gtf2bed conversion step was lossless, it is easy to convert myAnswer.bed back to GTF:

            $ awk '{print $1"\t"$7"\t"$8"\t"($2+1)"\t"$3"\t"$5"\t"$6"\t"$9"\t"(substr($0, index($0,$10)))}' myAnswer.bed > myAnswer.gtf
            Last edited by AlexReynolds; 05-21-2013, 12:05 PM.



            • #7
              To intersect coordinates

              Just use GFF-Intersector

              An R program capable of intersecting .GFF files and large files containing genomic coordinates and visualising the genome-wide data (PriceJon/GFF_Intersector).


              It can intersect GFF files with multiple other sets of coordinates! Have you got R? If so, it takes just 2 commands, and you don't have to worry about the visualisation issue.
