  • intersect (actually: filter) a gtf file with coordinates from a bed-file

    -- SOLVED -- bedtools intersect

    hello,

    does someone know a script which filters a gtf-file with coordinates from a bed file?

    i have a bed-file with regions and want to filter out all features from a gtf file which overlap (completely or largely) with these regions. is there a script or program? bedtools intersect does not work?

    i know, an awk-script would not be very difficult, but why re-invent the wheel...

    thank you,

    dietmar
    Last edited by dietmar13; 04-25-2013, 09:37 PM.

  • #2
    Originally posted by dietmar13
    bedtools intersect does not work?
    Why do you say this? Seems like bedtools intersect with the '-v' option is exactly what you are looking for.
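
To make the '-v' behaviour concrete, here is a minimal Python sketch of the same idea: keep only the GTF lines that share no base with any BED region. It is an illustration, not how bedtools is implemented. The function names and sample records are made up for the example, and it assumes small files that fit in memory. Note the coordinate handling: GTF is 1-based and inclusive, BED is 0-based and half-open.

```python
def parse_bed(lines):
    """Return {chrom: [(start, end), ...]} in BED's 0-based, half-open coordinates."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def gtf_without_overlap(gtf_lines, regions):
    """Yield GTF lines that share no base with any BED region (like '-v')."""
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom = fields[0]
        # GTF is 1-based and inclusive; convert to 0-based, half-open.
        start, end = int(fields[3]) - 1, int(fields[4])
        overlaps = any(
            start < r_end and r_start < end
            for r_start, r_end in regions.get(chrom, [])
        )
        if not overlaps:
            yield line.rstrip("\n")


# Hypothetical sample data, for illustration only.
bed = ["chr1\t100\t200\n"]
gtf = [
    'chr1\ttest\texon\t50\t100\t.\t+\t.\tgene_id "a";',   # ends just before the region
    'chr1\ttest\texon\t150\t300\t.\t+\t.\tgene_id "b";',  # overlaps the region
    'chr2\ttest\texon\t150\t300\t.\t+\t.\tgene_id "c";',  # different chromosome
]

kept = list(gtf_without_overlap(gtf, parse_bed(bed)))
for line in kept:
    print(line)
```

The first record is kept because GTF bases 50..100 and the BED region (bases 101..200 in 1-based terms) only touch; they do not overlap.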



    • #3
      @kmcarr

      thank you - you are right. i was misled because all the bedtools intersect examples i found used only bed and bam files...

      dietmar



      • #4
        Hi,

        I am comparing 2 different files. 1st file has 113 entries and the 2nd one has 88 entries.
        I use the following command to get the differences
        intersectBed -v -a 1.bed -b 2.bed or
        intersectBed -v -wa -wb -a 1.bed -b 2.bed

        But in both cases it reports that only 3 entries don't match, which cannot be right.
        Does anyone have an idea why?

        Thanks



        • #5
          Originally posted by vishal.rossi
          I am comparing 2 different files. [...] I use the following command to get the differences [...] only 3 entries don't match
          What are you looking for exactly?
          IntersectBed with the "-v" parameter will show you the intervals from "1.bed" that have nothing in common with the ones in "2.bed". Entries from "2.bed" are not supposed to be reported.
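          Also, one common nucleotide is enough by default to define an overlap between two intervals. For a more stringent criterion you might want to consider "-f" (minimum overlap as a fraction of the "-a" interval) and "-r" (require that fraction reciprocally, for both intervals).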
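
The two points above (only entries from the first file are reported, and the overlap threshold changes what counts as a match) can be sketched in a few lines of Python. This is a simplified model, not bedtools itself: intervals are plain (start, end) pairs on a single chromosome, and the names `is_hit`, `report_v`, `min_frac` and `reciprocal` are invented for the example, loosely mirroring "-f" and "-r".

```python
def overlap_bases(a, b):
    """Number of bases shared by two half-open (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def is_hit(a, b, min_frac=None, reciprocal=False):
    """Decide whether b counts as an overlap for a.

    min_frac=None models the default: a single shared base is enough.
    min_frac=x requires the overlap to cover at least x of a's length;
    reciprocal=True additionally requires it to cover x of b's length.
    """
    n = overlap_bases(a, b)
    if n == 0:
        return False
    if min_frac is None:
        return True
    if n < min_frac * (a[1] - a[0]):
        return False
    if reciprocal and n < min_frac * (b[1] - b[0]):
        return False
    return True


def report_v(a_intervals, b_intervals, **kwargs):
    """Model of '-v': entries of A with no qualifying overlap in B."""
    return [
        a for a in a_intervals
        if not any(is_hit(a, b, **kwargs) for b in b_intervals)
    ]


# Hypothetical intervals, for illustration only.
file_a = [(0, 100), (200, 300), (500, 600)]
file_b = [(90, 400), (1000, 1100)]

default_v = report_v(file_a, file_b)                       # any shared base counts
frac_v = report_v(file_a, file_b, min_frac=0.5)            # at least half of A covered
recip_v = report_v(file_a, file_b, min_frac=0.5, reciprocal=True)
swapped_v = report_v(file_b, file_a)                       # roles of the files swapped

print(default_v, frac_v, recip_v, swapped_v, sep="\n")
```

Swapping the two files gives a different answer, which is why the entries unique to "2.bed" need a second run with the files exchanged.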



          • #6
            Another option is BEDOPS: bedops does set operations on BED data, and gtf2bed does a lossless conversion of GTF data into BED format that the BEDOPS tools can use.

            Let's assume that your regions-of-interest are in a file called myRegions.bed and your GTF-formatted annotations are in a file called myAnnotations.gtf.

            First, we sort myRegions.bed:

            $ sort-bed myRegions.bed > mySortedRegions.bed

            Next, we convert the annotations to BED format:

            $ gtf2bed < myAnnotations.gtf > myAnnotations.bed

            Finally, we apply a --not-element-of set operation to show the elements of the annotations file which do not overlap any region in mySortedRegions.bed, where one or more bases of overlap (i.e., any overlap at all) counts as overlapping:

            $ bedops --not-element-of -1 myAnnotations.bed mySortedRegions.bed > myAnswer.bed

            As the gtf2bed conversion step was lossless, it is easy to convert myAnswer.bed back to GTF:

            $ awk '{print $1"\t"$7"\t"$8"\t"($2+1)"\t"$3"\t"$5"\t"$6"\t"$9"\t"(substr($0, index($0,$10)))}' myAnswer.bed > myAnswer.gtf
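
For anyone who prefers to read the column shuffle in Python, here is a rough equivalent of the awk one-liner above. It assumes tab-delimited input with the column order implied by that command (chromosome, 0-based start, end, ID, score, strand, source, feature, frame, attributes); the function name and the sample record are made up for the example.

```python
def bed_line_to_gtf(line):
    """Rebuild a GTF line from one converted BED line.

    Assumes ten tab-separated columns in the order used by the awk command:
    chrom, start, end, id, score, strand, source, feature, frame, attributes.
    """
    cols = line.rstrip("\n").split("\t", 9)
    chrom, start, end, _id, score, strand, source, feature, frame, attributes = cols
    # BED starts are 0-based; GTF starts are 1-based, so add 1 back.
    gtf_start = str(int(start) + 1)
    return "\t".join(
        [chrom, source, feature, gtf_start, end, score, strand, frame, attributes]
    )


# Hypothetical record, for illustration only.
bed_line = 'chr1\t49\t100\tgeneA\t.\t+\ttest\texon\t.\tgene_id "geneA"; transcript_id "tA";'
gtf_line = bed_line_to_gtf(bed_line)
print(gtf_line)
```

Splitting on tabs with a limit of nine keeps the attributes column in one piece even though it contains spaces.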
            Last edited by AlexReynolds; 05-21-2013, 12:05 PM.



            • #7
              To intersect coordinates

              Just use GFF-Intersector

              An R program capable of intersecting .GFF files and large files containing genomic coordinates and visualising the genome-wide data. - PriceJon/GFF_Intersector


              It can intersect GFF files with multiple other coordinate sets! Have you got R? If so, it takes just 2 commands, and you don't have to worry about the visualisation issue.
