  • Identify reads overlapping a junction in gff file

    Dear all,

    I have a gff annotation file with coordinates for an overlapping genomic feature, e.g.
    chr1 1-100
    chr1 100-250
    chr1 250-300
    chr1 300-500

    and an alignment file (bam/bed) with reads mapping to that chromosome. I need to extract only the reads that overlap the junctions between those features. So, from a bed file like this:

    chr1 50-70
    chr1 90-110
    chr1 150-170
    chr1 245-255

    I would want to extract only:
    chr1 90-110
    chr1 245-255

    Are there any existing tools that can help me do that? I am not well versed in bioinformatics, so I can't write my own scripts, but I am able to run existing tools.

    Any ideas would be appreciated.

    Thanks

  • #2
    bedtools intersect will do this. It needs tab-separated input, so:

    Code:
    sed 's/-/\t/g' <gff_or_bed> | sed 's/\s\+/\t/g' > <new-gff_or_bed>
    should work for that (with GNU sed).


    • #3
      Thanks for the prompt reply, bruce01, but I thought bedtools would report every overlap between my gff and bam/bed. I only need the reads that overlap the junctions of my genomic features, so any read that lies completely within the coordinates of a single feature should be discarded.
      Am I missing something here?


      • #4
        Oh, you want reads that sit across two adjacent features. I misread and thought you just wanted them to be within one feature. I don't know of any tool that does that, but that's because I've never needed to.

        What are you trying to find, fusion genes?
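
For what it's worth, the filter described in #3 can be sketched with plain awk rather than a dedicated tool. This is only a minimal sketch under some assumptions that are not confirmed in the thread: both files have already been converted to whitespace-separated `chrom start end` columns (for example with the sed command in #2), and a "junction" is a coordinate that is the end of one feature and the start of another on the same chromosome, as in the example in the first post. The function name `junction_reads` is made up for illustration.

```shell
# junction_reads FEATURES READS
# Both files: whitespace-separated columns "chrom start end" (assumed layout).
# Prints each read that starts before and ends after a junction, where a
# junction is a position that ends one feature and starts another.
junction_reads() {
    awk '
        # First file (features): remember every start and every end position.
        NR == FNR { fstart[$1, $2] = 1; fend[$1, $3] = 1; next }

        # On the first read line, turn shared end/start positions into junctions.
        !built {
            for (k in fend)
                if (k in fstart) {
                    split(k, part, SUBSEP)
                    n++
                    jchr[n] = part[1]
                    jpos[n] = part[2] + 0
                }
            built = 1
        }

        # Second file (reads): keep a read only if it spans some junction.
        {
            for (i = 1; i <= n; i++)
                if ($1 == jchr[i] && $2 + 0 < jpos[i] && $3 + 0 > jpos[i]) {
                    print
                    break
                }
        }
    ' "$1" "$2"
}
```

With hypothetical file names, it would be run as `junction_reads features.txt reads.txt > junction_reads.txt`. Every read is compared against every junction, which is fine for a handful of features but slow for a whole-genome annotation. If the features are annotated as 1-100, 101-250 and so on, rather than sharing a coordinate, the junction definition would have to be adjusted.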
