  • Merge two GFF3 files?

    I have two GFF3 files, one of primary transcripts and one of alternative transcripts, that need to be merged into one file. Some gene IDs appear in both files, so the records must be interwoven, and simply concatenating the files won't do.

    Has anyone had experience doing this? I have worked quite a bit with BioPerl but have no experience handling annotations. I'm also feeling lazy this time and would rather find an existing tool than roll my own.

    Thanks much!!
    Bob
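A minimal Python sketch of the merge being asked about. It assumes both files are well-formed GFF3 with no embedded ##FASTA section, that every feature reaches its gene through Parent links, and that a feature whose ID was already defined by the first file (such as the shared gene line) should be dropped from the second. The names `merge_gff3`, `read_features` and `root_id` are hypothetical, not part of any existing tool. It groups each feature under its top-level ancestor, so alternative transcripts end up directly after the primary ones for the same gene.

```python
import sys
from collections import OrderedDict


def parse_attributes(column9):
    """Parse a GFF3 attribute column such as 'ID=t1;Parent=g1' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def read_features(lines):
    """Split GFF3 lines into ('##' directive lines, [(ID, Parent, line), ...])."""
    directives, features = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##"):
            if line != "###":  # '###' only separates sections; drop it
                directives.append(line)
            continue
        if not line or line.startswith("#"):
            continue  # blank lines and ordinary comments are dropped
        cols = line.split("\t")
        attrs = parse_attributes(cols[8]) if len(cols) >= 9 else {}
        parent = attrs.get("Parent")
        # Assumption: only the first of several comma-separated parents is followed
        features.append((attrs.get("ID"), parent.split(",")[0] if parent else None, line))
    return directives, features


def root_id(feature_id, parent_of):
    """Follow Parent links upward until reaching a feature with no parent."""
    seen = set()
    while feature_id in parent_of and feature_id not in seen:
        seen.add(feature_id)
        feature_id = parent_of[feature_id]
    return feature_id


def merge_gff3(primary_lines, alternative_lines):
    """Merge two GFF3 files so each gene is defined once and its features sit together."""
    directives, all_features, defined = [], [], set()
    for lines in (primary_lines, alternative_lines):
        file_directives, features = read_features(lines)
        directives += [d for d in file_directives if d not in directives]
        ids_in_file = set()
        for fid, parent, line in features:
            if fid is not None and fid in defined:
                continue  # ID already defined by an earlier file, e.g. the shared gene line
            if fid is not None:
                ids_in_file.add(fid)
            all_features.append((fid, parent, line))
        defined |= ids_in_file  # lines sharing an ID within one file are all kept

    parent_of = {fid: parent for fid, parent, _ in all_features
                 if fid is not None and parent is not None}
    groups = OrderedDict()  # top-level ID -> lines, in order of first appearance
    for fid, parent, line in all_features:
        key = root_id(parent, parent_of) if parent else fid
        groups.setdefault(key, []).append(line)
    return directives + [line for group in groups.values() for line in group]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python merge_gff3.py primary.gff3 alternative.gff3 > merged.gff3
    with open(sys.argv[1]) as primary, open(sys.argv[2]) as alternative:
        merged = merge_gff3(primary.readlines(), alternative.readlines())
    sys.stdout.write("\n".join(merged) + "\n")
```

Keeping the first file's copy of a shared gene means its coordinates win: if an alternative transcript extends past the primary gene's span, the gene line's start and end would still need widening.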

  • #2
    Do the annotations really need to be interwoven? I know it is the convention, but as far as I know it is not a requirement of the GFF format. I think it would only matter if the particular software reading the file requires it.
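Whether a particular file actually keeps each gene's features together can be checked directly. A minimal sketch, assuming well-formed GFF3 feature lines (nine tab-separated columns, no ##FASTA section) and that features reach their gene through Parent links; `scattered_genes` is a hypothetical helper, not part of any tool. It reports top-level IDs whose lines are split into separate blocks, which is what naive concatenation produces for a gene present in both files.

```python
def parse_id_parent(line):
    """Return (ID, first Parent) from a GFF3 feature line; either may be None."""
    column9 = line.rstrip("\n").split("\t")[8]
    attrs = dict(f.split("=", 1) for f in column9.split(";") if "=" in f)
    parent = attrs.get("Parent")
    return attrs.get("ID"), parent.split(",")[0] if parent else None


def scattered_genes(lines):
    """Return top-level IDs whose features are split into more than one run of lines."""
    records = [parse_id_parent(l) for l in lines if l.strip() and not l.startswith("#")]
    parent_of = {fid: parent for fid, parent in records if fid and parent}

    def root(fid):
        # Climb Parent links to the top-level feature, guarding against cycles
        seen = set()
        while fid in parent_of and fid not in seen:
            seen.add(fid)
            fid = parent_of[fid]
        return fid

    finished, scattered, current = set(), set(), None
    for fid, parent in records:
        key = root(parent) if parent else fid
        if key != current:
            if key in finished:
                scattered.add(key)  # this gene already had an earlier, separate block
            if current is not None:
                finished.add(current)
            current = key
    return sorted(scattered)
```

An empty result means every gene's features are contiguous; a non-empty one only matters if the program reading the file cares about ordering.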



    • #3
      Not sure what the spec says. I tried concatenating the two files and using the result, but the program I submitted the concatenated file to barfed with a 'duplicate gene id' error (or something like that). So I surmised that it expects each gene ID to be defined only once, which suggests the data needs to be interleaved.

      I guess I could take a shortcut: find all the gene IDs that are duplicated, remove those 'gene' lines from the second file, and concatenate. Probably about 10 minutes of work...

      -Bob
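The shortcut described in this post can be sketched in Python, assuming the shared genes carry identical ID attributes on their 'gene' lines in both files; `gene_ids` and `concat_without_duplicate_genes` are hypothetical names. It returns the concatenated lines together with the set of duplicate gene IDs whose lines were dropped from the second file.

```python
import sys


def gene_ids(lines):
    """Collect the ID attribute of every 'gene' feature line."""
    ids = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9 or cols[2] != "gene":
            continue
        for field in cols[8].split(";"):
            if field.startswith("ID="):
                ids.add(field[len("ID="):])
    return ids


def concat_without_duplicate_genes(primary_lines, alternative_lines):
    """Concatenate two GFF3 files, dropping 'gene' lines from the second file
    whose ID is also a gene ID in the first. Returns (lines, dropped_ids)."""
    duplicates = gene_ids(primary_lines) & gene_ids(alternative_lines)
    kept = []
    for line in alternative_lines:
        if line.startswith("##gff-version"):
            continue  # the first file already supplies the version directive
        if gene_ids([line]) & duplicates:
            continue  # gene already defined in the first file
        kept.append(line)
    return list(primary_lines) + kept, duplicates


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python concat_gff3.py primary.gff3 alternative.gff3 > merged.gff3
    with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
        merged, dropped = concat_without_duplicate_genes(a.readlines(), b.readlines())
    # Ensure every line ends with a newline before writing
    sys.stdout.writelines(l if l.endswith("\n") else l + "\n" for l in merged)
```

Two caveats: the result is not interleaved (the alternative transcripts still follow all of the first file's features and rely on their Parent attributes to point at the gene), and the first file's gene coordinates are kept, so a gene whose alternative transcript extends beyond the primary one would need its span widened.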



      • #4
        How about BEDTools' intersectBed? It works with GFF files.
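intersectBed reports where intervals in one file overlap intervals in another. To make that idea concrete without assuming any particular BEDTools options, here is a brute-force pure-Python sketch of the same overlap test on two GFF3 files. It is not BEDTools, and `overlapping` and `features_of_type` are hypothetical names. It could, for example, list the alternative transcripts that overlap a primary transcript on the same strand, as one way to see which gene each alternative transcript belongs to.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    seqid: str
    start: int  # GFF3 coordinates are 1-based and inclusive
    end: int
    strand: str
    line: str


def features_of_type(lines, feature_type):
    """Parse GFF3 feature lines of one type (column 3) into Feature tuples."""
    out = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9 or cols[2] != feature_type:
            continue
        out.append(Feature(cols[0], int(cols[3]), int(cols[4]), cols[6], line.rstrip("\n")))
    return out


def overlapping(a_lines, b_lines, feature_type="mRNA", same_strand=True):
    """Return features in A that overlap at least one feature in B (brute force)."""
    b_feats = features_of_type(b_lines, feature_type)
    hits = []
    for a in features_of_type(a_lines, feature_type):
        for b in b_feats:
            if a.seqid != b.seqid or (same_strand and a.strand != b.strand):
                continue
            if a.start <= b.end and b.start <= a.end:  # closed intervals overlap
                hits.append(a)
                break
    return hits
```

The nested loop is O(n*m), so for genome-scale annotation files a sort-based tool such as BEDTools is the practical choice.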
