
  • ReCOG - Read Counter accounting for Overlapping Genes

    Hi to all,

    Here is a new tool, ReCOG, from the Institute of Population Genetics in Vienna (Austria), for calculating expression counts from strand-specific paired-end RNA-Seq reads.

    ReCOG is a tool for counting strand-specific paired RNA-Seq reads mapped to a reference genome. Since genome annotations contain an increasing number of isoforms per gene, counting only reads mapped to canonical exons is challenging and frequently not possible. Therefore, ReCOG pursues a different strategy. All read-pairs mapped between the start and the end of a gene are counted, irrespective of whether they fall in annotated exons or introns. Moreover, since expression counts cannot be unambiguously defined in regions where genes are overlapping, ReCOG does not count read-pairs mapped to these regions.
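
    The counting rule described above can be sketched roughly as follows. This is only an illustration of the idea, not ReCOG's actual code: the `Gene` and `ReadPair` tuples and the `count_pairs` function are hypothetical names, coordinates are taken as 1-based inclusive, and it assumes that genes only compete when they are on the same contig and strand and that a pair must lie entirely inside a gene span to be counted. ReCOG's own rules may differ on both points.

```python
from collections import namedtuple

# Hypothetical record types for illustration; coordinates are 1-based, inclusive.
Gene = namedtuple("Gene", "name chrom start end strand")
ReadPair = namedtuple("ReadPair", "chrom start end strand")


def count_pairs(genes, pairs):
    """Count read-pairs per gene span, skipping pairs that touch more than one gene.

    Assumptions (not confirmed by the ReCOG description):
      * only genes on the same contig and strand as the pair are considered;
      * a pair is counted only if it lies completely inside the gene span.
    """
    counts = {g.name: 0 for g in genes}
    for p in pairs:
        # All genes whose span overlaps the pair's span (simple linear scan).
        hits = [
            g for g in genes
            if g.chrom == p.chrom and g.strand == p.strand
            and g.start <= p.end and p.start <= g.end
        ]
        if len(hits) != 1:
            continue  # no gene, or an ambiguous region shared by several genes
        gene = hits[0]
        if gene.start <= p.start and p.end <= gene.end:
            counts[gene.name] += 1
    return counts


genes = [
    Gene("A", "chr1", 100, 500, "+"),
    Gene("B", "chr1", 400, 900, "+"),
    Gene("C", "chr1", 450, 600, "-"),
]
pairs = [
    ReadPair("chr1", 150, 300, "+"),  # inside A only
    ReadPair("chr1", 420, 480, "+"),  # in the A/B overlap, skipped
    ReadPair("chr1", 600, 800, "+"),  # inside B only
    ReadPair("chr1", 460, 560, "-"),  # inside C, opposite strand to A and B
    ReadPair("chr1", 50, 150, "+"),   # hangs over the start of A, skipped
]
result = count_pairs(genes, pairs)
```

    The linear scan over all genes keeps the sketch short; for a real annotation an interval index would be the obvious replacement.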


    Here is the link: https://code.google.com/p/recog/

    Enjoy
    Nicola

    Nicola Palmieri
    Doctoral student
    Institut für Populationsgenetik
    Vetmeduni Vienna

  • #2
    So, um, the title of this is "Read Counter accounting for Overlapping Genes", but in the description of your program, I see this:
    since expression counts cannot be unambiguously defined in regions where genes are overlapping, ReCOG does not count read-pairs mapped to these regions.
    Accounting for things by not counting them is a bit of an oxymoron. I would advise you to either change the name of the program, or change that description to be a bit more in tune with the program name.

    Also, the description of this algorithm seems to be similar to HTSeq-count in union mode:



    Which is itself a toy demonstration of what can be done with a little bit of python programming in combination with HTSeq:



    So... I notice that you're using pysam (from the looks of the files in the archive) just like HTSeq. What differentiates your program from HTSeq / HTSeq-count?

    I also notice you're including a sample BAM file to test out the program, which makes it a 77MB download[!], rather than something a bit closer to HTSeq's 350kb installer. You should probably change that, and have a script do the download (if desired) after ReCOG is installed.


    • #3
      @David Eccles: It's like you read my mind.

      I also wonder how this differs from simply munging the annotation file so that it contains only the 5'-most and 3'-most bounds of each gene, and then running htseq-count on that.
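
      A rough sketch of that kind of munging, assuming a standard 9-column GTF whose attribute column carries `gene_id "..."`. The function name `collapse_to_gene_spans` and the `collapsed` source label are made up for illustration. Each gene is written back as a single `exon` record with a `gene_id` attribute, on the assumption that htseq-count is run with its default feature type and ID attribute.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def collapse_to_gene_spans(gtf_lines, feature="exon"):
    """Collapse all `feature` records of a gene into one record spanning
    the smallest start to the largest end (hypothetical helper)."""
    spans = {}  # (gene_id, chrom, strand) -> (start, end), in first-seen order
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature:
            continue  # skip malformed lines and other feature types
        match = GENE_ID.search(fields[8])
        if match is None:
            continue  # no gene_id attribute to group by
        key = (match.group(1), fields[0], fields[6])
        start, end = int(fields[3]), int(fields[4])
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    out = []
    for (gene_id, chrom, strand), (start, end) in spans.items():
        out.append("\t".join([
            chrom, "collapsed", feature, str(start), str(end),
            ".", strand, ".", 'gene_id "%s";' % gene_id,
        ]))
    return out


example = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t400\t500\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
    'chr1\tsrc\tCDS\t120\t180\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t50\t90\t.\t-\t.\tgene_id "g2"; transcript_id "t3";',
]
collapsed = collapse_to_gene_spans(example)
```

      Reads falling where two collapsed spans overlap should then be reported as ambiguous by htseq-count in union mode rather than assigned to either gene, which looks close to what ReCOG describes.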


      • #4
        Dear David,

        Thanks for your remarks and questions. We will improve the description of the ReCOG script!

        Originally posted by gringer
        So, um, the title of this is "Read Counter accounting for Overlapping Genes", but in the description of your program, I see this:

        Accounting for things by not counting them is a bit of an oxymoron. I would advise you to either change the name of the program, or change that description to be a bit more in tune with the program name.
        You are absolutely right!

        Originally posted by gringer
        Also, the description of this algorithm seems to be similar to HTSeq-count in union mode:



        Which is itself a toy demonstration of what can be done with a little bit of python programming in combination with HTSeq:



        So... I notice that you're using pysam (from the looks of the files in the archive) just like HTSeq. What differentiates your program from HTSeq / HTSeq-count?
        The concepts behind HTSeq and ReCOG do not differ much. I first tried HTSeq-count and ran into some problems:
        HTSeq-count uses SAM files, which take up a lot of disk space. ReCOG uses BAM files.
        When a genome annotation is at an early stage and contains hundreds of "chromosomes" (contigs), HTSeq just doesn't work (at least for the D. simulans annotation from FlyBase).
        I found some cases where HTSeq gave counts that differed from what they should be (I checked this with IGV). I discussed these problems with other researchers, and they had also found examples where HTSeq gave wrong counts.

        Originally posted by gringer
        I also notice you're including a sample BAM file to test out the program, which makes it a 77MB download[!], rather than something a bit closer to HTSeq's 350kb installer. You should probably change that, and have a script do the download (if desired) after ReCOG is installed.
        This is also useful advice!

        Best,
        Eszter
