SEQanswers

03-12-2014, 05:29 AM   #1
alexnico84
Junior Member
 
Location: Wien

Join Date: Jun 2009
Posts: 3
ReCOG - Read Counter accounting for Overlapping Genes

Hi to all,

Here is a new tool, ReCOG, developed at the Institute of Population Genetics in Vienna (Austria), for calculating expression counts from strand-specific paired-end RNA-Seq reads.

ReCOG is a tool for counting strand-specific paired RNA-Seq reads mapped to a reference genome. Because genome annotations contain an increasing number of isoforms per gene, counting only the reads mapped to canonical exons is challenging and frequently not possible. ReCOG therefore pursues a different strategy: all read-pairs mapped between the start and the end of a gene are counted, irrespective of whether they fall in annotated exons or introns. Moreover, since expression counts cannot be unambiguously defined in regions where genes are overlapping, ReCOG does not count read-pairs mapped to these regions.
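
To make the strategy above concrete, here is a rough Python sketch. It is not ReCOG's actual code, and several things are assumptions made only for illustration: genes are given as (name, chrom, strand, start, end) tuples, each read pair is reduced to one (chrom, strand, start, end) fragment with half-open coordinates, and "overlapping" is taken to mean overlapping on the same strand (the description does not say how opposite-strand overlaps are treated).

```python
from collections import defaultdict

def overlap_regions(genes):
    """Return {(chrom, strand): [(start, end), ...]} for stretches covered by 2+ genes."""
    by_key = defaultdict(list)
    for _name, chrom, strand, start, end in genes:
        by_key[(chrom, strand)].append((start, end))
    shared = defaultdict(list)
    for key, spans in by_key.items():
        spans.sort()
        max_end = spans[0][1]  # furthest end seen among earlier genes
        for start, end in spans[1:]:
            if start < max_end:
                shared[key].append((start, min(end, max_end)))
            max_end = max(max_end, end)
    return shared

def count_pairs(genes, fragments):
    """Count fragments lying inside one gene span; skip any that touch an overlap region."""
    shared = overlap_regions(genes)
    counts = {gene[0]: 0 for gene in genes}
    for chrom, strand, f_start, f_end in fragments:
        key = (chrom, strand)
        if any(f_start < e and s < f_end for s, e in shared.get(key, [])):
            continue  # ambiguous: fragment touches a region shared by several genes
        for name, g_chrom, g_strand, g_start, g_end in genes:
            if (g_chrom, g_strand) == key and g_start <= f_start and f_end <= g_end:
                counts[name] += 1
    return counts

# Hypothetical toy data, for illustration only.
genes = [
    ("geneA", "2L", "+", 100, 500),
    ("geneB", "2L", "+", 400, 900),   # overlaps geneA on the same strand at 400-500
    ("geneC", "2L", "-", 450, 800),   # opposite strand, treated as non-overlapping here
]
fragments = [
    ("2L", "+", 120, 300),    # inside geneA only
    ("2L", "+", 380, 450),    # touches the geneA/geneB overlap, not counted
    ("2L", "+", 600, 750),    # inside geneB only
    ("2L", "-", 460, 700),    # inside geneC
    ("2L", "+", 950, 1100),   # intergenic
]
counts = count_pairs(genes, fragments)
```

Exons and introns play no role here: only the gene's outer bounds and the overlap regions decide whether a pair is counted.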


Here is the link: https://code.google.com/p/recog/

Enjoy
Nicola

Nicola Palmieri
Doctoral student
Institut für Populationsgenetik
Vetmeduni Vienna
04-11-2014, 08:00 AM   #2
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

So, um, the title of this is "Read Counter accounting for Overlapping Genes", but in the description of your program, I see this:
Quote:
since expression counts cannot be unambiguously defined in regions where genes are overlapping, ReCOG does not count read-pairs mapped to these regions.
Accounting for things by not counting them is a bit of an oxymoron. I would advise you to either change the name of the program, or change that description to be a bit more in tune with the program name.

Also, the description of this algorithm seems to be similar to HTSeq-count in union mode:

http://www-huber.embl.de/users/ander...unt.html#count

Which is itself a toy demonstration of what can be done with a little bit of python programming in combination with HTSeq:

http://www-huber.embl.de/users/ander...reads-by-genes

So... I notice that you're using pysam (from the looks of the files in the archive) just like HTSeq. What differentiates your program from HTSeq / HTSeq-count?

I also notice you're including a sample BAM file to test out the program, which makes it a 77 MB download[!], rather than something a bit closer to HTSeq's 350 kB installer. You should probably change that, and have a script do the download (if desired) after ReCOG is installed.
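
The kind of post-install download script suggested here could look roughly like the Python sketch below. The function name `fetch_sample` and the URL in the usage comment are hypothetical placeholders, not anything shipped with ReCOG; the sketch uses only the standard library.

```python
import os
import urllib.request

def fetch_sample(url, dest):
    """Download url to dest unless dest already exists. Returns True if a download happened."""
    if os.path.exists(dest):
        return False  # already present, nothing to do
    tmp = dest + ".part"
    urllib.request.urlretrieve(url, tmp)  # fetch to a temporary name first
    os.replace(tmp, dest)                 # rename only after the download completed
    return True

# Usage (hypothetical URL, replace with wherever the test data is actually hosted):
# fetch_sample("https://example.org/recog/sample.bam", "sample.bam")
```

Downloading to a ".part" file and renaming it afterwards means an interrupted download is not mistaken for a complete BAM file on the next run.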
04-11-2014, 09:56 AM   #3
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

@David Eccles: It's like you read my mind.

I also wonder how this would differ from just munging the annotation file so that it contains only the 5'-most and 3'-most bounds of each gene and then using htseq-count.
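
For anyone wanting to try that, a minimal Python sketch of the munging step follows. It assumes a GTF file whose ninth column carries a `gene_id "..."` attribute, and it collapses every feature line of a gene into one span. The output lines are written as `exon` features because, as far as I know, htseq-count by default counts `exon` features and groups them by `gene_id`; the function names and the `collapsed` source label are made up for this example.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')

def gene_spans(gtf_lines):
    """Map (chrom, strand, gene_id) to the (min start, max end) over all its GTF lines."""
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        chrom, start, end, strand, attrs = fields[0], int(fields[3]), int(fields[4]), fields[6], fields[8]
        match = GENE_ID.search(attrs)
        if match is None:
            continue
        key = (chrom, strand, match.group(1))
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    return spans

def spans_to_gtf(spans, source="collapsed"):
    """Yield one GTF 'exon' line per gene covering its full span."""
    for (chrom, strand, gene_id), (start, end) in spans.items():
        yield "\t".join([chrom, source, "exon", str(start), str(end),
                         ".", strand, ".", 'gene_id "%s";' % gene_id])

# Hypothetical input lines, for illustration only.
lines = [
    '2L\tFlyBase\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "g1-RA";',
    '2L\tFlyBase\texon\t350\t500\t.\t+\t.\tgene_id "g1"; transcript_id "g1-RB";',
    '2L\tFlyBase\texon\t700\t900\t.\t-\t.\tgene_id "g2"; transcript_id "g2-RA";',
]
spans = gene_spans(lines)
collapsed = list(spans_to_gtf(spans))
```

With such a file, reads falling in introns would be assigned to the gene as well, which is the behaviour described for ReCOG.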
04-24-2014, 04:56 AM   #4
eszter.ari
Junior Member
 
Location: Vienna

Join Date: Dec 2012
Posts: 8

Dear David,

Thanks for your remarks and questions. We will improve the description of the ReCOG script!

Quote:
Originally Posted by gringer
So, um, the title of this is "Read Counter accounting for Overlapping Genes", but in the description of your program, I see this:

Accounting for things by not counting them is a bit of an oxymoron. I would advise you to either change the name of the program, or change that description to be a bit more in tune with the program name.
You are absolutely right!

Quote:
Originally Posted by gringer
Also, the description of this algorithm seems to be similar to HTSeq-count in union mode:

http://www-huber.embl.de/users/ander...unt.html#count

Which is itself a toy demonstration of what can be done with a little bit of python programming in combination with HTSeq:

http://www-huber.embl.de/users/ander...reads-by-genes

So... I notice that you're using pysam (from the looks of the files in the archive) just like HTSeq. What differentiates your program from HTSeq / HTSeq-count?
The concepts behind HTSeq and ReCOG don't differ much. I first tried HTSeq-count and ran into some problems:
HTSeq-count uses SAM files, which take up a lot of space on the hard disk. ReCOG uses BAM files.
When the annotation of a genome is not very advanced, so that it contains hundreds of "chromosomes" (contigs), HTSeq just doesn't work (at least for the D. simulans annotations from FlyBase).
I found some cases where HTSeq gave counts that differed from what they should have been (I checked this with IGV). I discussed these problems with other researchers, and they had also found examples where HTSeq gave wrong counts.

Quote:
Originally Posted by gringer
I also notice you're including a sample BAM file to test out the program, which makes it a 77MB download[!], rather than something a bit closer to HTSeq's 350kb installer. You should probably change that, and have a script do the download (if desired) after ReCOG is installed.
This is also useful advice!

Bests,
Eszter

All times are GMT -8.