Thread | Thread Starter | Forum | Replies | Last Post |
Removing overlapping genes from annotation for RNAseq read count | DRAT | Bioinformatics | 2 | 04-11-2014 04:53 AM |
reads mapping to overlapping genes? | beliefbio | Bioinformatics | 3 | 04-11-2014 01:18 AM |
Dealing with overlapping read pairs | Jeremy37 | Bioinformatics | 7 | 03-24-2013 04:03 PM |
Overlapping and non-Overlapping pair-end reads with Tophat | senpeng | Illumina/Solexa | 4 | 10-16-2011 07:43 PM |
#1
Junior Member
Location: Wien Join Date: Jun 2009
Posts: 3
Hi all,
Here is a new tool, ReCOG, from the Institute of Population Genetics in Vienna (Austria). It calculates expression counts from strand-specific paired-end RNA-Seq reads mapped to a reference genome. Because genome annotations contain an increasing number of isoforms per gene, counting only the reads that map to canonical exons is difficult and often not possible. ReCOG therefore takes a different approach: every read pair mapped between the start and the end of a gene is counted, regardless of whether it falls in an annotated exon or intron. Moreover, because expression counts cannot be defined unambiguously in regions where genes overlap, ReCOG does not count read pairs mapped to those regions.
Here is the link: https://code.google.com/p/recog/
Enjoy,
Nicola

Nicola Palmieri
Doctoral student
Institut für Populationsgenetik
Vetmeduni Vienna
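To make the counting rule described above concrete, here is a minimal plain-Python sketch of it. This is not ReCOG's code (ReCOG reads BAM files via pysam); the `Gene`/`ReadPair` tuples, the 0-based half-open coordinates, and the helper names are hypothetical. It also assumes two things the post does not spell out: a pair is counted only if its whole span lies inside one gene's start..end bounds, and overlaps are computed per strand, since strand-specific reads can separate antisense genes.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive


class ReadPair(NamedTuple):
    chrom: str
    strand: str  # transcript strand inferred from the strand-specific library
    start: int   # leftmost aligned base of either mate (0-based, inclusive)
    end: int     # rightmost aligned base of either mate + 1 (exclusive)


Region = Tuple[int, int]


def overlapping_regions(genes: List[Gene]) -> Dict[Tuple[str, str], List[Region]]:
    """Intervals covered by two or more genes, per (chrom, strand), via a sweep line."""
    events: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)
    for g in genes:
        events[(g.chrom, g.strand)].append((g.start, +1))
        events[(g.chrom, g.strand)].append((g.end, -1))
    regions: Dict[Tuple[str, str], List[Region]] = defaultdict(list)
    for key, evs in events.items():
        evs.sort()  # at equal positions, ends (-1) sort before starts (+1)
        depth = 0
        region_start: Optional[int] = None
        for pos, delta in evs:
            depth += delta
            if depth >= 2 and region_start is None:
                region_start = pos
            elif depth < 2 and region_start is not None:
                regions[key].append((region_start, pos))
                region_start = None
    return regions


def count_pairs(genes: List[Gene], pairs: List[ReadPair]) -> Dict[str, int]:
    """Count pairs lying entirely within one gene's span, skipping overlap regions."""
    overlaps = overlapping_regions(genes)
    counts = {g.gene_id: 0 for g in genes}
    for p in pairs:
        key = (p.chrom, p.strand)
        # Discard any pair that touches a region shared by two or more genes.
        if any(p.start < r_end and r_start < p.end for r_start, r_end in overlaps.get(key, [])):
            continue
        # Exons and introns are treated alike: only the gene's outer bounds matter.
        hits = [g for g in genes
                if (g.chrom, g.strand) == key and g.start <= p.start and p.end <= g.end]
        if len(hits) == 1:
            counts[hits[0].gene_id] += 1
    return counts


genes = [
    Gene("geneA", "chr2L", "+", 100, 1000),
    Gene("geneB", "chr2L", "+", 800, 1500),  # overlaps geneA on 800..1000
    Gene("geneC", "chr2L", "-", 100, 600),   # antisense to geneA
]
pairs = [
    ReadPair("chr2L", "+", 150, 400),    # inside geneA only (exon or intron)
    ReadPair("chr2L", "+", 850, 950),    # inside the A/B overlap: skipped
    ReadPair("chr2L", "+", 1100, 1300),  # inside geneB only
    ReadPair("chr2L", "-", 200, 450),    # geneC, kept apart by strand
    ReadPair("chr2L", "+", 50, 300),     # starts before geneA: not counted
]
print(count_pairs(genes, pairs))  # → {'geneA': 1, 'geneB': 1, 'geneC': 1}
```

The per-pair scan over all genes is O(genes × pairs) and is only meant for readability; a real counter would use an interval index per chromosome and strand.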
#2
David Eccles (gringer)
Location: Wellington, New Zealand Join Date: May 2011
Posts: 838
So, um, the title of this is "Read Counter accounting for Overlapping Genes", but in the description of your program, I see this:
Quote:
Also, the description of this algorithm seems similar to HTSeq-count in union mode: http://www-huber.embl.de/users/ander...unt.html#count which is itself a toy demonstration of what can be done with a little bit of Python programming in combination with HTSeq: http://www-huber.embl.de/users/ander...reads-by-genes

So... I notice that you're using pysam (judging from the files in the archive), just like HTSeq. What differentiates your program from HTSeq / HTSeq-count?

I also notice you're including a sample BAM file for testing the program, which makes it a 77MB download[!], rather than something closer to HTSeq's 350kb installer. You should probably change that and have a script do the download (if desired) after ReCOG is installed.
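For comparison, HTSeq-count's documented union mode works per read: for every aligned position i it takes the set S(i) of features overlapping that position, unions those sets, counts the read for the gene if the union contains exactly one gene, and reports `__ambiguous` (more than one) or `__no_feature` (none) otherwise. The sketch below re-implements only that decision rule in plain Python for illustration. It is not HTSeq's code (HTSeq uses a `GenomicArrayOfSets` and handles strands, CIGAR parsing and mate pairing itself), and the tuple layouts and function names are hypothetical.

```python
from typing import List, Set, Tuple

# (gene_id, start, end): 0-based, half-open exon intervals on one chromosome/strand
Exon = Tuple[str, int, int]
# One aligned (ungapped) block of a read, 0-based half-open
Block = Tuple[int, int]


def genes_at(pos: int, exons: List[Exon]) -> Set[str]:
    """S(i): the set of genes whose exons cover position i."""
    return {gene_id for gene_id, start, end in exons if start <= pos < end}


def union_mode(blocks: List[Block], exons: List[Exon]) -> str:
    """Assign a read (or both mates of a pair) following the 'union' rule."""
    union: Set[str] = set()
    for b_start, b_end in blocks:
        for pos in range(b_start, b_end):
            union |= genes_at(pos, exons)
    if not union:
        return "__no_feature"
    if len(union) > 1:
        return "__ambiguous"
    return next(iter(union))


exons: List[Exon] = [
    ("geneA", 100, 200),
    ("geneA", 300, 400),
    ("geneB", 350, 500),  # geneB overlaps geneA's second exon
]

print(union_mode([(120, 170)], exons))              # geneA
print(union_mode([(180, 200), (300, 330)], exons))  # spliced read: geneA
print(union_mode([(380, 420)], exons))              # __ambiguous
print(union_mode([(220, 280)], exons))              # intronic: __no_feature
```

Because the features here are exons, a read lying entirely in an intron ends up in `__no_feature`, whereas a rule based on whole gene spans would count it for the gene.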
#3
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
@David Eccles: It's like you read my mind.
I also wonder what the difference would be from simply munging the annotation file so that it contains only the 5'-most and 3'-most bounds of each gene and then using htseq-count.
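The annotation munging suggested above could look roughly like this: a minimal sketch, assuming a GTF file with `exon` records carrying a `gene_id` attribute, and assuming every gene sits on a single chromosome and strand. The function name is hypothetical. It writes one record per gene spanning its outermost exon start and end, and keeps the feature type `exon` and the attribute `gene_id` so htseq-count's defaults (`--type exon`, `--idattr gene_id`) pick it up unchanged.

```python
import re
from typing import Dict, Iterable, List

# Match gene_id as its own attribute (not e.g. inside "ref_gene_id").
GENE_ID_RE = re.compile(r'(?:^|;)\s*gene_id "([^"]+)"')


def collapse_to_gene_spans(gtf_lines: Iterable[str]) -> List[str]:
    """Return one GTF line per gene covering its 5'-most to 3'-most exon bounds."""
    spans: Dict[str, list] = {}  # gene_id -> [chrom, source, strand, start, end]
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        match = GENE_ID_RE.search(fields[8])
        if match is None:
            continue
        gene_id = match.group(1)
        start, end = int(fields[3]), int(fields[4])  # GTF: 1-based, inclusive
        if gene_id not in spans:
            spans[gene_id] = [fields[0], fields[1], fields[6], start, end]
        else:
            span = spans[gene_id]
            span[3] = min(span[3], start)
            span[4] = max(span[4], end)
    out = []
    for gene_id, (chrom, source, strand, start, end) in spans.items():
        out.append("\t".join([chrom, source, "exon", str(start), str(end),
                              ".", strand, ".", f'gene_id "{gene_id}";']))
    return out


gtf = [
    'chr2L\tFlyBase\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "tA1";',
    'chr2L\tFlyBase\texon\t300\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "tA1";',
    'chr2L\tFlyBase\texon\t150\t450\t.\t+\t.\tgene_id "geneA"; transcript_id "tA2";',
    'chr2L\tFlyBase\texon\t500\t900\t.\t-\t.\tgene_id "geneB"; transcript_id "tB1";',
]
for record in collapse_to_gene_spans(gtf):
    print(record)
```

With such a file, union mode would report reads falling where two gene spans overlap as `__ambiguous` instead of counting them, which is close in spirit to ReCOG's exclusion rule. How pairs that only partly overlap a gene span are treated may still differ between the two tools.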
#4
Junior Member
Location: Vienna Join Date: Dec 2012
Posts: 8
Dear David,
Thanks for your remarks and questions. We will improve the description of the ReCOG script!
Quote:
Quote:
HTSeq-count uses SAM files, which take up a lot of hard-disk space; ReCOG uses BAM files. When a genome annotation is not very advanced, so that it contains hundreds of "chromosomes" (contigs), HTSeq just doesn't work (at least for the FlyBase D. simulans annotation). I also found some cases where HTSeq gave counts different from what they should have been (I checked this with the IGV viewer). I discussed these problems with other researchers, and they also found examples where HTSeq gave wrong counts.
Quote:
Best,
Eszter