Core question: I did a de novo transcriptome assembly for rat with Cufflinks, based on RNA-Seq data. A couple of thousand transcripts do not overlap annotated genes, and I would like to divide these into putatively coding and putatively non-coding RNA using PhyloCSF.
I find it difficult to prepare the input files for PhyloCSF and wonder what would be a straightforward way to do this?
What I already tried: as far as I understand, the input has to be a multiple alignment of the orthologous loci, and the rat sequence in it needs to be ungapped.
I would like to avoid doing my own multi-genome alignment if at all possible, so I searched UCSC. At http://hgdownload.cse.ucsc.edu/golde...n4/multiz9way/ I found that they already provide a multi-genome alignment of the rat genome against:
- mouse (Feb 2006, mm8)
- human (Mar 2006, hg18)
- dog (May 2005, canFam2)
- cow (Mar 2005, bosTau2)
- opossum (Jan 2006, monDom4)
- chicken (Feb 2004, galGal2)
- frog (Oct 2004, xenTro1)
- zebrafish (May 2005, danRer3)
PhyloCSF does not offer this phylogeny, and according to https://github.com/mlin/PhyloCSF/wiki/ it is not directly possible to prepare my own phylogeny for it. However, PhyloCSF does support the 29 mammals phylogeny.
So if I want to go with this approach, I would need to:
- remove zebrafish, frog, chicken, opossum (how?)
- make it so that the rat part is ungapped (is this even possible? how?)
- extract the part of the remaining multi-alignment that corresponds to the transcript I want to test (how?)
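To make the three steps above concrete, here is a minimal sketch in plain Python of what I have in mind. It assumes the UCSC alignment is in MAF format (sequence lines of the form `s src start size strand srcSize text`), that rat (rn4) is the reference on the + strand of each block, and that coordinates are 0-based and half-open. The function names and the toy block are hypothetical and only for illustration; this is not a PhyloCSF or UCSC tool.

```python
def parse_maf_block(block_text):
    """Parse the 's' lines of one MAF block.

    Returns (starts, rows): start coordinate and aligned text per assembly.
    """
    starts, rows = {}, {}
    for line in block_text.splitlines():
        fields = line.split()
        # MAF sequence line: s src start size strand srcSize text
        if len(fields) == 7 and fields[0] == "s":
            assembly = fields[1].split(".", 1)[0]  # "rn4.chr1" -> "rn4"
            starts[assembly] = int(fields[2])
            rows[assembly] = fields[6]
    return starts, rows


def keep_species(rows, wanted):
    """Keep only the wanted assemblies, in the order given (reference first)."""
    return {name: rows[name] for name in wanted if name in rows}


def ungap_reference(rows, ref):
    """Drop every alignment column in which the reference row has a gap."""
    keep = [i for i, base in enumerate(rows[ref]) if base != "-"]
    return {name: "".join(seq[i] for i in keep) for name, seq in rows.items()}


def slice_by_ref(rows, block_start, start, end):
    """Cut out reference interval [start, end) from an ungapped block.

    After ungapping, column i corresponds to reference position
    block_start + i, so the slice is a simple offset.
    """
    lo = max(start - block_start, 0)
    hi = max(end - block_start, 0)
    return {name: seq[lo:hi] for name, seq in rows.items()}


def to_fasta(rows):
    """Write the rows as FASTA text, keeping dictionary order."""
    return "".join(f">{name}\n{seq}\n" for name, seq in rows.items())


# Toy block with made-up coordinates and sequences (not real rn4 data).
BLOCK = """a score=0.0
s rn4.chr1     100 8 + 1000 ATG-CCGTA
s mm8.chr1     500 9 + 1000 ATGACCGTA
s hg18.chr1    900 8 + 1000 ATG-CCGTT
s danRer3.chr2  50 7 + 1000 AT--CCGTA
"""

MAMMALS = ["rn4", "mm8", "hg18", "canFam2", "bosTau2"]  # rat listed first

starts, rows = parse_maf_block(BLOCK)
kept = keep_species(rows, MAMMALS)            # drops danRer3
ungapped = ungap_reference(kept, "rn4")       # rat row now has no gaps
region = slice_by_ref(ungapped, starts["rn4"], 105, 108)
fasta = to_fasta(region)
print(fasta)
```

A real transcript would usually span several MAF blocks and several exons, so the per-block pieces would still have to be concatenated in transcript order, species missing from a block padded with gaps, and minus-strand transcripts reverse-complemented. As far as I understand, the FASTA headers would also need to be renamed to the species names used by whichever PhyloCSF phylogeny is chosen.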
Or, maybe this approach is too convoluted anyway? Any help or suggestions for a better strategy would be much appreciated!
P.S.: I am using rn4 coordinates.