#1 |
Member
Location: Japan Join Date: Oct 2010
Posts: 52
Core question: I did a de novo transcriptome assembly with Cufflinks for rat based on RNA-Seq data. There are a couple of thousand transcripts that do not overlap annotated genes, and I would like to divide these into putatively coding and putatively non-coding RNAs using PhyloCSF.
I find it difficult to prepare the input files for PhyloCSF and wonder what a straightforward way to do this would be. What I already tried: I think that as input I need a multiple alignment of the orthologous loci, and that the sequence for rat needs to be ungapped. I would like to avoid doing my own multi-genome alignment if at all possible, so I searched UCSC. There (http://hgdownload.cse.ucsc.edu/golde...n4/multiz9way/) I found that they already provide a multi-genome alignment for the rat genome against:
So if I want to go with this approach, I would need to:
P.S.: I am using rn4 coordinates.
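One piece of the preparation described above, keeping the rat row ungapped, can be sketched in Python. This is only an illustration under stated assumptions: the alignment is already in memory as a mapping from a species label to its aligned row, the labels (`rn4`, `mm9`, `hg18`) and the helper names `ungap_reference` and `to_fasta` are placeholders, and the exact header names PhyloCSF expects are not covered here.

```python
def ungap_reference(alignment, ref="rn4"):
    """Drop every alignment column in which the reference row has a gap.

    `alignment` maps a species label to its aligned sequence. All rows must
    have the same length. The default label "rn4" is a placeholder.
    """
    ref_row = alignment[ref]
    if any(len(row) != len(ref_row) for row in alignment.values()):
        raise ValueError("all aligned rows must have the same length")
    # Indices of the columns where the reference carries a real base.
    keep = [i for i, base in enumerate(ref_row) if base != "-"]
    return {name: "".join(row[i] for i in keep) for name, row in alignment.items()}


def to_fasta(alignment, ref="rn4"):
    """Render the alignment as FASTA text with the reference row first."""
    names = [ref] + [name for name in alignment if name != ref]
    return "".join(f">{name}\n{alignment[name]}\n" for name in names)


# Toy alignment block, invented for illustration only.
example = {
    "rn4": "ATG-CC--TGA",
    "mm9": "ATGACCGGTGA",
    "hg18": "ATG-CT--TAA",
}
ungapped = ungap_reference(example)
print(to_fasta(ungapped))
```

Columns where the other species have gaps are left untouched, so the rows stay aligned to each other after the filtering.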
#2 |
Member
Location: Houston Join Date: Apr 2009
Posts: 12
The most straightforward way is not to use PhyloCSF. Instead, with CPAT you only need the mRNA sequences or the genome coordinates (which you already have if you have already rebuilt the transcriptome): http://code.google.com/p/cpat/ It's accurate, efficient and convenient.
#3 |
Member
Location: boston Join Date: May 2012
Posts: 29
The software is not published yet, right?
Is there a paper?
#4 |
Member
Location: Houston Join Date: Apr 2009
Posts: 12
#5 |
Member
Location: boston Join Date: May 2012
Posts: 29
Thanks for your fast reply.
Is there a direct way to connect CPAT with cufflinks/cuffmerge/cuffcompare? As far as I can see, the 12-column BED format required by your software differs from the GTF gene annotation file I downloaded from UCSC. What are the meanings of columns 7-12? The format also differs from the output of cufflinks/cuffcompare/cuffmerge. Is there a way to analyze the output of the Cufflinks suite directly in CPAT? It would significantly improve the usability of your software.
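On the meaning of columns 7-12: in the standard 12-column BED layout they are thickStart, thickEnd, itemRgb, blockCount, blockSizes and blockStarts, where the blocks are the exons and blockStarts are offsets relative to chromStart. A minimal Python sketch of reading one such line follows. The record values and the names `Bed12`, `parse_bed12` and `exon_intervals` are made up for illustration and are not part of CPAT.

```python
from typing import List, NamedTuple, Tuple


class Bed12(NamedTuple):
    chrom: str
    chrom_start: int         # column 2, 0-based start of the transcript
    chrom_end: int           # column 3, end of the transcript (exclusive)
    name: str                # column 4
    score: str               # column 5
    strand: str              # column 6
    thick_start: int         # column 7, start of the "thick" (e.g. coding) part
    thick_end: int           # column 8, end of the "thick" part
    item_rgb: str            # column 9, display colour
    block_count: int         # column 10, number of blocks (exons)
    block_sizes: List[int]   # column 11, exon lengths
    block_starts: List[int]  # column 12, exon starts relative to chrom_start


def parse_bed12(line: str) -> Bed12:
    """Split one tab-separated BED12 line into typed fields."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("expected at least 12 tab-separated columns")
    # The size and start lists may carry a trailing comma, so skip empty items.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]
    return Bed12(f[0], int(f[1]), int(f[2]), f[3], f[4], f[5],
                 int(f[6]), int(f[7]), f[8], int(f[9]), sizes, starts)


def exon_intervals(rec: Bed12) -> List[Tuple[int, int]]:
    """Return exon coordinates as 0-based, half-open genomic intervals."""
    return [(rec.chrom_start + start, rec.chrom_start + start + size)
            for start, size in zip(rec.block_starts, rec.block_sizes)]


# Invented two-exon transcript, for illustration only.
example_line = "chr1\t1000\t5000\tTCONS_00000001\t0\t+\t1000\t1000\t0\t2\t500,1000,\t0,3000,"
rec = parse_bed12(example_line)
print(exon_intervals(rec))
```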
#6 |
Member
Location: boston Join Date: May 2012
Posts: 29
Your BED format is the standard BED format.
Now my only question is how to combine the Cufflinks suite with your software to identify novel coding and non-coding transcripts. It seems that I still have to write code to convert the Cufflinks output to the format required by CPAT. That does not seem too difficult, though.
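A conversion of that kind could look roughly like the Python sketch below. It is not taken from CPAT or Cufflinks. It assumes the GTF has `exon` rows carrying a `transcript_id "..."` attribute, converts the 1-based inclusive GTF coordinates to 0-based half-open BED coordinates, and sets thickStart and thickEnd to chromStart because the assembly carries no CDS annotation (that choice is an assumption). The function name `gtf_to_bed12` and the sample records are hypothetical.

```python
import re
from collections import defaultdict

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_to_bed12(gtf_lines):
    """Group GTF exon rows by transcript_id and emit one BED12 line each."""
    exons = defaultdict(list)  # transcript_id -> [(start0, end), ...]
    info = {}                  # transcript_id -> (chrom, strand)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(f[8])
        if match is None:
            continue
        tid = match.group(1)
        # GTF is 1-based inclusive; BED is 0-based half-open.
        exons[tid].append((int(f[3]) - 1, int(f[4])))
        info.setdefault(tid, (f[0], f[6]))

    bed = []
    for tid, blocks in exons.items():
        blocks.sort()
        chrom, strand = info[tid]
        start, end = blocks[0][0], blocks[-1][1]
        sizes = ",".join(str(e - s) for s, e in blocks) + ","
        starts = ",".join(str(s - start) for s, _ in blocks) + ","
        # thickStart == thickEnd == chromStart: no coding region annotated.
        bed.append("\t".join([chrom, str(start), str(end), tid, "0", strand,
                              str(start), str(start), "0",
                              str(len(blocks)), sizes, starts]))
    return bed


# Invented Cufflinks-style records, for illustration only.
attrs = 'gene_id "XLOC_000001"; transcript_id "TCONS_00000001";'
gtf = [
    "chr1\tCufflinks\ttranscript\t1001\t5000\t1000\t+\t.\t" + attrs,
    "chr1\tCufflinks\texon\t1001\t1500\t1000\t+\t.\t" + attrs,
    "chr1\tCufflinks\texon\t4001\t5000\t1000\t+\t.\t" + attrs,
]
bed_lines = gtf_to_bed12(gtf)
print("\n".join(bed_lines))
```

The sketch skips `transcript` rows and rebuilds each transcript from its exons, so it does not depend on the transcript rows being present.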
#7 |
Member
Location: boston Join Date: May 2012
Posts: 29
There is a tool that can convert the Cufflinks output GTF to the BED format that CPAT takes as input:
https://lists.soe.ucsc.edu/pipermail...il/025696.html If you can integrate it into your software, it will be much more user-friendly, especially for beginners like me.
Tags |
non-coding, phylocsf, phylogeny |