Hi!
I am a user of Cufflinks. It is a good tool for estimating differential gene expression. However, our goal is not only to test for differentially expressed genes but also to obtain the sequences of these genes as assembled by Cufflinks. We ran cufflinks with the -g option and cuffcompare with the -r option. I am trying to fetch all the assembled transcripts as follows: read the coordinates of each assembled transcript's exons from the combined.gtf file produced by cuffcompare, use samtools mpileup to check whether the mapped reads show a base different from the reference at each position, and finally join the exons to get the full-length sequences of the assembled transcripts with IDs like "TCON...". However, I find that in combined.gtf some exons have no reads mapped to them at all. If I continue with the method described above, should I use the reference bases for these exons, given that they were not actually detected by real reads?
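To make the exon-joining step concrete, here is a minimal Python sketch of what I mean. It assumes the reference genome is already loaded into a plain dict mapping chromosome name to sequence string, and it only uses reference bases (it does not apply any differences found by samtools mpileup). The function names `parse_exons` and `transcript_sequence` are my own, not part of Cufflinks.

```python
import re
from collections import defaultdict

# Translation table for complementing DNA bases (case preserved).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_exons(gtf_lines):
    """Group exon records by transcript_id.

    Returns a dict: transcript_id -> list of (chrom, start, end, strand).
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        # GTF coordinates are 1-based and inclusive.
        exons[match.group(1)].append(
            (fields[0], int(fields[3]), int(fields[4]), fields[6])
        )
    return exons


def transcript_sequence(exon_list, genome):
    """Join exon sequences in genomic order.

    `genome` is an assumed dict of chromosome name -> sequence string.
    The result is reverse-complemented for '-' strand transcripts.
    """
    ordered = sorted(exon_list, key=lambda exon: exon[1])
    seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in ordered)
    if ordered[0][3] == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq
```

Calling `parse_exons` on the lines of combined.gtf and then `transcript_sequence` on each transcript's exon list would give one reference-based sequence per transcript ID.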
I have read the Cufflinks manual on the website. I think the point is that in RABT mode, Cufflinks uses faux-reads to help assemble transcripts. This is quite helpful for detecting new isoforms. However, when you try to get the real sequences of these isoforms, things become confusing. Do you have any suggestions for separating the transcripts assembled from real reads from the reference transcripts assembled from faux-reads? In other words, how can I get the sequence of each transcript assembled by Cufflinks?
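One rough heuristic I have considered is to measure how much of each exon is covered by real reads. The sketch below assumes per-base depth in the three-column text format written by samtools depth (chromosome, 1-based position, depth) and treats positions missing from that output as depth 0. The function names `load_depth` and `exon_support` are hypothetical. Low coverage alone does not prove that an exon came from faux-reads, so this is only a filter, not a definitive test.

```python
def load_depth(depth_lines):
    """Parse `samtools depth`-style lines into {(chrom, pos): depth}."""
    depth = {}
    for line in depth_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        depth[(fields[0], int(fields[1]))] = int(fields[2])
    return depth


def exon_support(chrom, start, end, depth):
    """Return the fraction of exon bases covered by at least one read.

    Coordinates are 1-based and inclusive, as in GTF. Positions absent
    from `depth` are counted as uncovered (an assumption of this sketch).
    """
    covered = sum(
        1 for pos in range(start, end + 1) if depth.get((chrom, pos), 0) > 0
    )
    return covered / (end - start + 1)
```

An exon with a support of 0.0 has no real reads over it at all, which matches the exons I described above.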
Thanks for your help!
asling