SEQanswers

Old 06-25-2010, 12:59 PM   #1
jetspeeder
Member
 
Location: U.S.

Join Date: Jun 2010
Posts: 12
Interesting cufflinks output

How does Cufflinks work when you supply it with a GTF file? I used the UCSC knownGene GTF for mm9. When I run Cufflinks without a GTF and then run Cuffcompare with the GTF, I get the results below for chromosome M.

cufflinks w/o gtf file
Code:
CUFF.213872.1   859088  chrM    121     953     873.319 1       1       862.495 884.143 1627.3  832
CUFF.213874.1   859088  chrM    1148    2623    746.907 1       1       739.389 754.425 1391.83 1475
CUFF.213876.1   859088  chrM    2726    3681    1593.64 1       1       1580    1607.29 2969.6  955
CUFF.213878.1   859088  chrM    3934    4923    884.824 1       1       874.831 894.817 1648.73 989
CUFF.213880.1   859088  chrM    15416   16059   260.697 1       1       253.97  267.424 485.81  643
Then running Cuffcompare with the GTF file on the above output shows:

Code:
uc009vev.1      uc009vev.1      =       CUFF.213872     CUFF.213872.1   100     873.319096      862.495228      884.142963      1627.300787     832     CUFF.213872.1
uc009vew.1      uc009vew.1      c       CUFF.213876     CUFF.213876.1   100     1593.642682     1579.995234     1607.290129     2969.596078     955     CUFF.213876.1
uc009vew.1      uc009vew.1      =       CUFF.213874     CUFF.213874.1   100     746.907244      739.389371      754.425116      1391.830367     1475    CUFF.213874.1
uc009vex.1      uc009vex.1      =       CUFF.213878     CUFF.213878.1   100     884.824253      874.831434      894.817071      1648.729185     989     CUFF.213878.1
uc009vfc.1      uc009vfc.1      p       CUFF.213880     CUFF.213880.1   100     260.696876      253.969897      267.423855      485.810439      643     CUFF.213880.1
When I run Cufflinks with a GTF, I get this:

cufflinks w/ gtf file
Code:
uc009vev.1      602582  chrM    69      852     840.341 1       1       829.397 851.286 1565.84 783
uc009vew.1      602582  chrM    1148    3703    1018.15 1       1       1011.48 1024.82 1897.25 2555
uc009vex.1      602582  chrM    3848    4933    794.654 1       1       785.613 803.696 1480.72 1085
uc009vey.1      602582  chrM    5326    6938    4153.36 1       1       4136.4  4170.32 7739.4  1612
uc009vez.1      602582  chrM    7009    7699    1493.21 1       1       1477.67 1508.76 2782.55 690
uc009vfa.1      602582  chrM    7765    8607    1599.36 1       1       1584.8  1613.92 2980.3  842
uc009vfb.1      602582  chrM    9875    11542   1003.56 1       1       995.359 1011.75 1870.08 1667
uc009vfc.1      602582  chrM    12405   15288   1279.48 1       1       1272.44 1286.52 2384.19 2883
How does Cufflinks utilize the GTF file? Running Cufflinks without the GTF and then Cuffcompare with the GTF shows 4 unique reference trans_id values for chrM, while Cufflinks with the GTF shows 8 unique trans_id values. In addition, Cufflinks with the GTF reports coordinates that are not present in the transcripts.expr from the run without the GTF. Where did they come from? Thanks.

Either one of the outputs contains extra information or the other is missing information. I believe the latter, because the coverage.wig file, viewed in the UCSC browser, shows that genes such as ATPase6 are present. So why do they not show up when Cufflinks is run without the GTF file? Any insight would be helpful.
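To make the discrepancy concrete, here is a minimal sketch that checks which of the annotated chrM transcripts have no overlapping transcript in the run without the GTF. The coordinates are copied from the two listings above; the function names (`overlaps`, `uncovered`) are made up for illustration and are not part of Cufflinks.

```python
# (transcript ID, start, end) on chrM, copied from the listings in this post.
DE_NOVO = [
    ("CUFF.213872.1", 121, 953),
    ("CUFF.213874.1", 1148, 2623),
    ("CUFF.213876.1", 2726, 3681),
    ("CUFF.213878.1", 3934, 4923),
    ("CUFF.213880.1", 15416, 16059),
]
ANNOTATED = [
    ("uc009vev.1", 69, 852),
    ("uc009vew.1", 1148, 3703),
    ("uc009vex.1", 3848, 4933),
    ("uc009vey.1", 5326, 6938),
    ("uc009vez.1", 7009, 7699),
    ("uc009vfa.1", 7765, 8607),
    ("uc009vfb.1", 9875, 11542),
    ("uc009vfc.1", 12405, 15288),
]


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def uncovered(annotated, de_novo):
    """List annotated transcripts that no de novo transcript overlaps."""
    return [
        name
        for name, start, end in annotated
        if not any(overlaps(start, end, s, e) for _, s, e in de_novo)
    ]


missing = uncovered(ANNOTATED, DE_NOVO)
print(missing)
```

On these coordinates the list contains five annotated transcripts, including uc009vfc.1, which Cuffcompare paired with CUFF.213880.1 under class code p even though the two intervals do not overlap.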

http://genome.ucsc.edu/cgi-bin/hgTra...varRep_close=0

Last edited by jetspeeder; 06-27-2010 at 01:22 PM.
Old 06-27-2010, 01:24 PM   #2
jetspeeder
Member
 
Location: U.S.

Join Date: Jun 2010
Posts: 12

Does anyone have any ideas? We are stuck and can't really do anything until we figure this out. Any help would be greatly appreciated.
Old 06-28-2010, 02:51 PM   #3
thinkRNA
Member
 
Location: Carlsbad,CA

Join Date: Jan 2010
Posts: 94

When you run Cufflinks with the GTF file, it uses only the isoform coordinates provided in your GTF file to estimate gene expression. It will not look for any novel isoforms that may stretch beyond the annotations you provide. As you know, existing annotations are incomplete, so this is usually a good option. In general, I think you should see more assembled isoforms when you do not provide a GTF file.
What you note is counter-intuitive, i.e., Cufflinks without a GTF file fails to find isoforms that are known to be present. When you say ATPase6 is "present", what do you mean, and which uc009vey.X is ATPase6? One thing I can think of is that when you run Cufflinks without a GTF, this gene has so few alignments that it gets filtered out because of low abundance; look at the -F option.
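As a rough illustration of the kind of low-abundance filtering meant here, the sketch below drops isoforms whose abundance falls below a fraction of the most abundant isoform in the same gene. This is only my reading of how a minimum-isoform-fraction cutoff such as -F behaves, not Cufflinks code; the function name and the two-isoform gene are hypothetical, and the single-isoform FPKM is the uc009vfa.1 value from the first post.

```python
def filter_isoforms(fpkm_by_isoform, min_fraction):
    """Keep isoforms with FPKM >= min_fraction * FPKM of the gene's top isoform.

    Assumption: this mimics a relative-abundance cutoff; it is not the
    actual Cufflinks implementation.
    """
    if not fpkm_by_isoform:
        return {}
    top = max(fpkm_by_isoform.values())
    return {
        iso: fpkm
        for iso, fpkm in fpkm_by_isoform.items()
        if fpkm >= min_fraction * top
    }


# Hypothetical gene with a dominant and a minor isoform.
two_isoforms = {"isoA": 100.0, "isoB": 4.0}
# Single-isoform gene, FPKM taken from the with-GTF listing in post #1.
one_isoform = {"uc009vfa.1": 1599.36}

kept_strict = filter_isoforms(two_isoforms, 0.1)   # isoB is below 10% of isoA
kept_all = filter_isoforms(two_isoforms, 0.0)      # cutoff disabled
kept_single = filter_isoforms(one_isoform, 0.1)    # sole isoform is its own maximum
print(kept_strict, kept_all, kept_single)
```

Under this reading, a gene with a single isoform is never removed by the relative cutoff alone, so if such a gene disappears, some other step would have to be responsible.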
Old 06-28-2010, 05:16 PM   #4
jetspeeder
Member
 
Location: U.S.

Join Date: Jun 2010
Posts: 12

Hey thinkRNA, thanks for the reply. By "present" I am referring to the TopHat output, the coverage.wig file: it shows that chromosome M has good coverage in all the areas that contain genes (I just realized the link I posted no longer shows the view I had before; I will regenerate it later). ATPase6 is uc009vfa.1. What you suggest in your last statement is definitely a possibility, but when Cufflinks is run with a GTF, ATPase6 and the other missing transcripts have some of the highest RPKM values (top 5%). That makes low-abundance filtering seem unlikely; wouldn't they be filtered out with a GTF file too?
Old 06-29-2010, 03:29 PM   #5
thinkRNA
Member
 
Location: Carlsbad,CA

Join Date: Jan 2010
Posts: 94

Yes, I just thought about this. Run both with and without the GTF, along with -F 0, and see if you can recover the isoforms.
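For the two runs suggested here, a small helper that only builds the command lines (it does not execute anything) might look like the sketch below. The -G and -F options are the ones discussed in this thread; the file names accepted_hits.sam and knownGene.gtf and the function name are placeholders, so substitute your own paths.

```python
def cufflinks_command(alignments, gtf=None, min_isoform_fraction=0):
    """Build an argument list for one Cufflinks run (nothing is executed)."""
    cmd = ["cufflinks", "-F", str(min_isoform_fraction)]
    if gtf is not None:
        # -G supplies the reference annotation.
        cmd += ["-G", gtf]
    cmd.append(alignments)
    return cmd


# Placeholder file names, assumed for illustration only.
with_gtf = cufflinks_command("accepted_hits.sam", gtf="knownGene.gtf")
without_gtf = cufflinks_command("accepted_hits.sam")

print(" ".join(with_gtf))
print(" ".join(without_gtf))
```

Either list could be passed to `subprocess.run` once the paths are real. Keeping the two commands identical apart from -G makes it easier to attribute any difference in the output to the annotation.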
Old 06-30-2010, 09:24 AM   #6
jetspeeder
Member
 
Location: U.S.

Join Date: Jun 2010
Posts: 12

Hey thinkRNA, I am currently running it with your suggestion; hopefully it will give us more insight. Also, here's the picture of the coverage file from TopHat:
[Attached image: Picture 2.jpg]
Old 06-30-2010, 10:03 AM   #7
hrajasim
Member
 
Location: DC Metro

Join Date: Aug 2009
Posts: 27

I just completed my first runs of cufflinks, cuffcompare and cuffdiff.
Now I am trying to make sense of the several resulting files.
When I look at the Cuffdiff output for genes and transcripts, I see only Cufflinks IDs but no reference gene symbols.
Is there a way to map the Cufflinks IDs for genes and transcripts to the corresponding gene symbols?
Old 07-01-2010, 09:12 AM   #8
jetspeeder
Member
 
Location: U.S.

Join Date: Jun 2010
Posts: 12

Hey hrajasim, I haven't run Cuffdiff yet, so I am not sure. I'm assuming that if you supply a reference file when you run Cuffdiff, it should show the reference symbol next to each ID.
Old 12-21-2010, 06:51 AM   #9
honey
Senior Member
 
Location: Pittsburgh

Join Date: Feb 2010
Posts: 151

I have a related question: the Cuffdiff output file contains Cuffdiff IDs. How can I either match them to or replace them with standard RefSeq or Ensembl IDs?
Please advise.
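One possible approach to the ID-mapping questions in this thread is to build a lookup table from the Cuffcompare output, which lists a reference transcript next to each Cufflinks transcript. The sketch below uses the first five columns of the Cuffcompare rows posted in #1; the column meanings are inferred from that listing, and the function names are hypothetical. Going from transcript IDs to gene symbols would need an additional lookup table, which is not shown here.

```python
# First five columns of the Cuffcompare rows from post #1:
# reference gene, reference transcript, class code, Cufflinks gene, Cufflinks transcript.
CUFFCOMPARE_ROWS = """
uc009vev.1 uc009vev.1 = CUFF.213872 CUFF.213872.1
uc009vew.1 uc009vew.1 c CUFF.213876 CUFF.213876.1
uc009vew.1 uc009vew.1 = CUFF.213874 CUFF.213874.1
uc009vex.1 uc009vex.1 = CUFF.213878 CUFF.213878.1
uc009vfc.1 uc009vfc.1 p CUFF.213880 CUFF.213880.1
"""


def build_id_map(text):
    """Map each Cufflinks transcript ID to (reference transcript ID, class code)."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        _ref_gene, ref_tx, class_code, _cuff_gene, cuff_tx = fields[:5]
        mapping[cuff_tx] = (ref_tx, class_code)
    return mapping


def relabel(cuff_id, id_map):
    """Return the reference ID for a Cufflinks ID, or the ID unchanged if unmapped."""
    match = id_map.get(cuff_id)
    return match[0] if match else cuff_id


id_map = build_id_map(CUFFCOMPARE_ROWS)
print(relabel("CUFF.213874.1", id_map))
```

Keeping the class code next to the reference ID lets you decide whether to trust only exact matches (=) or also looser relationships such as c and p before relabeling.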