  • Interesting cufflinks output

    How does Cufflinks work when you supply it with a GTF file? I used the knownGenes UCSC GTF for mm9. When I run Cufflinks without a GTF and then run Cuffcompare with the GTF, I get the results below for chromosome M.

    cufflinks w/o gtf file
    Code:
    CUFF.213872.1   859088  chrM    121     953     873.319 1       1       862.495 884.143 1627.3  832
    CUFF.213874.1   859088  chrM    1148    2623    746.907 1       1       739.389 754.425 1391.83 1475
    CUFF.213876.1   859088  chrM    2726    3681    1593.64 1       1       1580    1607.29 2969.6  955
    CUFF.213878.1   859088  chrM    3934    4923    884.824 1       1       874.831 894.817 1648.73 989
    CUFF.213880.1   859088  chrM    15416   16059   260.697 1       1       253.97  267.424 485.81  643
    Running Cuffcompare with the GTF file on the output above then shows:

    Code:
    uc009vev.1      uc009vev.1      =       CUFF.213872     CUFF.213872.1   100     873.319096      862.495228      884.142963      1627.300787     832     CUFF.213872.1
    uc009vew.1      uc009vew.1      c       CUFF.213876     CUFF.213876.1   100     1593.642682     1579.995234     1607.290129     2969.596078     955     CUFF.213876.1
    uc009vew.1      uc009vew.1      =       CUFF.213874     CUFF.213874.1   100     746.907244      739.389371      754.425116      1391.830367     1475    CUFF.213874.1
    uc009vex.1      uc009vex.1      =       CUFF.213878     CUFF.213878.1   100     884.824253      874.831434      894.817071      1648.729185     989     CUFF.213878.1
    uc009vfc.1      uc009vfc.1      p       CUFF.213880     CUFF.213880.1   100     260.696876      253.969897      267.423855      485.810439      643     CUFF.213880.1
    When I run Cufflinks with a GTF, I get this:

    cufflinks w/ gtf file
    Code:
    uc009vev.1      602582  chrM    69      852     840.341 1       1       829.397 851.286 1565.84 783
    uc009vew.1      602582  chrM    1148    3703    1018.15 1       1       1011.48 1024.82 1897.25 2555
    uc009vex.1      602582  chrM    3848    4933    794.654 1       1       785.613 803.696 1480.72 1085
    uc009vey.1      602582  chrM    5326    6938    4153.36 1       1       4136.4  4170.32 7739.4  1612
    uc009vez.1      602582  chrM    7009    7699    1493.21 1       1       1477.67 1508.76 2782.55 690
    uc009vfa.1      602582  chrM    7765    8607    1599.36 1       1       1584.8  1613.92 2980.3  842
    uc009vfb.1      602582  chrM    9875    11542   1003.56 1       1       995.359 1011.75 1870.08 1667
    uc009vfc.1      602582  chrM    12405   15288   1279.48 1       1       1272.44 1286.52 2384.19 2883
    How does Cufflinks utilize the GTF file? Cufflinks without the GTF followed by Cuffcompare with the GTF shows 4 unique trans_id values for chrM, while Cufflinks with the GTF shows 8. In addition, Cufflinks with the GTF reports coordinates that are not present in the transcripts.expr from the run without the GTF. Where did they come from? Thanks.

    Either one of the outputs is generating extra information or the other is missing information. I believe the latter, because the coverage.wig file viewed at UCSC shows that genes such as ATPase6 are present. So why do they not show up when Cufflinks is run without the GTF file? Any insight would be helpful.

    http://genome.ucsc.edu/cgi-bin/hgTra...varRep_close=0
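One way to make the comparison concrete is a plain interval-overlap check between the two runs. The sketch below uses only the chrM coordinates from the two tables above; reading the fourth and fifth columns as left and right coordinates is an assumption based on the data, and the helper names (`overlaps`, `uncovered`) are made up for illustration.

```python
# chrM coordinates copied from the two tables in this post.
# Each tuple is (transcript id, left, right); the column meaning is inferred.
assembled = [  # Cufflinks run without a GTF
    ("CUFF.213872.1", 121, 953),
    ("CUFF.213874.1", 1148, 2623),
    ("CUFF.213876.1", 2726, 3681),
    ("CUFF.213878.1", 3934, 4923),
    ("CUFF.213880.1", 15416, 16059),
]
annotated = [  # Cufflinks run with the GTF
    ("uc009vev.1", 69, 852),
    ("uc009vew.1", 1148, 3703),
    ("uc009vex.1", 3848, 4933),
    ("uc009vey.1", 5326, 6938),
    ("uc009vez.1", 7009, 7699),
    ("uc009vfa.1", 7765, 8607),
    ("uc009vfb.1", 9875, 11542),
    ("uc009vfc.1", 12405, 15288),
]


def overlaps(a_left, a_right, b_left, b_right):
    """Return True when two closed intervals share at least one base."""
    return a_left <= b_right and b_left <= a_right


def uncovered(reference, transfrags):
    """List reference transcripts that no assembled transfrag overlaps."""
    return [
        ref_id
        for ref_id, r_left, r_right in reference
        if not any(overlaps(r_left, r_right, left, right)
                   for _, left, right in transfrags)
    ]


missing = uncovered(annotated, assembled)
print(missing)

# Distance from the annotated end of uc009vfc.1 to the start of CUFF.213880.1.
gap = 15416 - 15288
print(gap)
```

On these numbers, five annotated transcripts (uc009vey.1 through uc009vfc.1) have no overlapping transfrag in the run without the GTF. CUFF.213880.1 starts 128 bases past the annotated end of uc009vfc.1, which would be consistent with the `p` class code Cuffcompare assigned to it (described in the Cufflinks manual as a possible polymerase run-on fragment).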
    Last edited by jetspeeder; 06-27-2010, 01:22 PM.

  • #2
    Does anyone have any ideas? We are stuck and can't really do anything until we figure this out. Any help would be greatly appreciated.

    • #3
      When you run Cufflinks with the GTF file, it will only use the isoform coordinates provided by your GTF file to estimate gene expression. It will not look for any novel isoforms that may stretch beyond the annotations you provide. As you know, existing annotations are incomplete, so this is usually a good option. I think in general you should see more assembled isoforms when not providing a GTF file.
      What you note is counterintuitive, i.e., Cufflinks without a GTF file fails to find isoforms that are known to be present. When you say ATPase6 is "present", what do you mean by this, and which uc009vey.X is ATPase6? One thing I can think of is that when you run Cufflinks without a GTF, this gene has so few alignments that it gets filtered out because of low abundance; look at the -F option.
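To make the -F suggestion concrete, here is a minimal sketch of a relative abundance filter of the kind the Cufflinks manual describes for `-F/--min-isoform-fraction`: isoforms are dropped when their abundance falls below a fraction of the gene's most abundant isoform. This is not Cufflinks' own code; the function name, the example FPKM values, and the 0.1 default used here are assumptions for illustration, so check `cufflinks` help for the default in your version.

```python
def filter_minor_isoforms(fpkm_by_isoform, min_isoform_fraction=0.1):
    """Keep isoforms with abundance >= fraction * abundance of the major isoform.

    The 0.1 default is illustrative only, not taken from any Cufflinks release.
    """
    if not fpkm_by_isoform:
        return {}
    major = max(fpkm_by_isoform.values())
    cutoff = min_isoform_fraction * major
    return {iso: fpkm for iso, fpkm in fpkm_by_isoform.items() if fpkm >= cutoff}


# Hypothetical gene with one dominant and one minor isoform.
gene = {"isoform_A": 100.0, "isoform_B": 4.0}
print(filter_minor_isoforms(gene))        # the minor isoform is below the cutoff
print(filter_minor_isoforms(gene, 0.0))   # a fraction of 0 keeps everything

# Hypothetical single-isoform locus: its only isoform is the major one.
single = {"only_isoform": 2.5}
print(filter_minor_isoforms(single))
```

If the filter is relative in this way, a locus with a single isoform cannot be removed by it, whatever its absolute abundance. Running with a fraction of 0 is still a cheap way to rule the filter in or out.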

      • #4
        Originally posted by thinkRNA View Post
        When you run Cufflinks with the GTF file, it will only use the isoform coordinates provided by your GTF file to estimate gene expression. It will not look for any novel isoforms that may stretch beyond the annotations you provide. As you know, existing annotations are incomplete, so this is usually a good option. I think in general you should see more assembled isoforms when not providing a GTF file.
        What you note is counterintuitive, i.e., Cufflinks without a GTF file fails to find isoforms that are known to be present. When you say ATPase6 is "present", what do you mean by this, and which uc009vey.X is ATPase6? One thing I can think of is that when you run Cufflinks without a GTF, this gene has so few alignments that it gets filtered out because of low abundance; look at the -F option.
        Hey thinkRNA, thanks for the reply. By "present" I am referring to the TopHat output, the coverage.wig file: it shows that chromosome M has good coverage in all the areas that contain genes. (I just realized the link I posted no longer shows the view I had before; I will regenerate it later.) ATPase6 is uc009vfa.1. Your last suggestion is definitely a possibility, but when Cufflinks is run with a GTF, ATPase6 and the other missing transcripts have some of the highest RPKM values (top 5%). That makes it seem unlikely that they are filtered out, since they should be filtered out too when a GTF file is given, shouldn't they?

        • #5
          Originally posted by jetspeeder View Post
          Hey thinkRNA, thanks for the reply. By "present" I am referring to the TopHat output, the coverage.wig file: it shows that chromosome M has good coverage in all the areas that contain genes. (I just realized the link I posted no longer shows the view I had before; I will regenerate it later.) ATPase6 is uc009vfa.1. Your last suggestion is definitely a possibility, but when Cufflinks is run with a GTF, ATPase6 and the other missing transcripts have some of the highest RPKM values (top 5%). That makes it seem unlikely that they are filtered out, since they should be filtered out too when a GTF file is given, shouldn't they?
          Yes, I just thought about this. Run both with and without the GTF, adding -F 0, and see if you can recover the isoforms.
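The two runs suggested here could be set up roughly as follows. This sketch only builds and prints the command lines; it does not launch anything. The file names `accepted_hits.sam` and `knownGene.gtf` are placeholders, and the flags are `-F` as named in this thread plus `-G` for the reference annotation as I understand the Cufflinks manual, so confirm both against the help output of your installed version.

```python
import shlex


def cufflinks_command(alignments, min_isoform_fraction=0.0, gtf=None):
    """Build a cufflinks argument list; adds -G only when a GTF is supplied."""
    cmd = ["cufflinks", "-F", str(min_isoform_fraction)]
    if gtf is not None:
        cmd += ["-G", gtf]
    cmd.append(alignments)
    return cmd


# Placeholder file names; substitute your own TopHat output and annotation.
without_gtf = cufflinks_command("accepted_hits.sam")
with_gtf = cufflinks_command("accepted_hits.sam", gtf="knownGene.gtf")

for cmd in (without_gtf, with_gtf):
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping everything except the GTF identical between the two commands means any difference in the recovered chrM transcripts can be attributed to the annotation rather than to the filter setting.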

          • #6
            Originally posted by thinkRNA View Post
            Yes, I just thought about this. Run both with and without the GTF, adding -F 0, and see if you can recover the isoforms.
            Hey thinkRNA, I am currently running it with your suggestion; hopefully it will give us more insight. Also, here's the picture of the coverage file from TopHat.
            Attached Files

            • #7
              I just completed my first runs of cufflinks, cuffcompare and cuffdiff.
              Now I am trying to make sense of the several resulting files.
              When I look at the Cuffdiff output for genes and transcripts, I see only Cufflinks IDs and no reference gene symbols.
              Is there a way to map the Cufflinks IDs for genes and transcripts to the corresponding gene symbols?

              • #8
                Hey krajasim, I haven't run Cuffdiff yet, so I am not sure. I am assuming that if you supply a reference file when you run Cuffdiff, it should show the reference symbol next to each ID.

                • #9
                  I have a related question. In the Cuffdiff output file there are Cuffdiff IDs. How can I either match them to, or replace them with, standard RefSeq IDs or Ensembl IDs?
                  Please advise.
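One possible route is to use a Cuffcompare table like the one in post #1, where each row pairs a reference ID with a Cufflinks ID, as a lookup. The sketch below assumes the first five columns are reference gene, reference transcript, class code, Cufflinks gene, and Cufflinks transcript, which is inferred from the rows shown in this thread rather than from a format specification. `load_id_map` is a hypothetical helper, and `CUFF.999999.1` is a made-up ID used only to show the fallback.

```python
import csv
import io

# First five columns of the Cuffcompare rows from post #1, tab-separated.
TMAP_TEXT = (
    "uc009vev.1\tuc009vev.1\t=\tCUFF.213872\tCUFF.213872.1\n"
    "uc009vew.1\tuc009vew.1\tc\tCUFF.213876\tCUFF.213876.1\n"
    "uc009vew.1\tuc009vew.1\t=\tCUFF.213874\tCUFF.213874.1\n"
    "uc009vex.1\tuc009vex.1\t=\tCUFF.213878\tCUFF.213878.1\n"
    "uc009vfc.1\tuc009vfc.1\tp\tCUFF.213880\tCUFF.213880.1\n"
)


def load_id_map(handle):
    """Map Cufflinks transcript IDs to (reference transcript ID, class code)."""
    id_map = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or truncated lines
        _ref_gene, ref_transcript, class_code, _cuff_gene, cuff_transcript = row[:5]
        id_map[cuff_transcript] = (ref_transcript, class_code)
    return id_map


id_map = load_id_map(io.StringIO(TMAP_TEXT))

# Annotate a list of Cufflinks IDs; unknown IDs fall back to "-".
for cuff_id in ["CUFF.213872.1", "CUFF.213880.1", "CUFF.999999.1"]:
    ref_id, code = id_map.get(cuff_id, ("-", "-"))
    print(cuff_id, ref_id, code)
```

With a real file, `open(path, newline="")` would replace the `io.StringIO` handle. The class code is kept alongside the reference ID because a `p` or `c` match is a weaker link than `=`. Going from UCSC IDs such as uc009vev.1 to gene symbols, RefSeq, or Ensembl IDs needs a second lookup table, for example one exported from the UCSC Table Browser.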
