  • CuffDiff output

    Hi all,
    I used Cufflinks in the following workflow:
    Cufflinks -> Cuffcompare -> Cuffdiff

    The output file genes.fpkm_tracking didn't include reference genes at all:

    Code:
    tracking_id     class_code      nearest_ref_id  gene_short_name tss_id  locus   MM_FPKM MM_conf_lo      MM_conf_hi      LOG_FPKM        LOG_conf_lo     LOG_conf_hi     SFT_FPKM        SFT_conf_lo     SFT_conf_hi     NY_FPKM NY_conf_lo      NY_conf_hi
    XLOC_000001     -       -       -       -       SL2.30ch00:551338-551631        1.66555 0       4.24667 0.446456        0       1.7828  0.841447        0       2.67606 0       0       0
    XLOC_000002     -       -       -       -       SL2.30ch00:4196781-4198207      122.746 100.586 144.907 185.302 158.075 212.529 121.462 99.1515 143.773 1616.46 1469.49 1763.43
    This is despite the fact that the combined.gtf created by Cuffcompare contained many overlaps with known genes. Also, the isoforms.fpkm_tracking output file DID contain reference annotations, but at the exon level:
    Code:
    tracking_id     class_code      nearest_ref_id  gene_short_name tss_id  locus   MM_FPKM MM_conf_lo      MM_conf_hi      LOG_FPKM        LOG_conf_lo     LOG_conf_hi     SFT_FPKM        SFT_conf_lo     SFT_conf_hi     NY_FPKM NY_conf_lo      NY_conf_hi
    TCONS_00000001  =       exon:Solyc00g005040.1.1.3       -       -       SL2.30ch00:551338-551631        1.66555 0       4.24667 0.446456        0       1.7828  0.841447        0       2.67606 0       0       0
    TCONS_00000002  o       exon:Solyc00g006470.1.1.4       -       -       SL2.30ch00:4196781-4198207      62.9947 47.1187 78.8707 95.0768 75.573  114.581 52.5381 37.9501 67.126  972.538 908.856 1036.22
    * Of course, when I ran Cuffdiff with only the reference GTF, I got expression levels for the known genes.
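If a gene-level label for each TCONS row is all you need, the exon-level nearest_ref_id values above could probably be collapsed. This is a minimal sketch that assumes the IDs follow the pattern `exon:<transcript ID>.<exon number>`, with the gene ID being the part of the transcript ID before the first dot. That pattern is inferred from the two example rows only, and `collapse_ref` is a hypothetical helper, not part of Cufflinks.

```python
def collapse_ref(nearest_ref_id):
    """Turn an exon-level ID such as 'exon:Solyc00g005040.1.1.3' into
    (transcript_id, gene_id), under the assumed naming pattern.
    Rows with no nearest reference show '-' in the tracking file."""
    if nearest_ref_id in ("", "-"):
        return ("-", "-")
    # Drop a 'type:' prefix such as 'exon:' if present.
    ident = nearest_ref_id.split(":", 1)[-1]
    # Assumption: the last dot-separated field is the exon number.
    transcript_id = ident.rsplit(".", 1)[0]
    # Assumption: everything before the first dot is the gene ID.
    gene_id = ident.split(".", 1)[0]
    return (transcript_id, gene_id)
```

The output also hints that the reference GTF's transcript_id values may be exon IDs; if so, fixing the reference itself would be the cleaner route.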

    My questions are:
    Is there a way to get gene-level (not exon-level) expression values AND novel transcripts using Cufflinks?
    And why doesn't the genes.fpkm_tracking file report the closest reference annotation for each gene?

    Thanks!
    Rachelly.

  • #2
    gene level

    For gene-level results, run TopHat with an Ensembl/refFlat GTF file.



    • #3
      Cole's answer

      I consulted Cole on this matter and this was his reply:

      Actually, you won't see those id's in the genes.fpkm_tracking (or, IIRC, the tss_group.fpkm_tracking) files, because as far as Cufflinks is concerned, genes and tss groups are *sets* of transcripts. Each transcript in a gene could have a different nearest reference transcript, so we don't put anything in that field.
      However, the way we recommend doing what (I think) you want here is to use the gene_name attribute. If you compare to a reference file that has gene_name attributes, they will get propagated to the stdout.combined.gtf file from cuffcompare. Ensembl has the gene_name attributes already built in (and the values are typically the HUGO names in the case of human), but you could add them to your reference if they're not there already.
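To see Cole's point in practice, here is a minimal sketch that reads a Cuffcompare combined.gtf and collects, for each gene_id (XLOC locus), every gene_name and nearest_ref value carried by its transcripts. It assumes tab-separated columns, the usual `key "value";` attribute syntax, and that your Cuffcompare version writes `gene_name` and `nearest_ref` attributes; check a few lines of your own file first. A locus whose transcripts point at different reference genes comes back with several names, which is why a single nearest_ref_id cannot be reported per gene.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_attributes(field):
    """Return a dict of attribute key -> value from GTF column 9."""
    return dict(ATTR_RE.findall(field))


def names_per_gene(lines):
    """Map each gene_id (e.g. XLOC_000001) to the sets of gene_name and
    nearest_ref values found on the lines of its transcripts."""
    names = defaultdict(set)
    refs = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a valid tab-separated GTF record
        attrs = parse_attributes(cols[8])
        gene = attrs.get("gene_id")
        if gene is None:
            continue
        if "gene_name" in attrs:
            names[gene].add(attrs["gene_name"])
        if "nearest_ref" in attrs:
            refs[gene].add(attrs["nearest_ref"])
    return names, refs
```

For example, `names, refs = names_per_gene(open("stdout.combined.gtf"))` would cover the default output name mentioned above.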



      • #4
        Originally posted by Rachelly
        I consulted Cole on this matter and this was his reply:
        Hi Rachelly, I seem to be having the same problem: my Cuffdiff output does not contain gene names. Could you post an example of a reference file that worked and the commands you ran? I tried rerunning cuffcompare with an Ensembl annotation that contains gene_name attributes, but that did not seem to work. Here are a few lines from my Ensembl annotation file:

        11 pseudogene exon 86649 87586 . - . gene_id "ENSG00000224777"; transcript_id "ENST00000424047"; exon_number "1"; gene_name "OR4F2P"; transcript_name "OR4F2P-001";
        11 protein_coding exon 129060 129388 . - . gene_id "ENSG00000230724"; transcript_id "ENST00000382784"; exon_number "1"; gene_name "AC069287.3"; transcript_name "AC069287.3-201";
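Before rerunning cuffcompare, it may be worth confirming that every exon line in the reference actually carries a gene_name, and that the columns are tab-separated as the GTF format requires (the lines above show spaces, which may just be how the forum rendered them). A minimal sketch; `gene_name_coverage` is a hypothetical helper name:

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def gene_name_coverage(lines):
    """Return (exon lines, exon lines with gene_name, malformed lines).

    A line counts as malformed if it does not split into exactly nine
    tab-separated columns, e.g. because spaces were used instead of tabs."""
    exons = with_name = malformed = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            malformed += 1
            continue
        if cols[2] != "exon":
            continue
        exons += 1
        attrs = dict(ATTR_RE.findall(cols[8]))
        if attrs.get("gene_name"):
            with_name += 1
    return exons, with_name, malformed
```

If the second number is well below the first, or the third is non-zero, the reference is a likely place to start looking.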



        • #5
          Cuffcompare

          If you ran Cuffcompare with a reference file, you can extract the significant Cuffdiff transcript piles and grep the corresponding lines out of your combined.gtf file, which should contain your gene IDs. This tells you which genes are significant.

          This requires the Unix commands cut, awk, grep, | (pipe), and xargs -I.
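The same cut/awk/grep/xargs idea can be written as a short Python sketch. It assumes the .diff file has a tab-separated header with `test_id` and `significant` columns, and that combined.gtf carries `gene_id`, `gene_name`, and `nearest_ref` attributes. Column and attribute names have varied between Cufflinks releases, so check your own headers; the function names are illustrative.

```python
import csv
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def significant_ids(diff_lines):
    """Return the test_ids of rows flagged significant == 'yes'."""
    reader = csv.DictReader(diff_lines, delimiter="\t")
    return {row["test_id"] for row in reader if row.get("significant") == "yes"}


def combined_gtf_hits(gtf_lines, ids):
    """Yield distinct (gene_id, gene_name, nearest_ref) tuples for
    combined.gtf lines whose gene_id is in ids ('-' if missing)."""
    seen = set()
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        gid = attrs.get("gene_id")
        if gid in ids:
            hit = (gid, attrs.get("gene_name", "-"), attrs.get("nearest_ref", "-"))
            if hit not in seen:  # report each combination once, like `sort -u`
                seen.add(hit)
                yield hit
```

For gene_exp.diff the test_id should be the XLOC gene_id; for isoform_exp.diff it is a TCONS transcript ID, so you would match on `transcript_id` instead.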



          • #6
            I found that I had to use the -s switch in cuffcompare in order for it to propagate my gene names (with gene_name attribute in last column of GTF) all the way through to the final cuffdiff files.



            • #7
              Is genes.gtf the correct annotation file?

              Hi all,
              I had the same problem, but figured out that I had to run TopHat with the Ensembl "genes.gtf" file, which is what I did.
              Everything works fine until I run Cuffmerge,
              where I get the following error:

              Error: duplicate GFF ID 'ENSMUST00000098282' encountered!
              [FAILED]

              In another set I was running, I get the same error with a different ENSMUST number.
              Any clue what's wrong here? Obviously there are multiple lines with that ID, but then why did it work with TopHat?

              Thanks!
              K.



              • #8
                Ok, I found the issue. It turns out I was being too "efficient".

                I am comparing 2 x 2 datasets, and I started cuffmerge on the second set while the run on the first set was still going (I wanted to be fast...).
                However, I forgot to change the output directory name, so both runs saved to the same directory... and ran into problems.
                It was all solved when I assigned them different directories...

                Karel
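If you suspect two runs wrote into the same output directory, one quick check is to look for byte-identical GTF records, which is what you would expect if the same transcripts were written twice into one file. This is a sketch under that assumption only; it does not reproduce whatever rule Cuffmerge uses to raise the duplicate ID error, and `repeated_records` is a hypothetical name.

```python
import re
from collections import Counter

TID_RE = re.compile(r'transcript_id\s+"([^"]*)"')


def repeated_records(lines):
    """Return {transcript_id: extra_copies} for transcripts that have
    identical GTF lines appearing more than once."""
    counts = Counter(
        line.rstrip("\n")
        for line in lines
        if line.strip() and not line.startswith("#")
    )
    result = Counter()
    for record, n in counts.items():
        if n > 1:
            match = TID_RE.search(record)
            tid = match.group(1) if match else "(no transcript_id)"
            result[tid] += n - 1  # count only the surplus copies
    return dict(result)
```

An empty result does not rule out other causes; it only says the file has no verbatim duplicate lines.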



                • #9
                  Sorry, I know this is a comparatively basic question, but can someone give me a quick take on the gene IDs? I ran cuffdiff to get the significantly differentially expressed genes, and I want to view them in DAVID or Ensembl to check out the actual pathways. I saved all of my 300 or so genes in a txt file, with many genes having more than one ID (e.g. B1AKN3,NP_001036147,Q9P2R6,uc001aph.1), and uploaded it to DAVID. However, it could only "ambiguously" match 25 of these genes. What kind of gene IDs are these? There appears to be more than one kind. How do you view your pathways?



                  • #10
                    bump

                    Sorry, I'm just having trouble working with these gene names. Some are UniProt, some are RefSeq, some are UCSC. How do you guys do it? DAVID has no idea what I'm uploading. What do you guys use? And does it recognize all the gene names?
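The four IDs in the example come from three different namespaces, so a first step could be sorting them by format before choosing one identifier type for DAVID. A minimal sketch: the UniProt pattern follows UniProt's published accession format, while the RefSeq and UCSC patterns are simplified approximations that cover the examples here but not every possible ID.

```python
import re

ID_PATTERNS = [
    # UniProt accession format (6 or 10 characters).
    ("UniProt", re.compile(
        r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$")),
    # Simplified RefSeq accession: common prefix, underscore, digits, optional version.
    ("RefSeq", re.compile(r"^(NM|NP|NR|XM|XP|XR)_\d+(\.\d+)?$")),
    # Simplified UCSC known-gene ID, e.g. uc001aph.1.
    ("UCSC", re.compile(r"^uc\d{3}[a-z]{3}\.\d+$")),
]


def classify_id(identifier):
    """Return the first namespace whose pattern matches, else 'unknown'."""
    for label, pattern in ID_PATTERNS:
        if pattern.match(identifier):
            return label
    return "unknown"


def classify_field(field):
    """Split a comma-separated gene field and classify each ID."""
    ids = (part.strip() for part in field.split(","))
    return {ident: classify_id(ident) for ident in ids if ident}
```

With the IDs grouped, you could keep one type (say, the RefSeq accessions) and select the matching identifier type when uploading to DAVID.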



                    • #11
                      Please help...

                      I'm sorry, I'm just so confused about this. Why is more than one gene listed in promoters.diff, tss_group.diff, or even gene_exp.diff? I just don't get it. It says right there in the Cufflinks manual, and I'm quoting:

                      "Transcripts with the same gene_id are part of the same gene group, and similarly, those with the same tss_id and p_id are part of the same primary transcript group and CDS group. "

                      How can one transcription start site be associated with more than one gene? Likewise with promoters and CDS?
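One plausible reason a single Cufflinks "gene" can list several names: loci (XLOC IDs) are built from overlapping transcripts, so if transcripts from two reference genes overlap, they can end up in one locus and the gene column then lists both names. The sketch below illustrates that idea with a simplified rule (merge overlapping transcript spans per chromosome and strand); the strand handling and the helper `group_into_loci` are assumptions for illustration, not Cuffcompare's actual implementation.

```python
from collections import defaultdict


def group_into_loci(transcripts):
    """Merge overlapping transcript spans into loci.

    transcripts: iterable of (chrom, strand, start, end, gene_name) with
    1-based inclusive coordinates. Returns a list of
    (chrom, strand, start, end, sorted gene names) tuples."""
    by_key = defaultdict(list)
    for chrom, strand, start, end, name in transcripts:
        by_key[(chrom, strand)].append((start, end, name))
    loci = []
    for (chrom, strand), spans in sorted(by_key.items()):
        spans.sort()
        cur_start, cur_end, names = None, None, set()
        for start, end, name in spans:
            if cur_end is not None and start <= cur_end:
                # Overlaps the current locus: extend it and add the name.
                cur_end = max(cur_end, end)
                names.add(name)
            else:
                if cur_end is not None:
                    loci.append((chrom, strand, cur_start, cur_end, sorted(names)))
                cur_start, cur_end, names = start, end, {name}
        if cur_end is not None:
            loci.append((chrom, strand, cur_start, cur_end, sorted(names)))
    return loci
```

If GENE_A and GENE_B overlap on the same strand, the first locus reports both names, much like the multi-name entries in the .diff files.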

                      Sincere thanks to anyone that can help me with this!
                      Last edited by billstevens; 04-15-2012, 01:12 PM.



                      • #12
                        Hey guys,

                        So I have this plan for analyzing my data using DAVID, and I was hoping someone might say how they do their differential expression analysis. From the gene_exp.diff output file, I take the significant genes, strip the version suffix from each ID (e.g. uc0012w.1 becomes uc0012w), and then load the list into DAVID. I dropped the suffixes because DAVID often couldn't find the suffixed ID but did recognize it without the suffix, and I assume both refer to the same gene. I found that DAVID recognizes all genes that have been reviewed. This seems like a nice and straightforward method for obtaining my network.

                        Am I totally off-base? Anyone?
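The suffix-stripping step described above can be sketched as follows, assuming every ID ends in an optional `.N` version number. Note that this also shortens names like AC069287.3 from the Ensembl example earlier in the thread, and whether two versions refer to the same gene is the poster's assumption, so it is worth spot-checking a few. `strip_versions` is a hypothetical helper.

```python
import re

# A trailing ".<digits>" version suffix, e.g. the ".1" in "uc001aph.1".
VERSION_RE = re.compile(r"\.\d+$")


def strip_versions(ids):
    """Drop a trailing '.N' version from each ID and de-duplicate,
    keeping the order in which base IDs were first seen."""
    seen = {}
    for ident in ids:
        base = VERSION_RE.sub("", ident.strip())
        if base:
            seen.setdefault(base, None)
    return list(seen)
```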
