Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Kittykat22,

    e.g.
    Code:
    library(cummeRbund)
    cuff<-readCufflinks()
    
    allIsoformFPKMs<-fpkm(isoforms(cuff))
    allIsoformFeatures<-features(isoforms(cuff))
    
    sigIsoforms<-getSig(cuff,level='isoforms')
    
    sigIsoformFeatures<-allIsoformFeatures[allIsoformFeatures$isoform_id %in% sigIsoforms,]
    sigIsoformFPKMs<-allIsoformFPKMs[allIsoformFPKMs$isoform_id %in% sigIsoforms,]
    Cheers,
    Loyal

    Comment


    • #17
      Hi Loyal,

      thanks so much for your help and your quick response! I can get the gene identifier now and even though I havent figured out how to include the gene_short_name, Im sure Ill get there!

      K

      Comment


      • #18
        Hi All,

        I know that Cuffdiff does pairwise differential expression testing, however I have more than two groups A, B ... F. I am not interested in all possible group pairings, just: A vs. B, C vs. D and E vs. F.

        Follow the protocol published in nature my code would be:

        >gene.diff.data <-diffData(genes(cuff_data), 'A','B')
        >sig.gene.data <-subset(gene.diff.data, (significant=='yes'))
        >nrow(sig.gene.data)
        >write.table(sig.gene.data, 'diff_genes_AvsB.txt', sep= '\t', row.names = F, col.names =T, quote = F)

        However, I dont get the "gene_short_name" in my final table.

        Then follow the discussion in this post, I'd start with:

        >diffGeneIDs <- getSig(cuff,level='genes','A','B',alpha=0.05)

        I followed the code posted here but after:

        >row.names(diffGenesData)=diffGenesData$gene_id

        I get:

        Error in `row.names<-.data.frame`(`*tmp*`, value = c("XLOC_000028", "XLOC_000028", : duplicate 'row.names' are not allowed
        In addition: Warning message:non-unique values when setting 'row.names': ‘XLOC_... [... truncated]
        any idea what I'm doing wrong.

        Thanks

        Comment


        • #19
          Originally posted by mondongho View Post
          Hi All,

          I know that Cuffdiff does pairwise differential expression testing, however I have more than two groups A, B ... F. I am not interested in all possible group pairings, just: A vs. B, C vs. D and E vs. F.

          Follow the protocol published in nature my code would be:

          >gene.diff.data <-diffData(genes(cuff_data), 'A','B')
          >sig.gene.data <-subset(gene.diff.data, (significant=='yes'))
          >nrow(sig.gene.data)
          >write.table(sig.gene.data, 'diff_genes_AvsB.txt', sep= '\t', row.names = F, col.names =T, quote = F)

          However, I dont get the "gene_short_name" in my final table.

          Then follow the discussion in this post, I'd start with:

          >diffGeneIDs <- getSig(cuff,level='genes','A','B',alpha=0.05)

          I followed the code posted here but after:

          >row.names(diffGenesData)=diffGenesData$gene_id

          I get:



          any idea what I'm doing wrong.

          Thanks
          I got the exact same error, and I also have no idea what I'm doing wrong. My goal is to switch out the gene identifier ID with the gene short name in my write table.

          Assuming, I can get this fixed and merge the data using Thomas' code, would my next step be:

          write.table(diffGenesOutput, 'name.txt', sep= '\t', row.names = F, col.names =T, quote = F)

          With diffGenesOutput being the merged file that was created with Thomas' code.

          I am also very clearly a newbie

          Comment


          • #20
            OK, so the error was because of the duplicate IDs, so I guess I may need a unique command or something.

            But I just went around this whole thing the long way, and from Loyal's suggestion, I used featureNames. So now I have two tables, one that has all the information like the Xloc ID, fold change, Pvalue, etc., and other that just has the Xloc ID and the tracking ID. So I can merge these together in access, but what are these tracking IDs? How do I go from these to the names of genes?

            Comment


            • #21
              I am in a similar fix and I am a cummeRbund newbie too.

              I am running R-2.15.2 and the BioConductor loaded cummeRbund so as of the first of the month I am mostly up-to-date.

              What I want to do is to make a series of graphs from my significant isoform and splice sets.

              getGene does not retrieve anything when given an isoform ID or splicing ID. I want to find the gene id corresponding to a given
              isoform id or splice id

              Per the Loyal's suggestion of the April 1 I can examine an isoform feature table eg:

              sigIsoformFeatures
              isoform_id gene_id CDS_id gene_short_name TSS_group_id class_code
              657 TCONS_00000661 XLOC_000361 NA Syt5 TSS449 j
              1651 TCONS_00001656 XLOC_000935 NA Grm5 TSS1161 =

              Here's part of my sig isoform set:

              head(mysigIsoforms,length(mysigIsoforms))
              [1] "TCONS_00000661" "TCONS_00001656" "TCONS_00001659" "TCONS_00001672"

              what's the syntax for getting the gene_id that matches the isoform id? For "TCONS_00000661" I want to select one of these "XLOC_000361" or "TSS449" or "Syt5"

              The following loop work for plotting a set significant genes. It selects the ith geneID gets the gene and makes various plots. I'd like to use this if I can generalize it to examine the feature table.


              x<-length(Sham_vs_Expt.sigIsoforms)
              x
              print(x)


              for(i in 1:x) {
              print(Sham_vs_Expt.sigIsoforms[i])
              myGeneID<-Sham_vs_Expt.sigIsoforms[i]
              myGeneID
              myGene<-getGene(cuff,myGeneID)
              print(myGene)
              head(fpkm(myGene))
              head(fpkm(isoforms(myGene)))
              gl<-expressionPlot(myGene)
              print(gl)
              gb<-expressionBarplot(myGene)
              print(gb)
              igb<-expressionBarplot(isoforms(myGene))
              print(igb)

              }

              Comment


              • #22
                So, I figured out the issues of getting isoforms data plotted like I want but I now see and issue with duplicate entries AND have a general question

                Why do the plotting programs fail when told to plot values for an entry from the selected SIGNIFICANT set when that set has zero for instance size: see TCONS_0000855 example below. Why not just complain and move on?

                One of my entries from the getSig operation was TCONS_0000855 . When I attempt to plot the results for this entry the plotting output fails to produce anything.

                [1] "TCONS_00008558"
                [1] "Bcas3"
                CuffGene instance for gene Bcas3
                Short name:
                Slots:
                annotation
                fpkm
                diff
                isoforms CuffFeature instance of size 0
                TSS CuffFeature instance of size 0
                CDS CuffFeature instance of size 0

                for which entry I see the following entries in the isoforms tracking file:

                cat isoforms.fpkm_tracking | grep Bcas3
                TCONS_00008547 j ENSRNOT00000004657 XLOC_004745 Bcas3 TSS5831 10:73617528-74078192 3507 - 7.71818 4.06972 11.3666 OK 2.42156 0.712968 4.13014 OK
                TCONS_00008548 j ENSRNOT00000004657 XLOC_004745 Bcas3 TSS5831 10:73617528-74078192 3643 - 2.80025 0.856255 4.74425 OK 2.88671 1.06115 4.71227 OK
                TCONS_00008549 j ENSRNOT00000004657 XLOC_004745 Bcas3 TSS5831 10:73617528-74078192 3612 - 0 0 0 OK 1.24097 0.0528171 2.42912 OK
                TCONS_00008550 j ENSRNOT00000004657 XLOC_004745 Bcas3 TSS5831 10:73617528-74078192 3633 - 0 0 0 OK 0 0 0 OK
                TCONS_00008551 = ENSRNOT00000004615 XLOC_004745 Bcas3 TSS5831 10:73617528-74078192 1597 - 1.89857 0 4.33187 LOWDATA 2.22392 0 4.53372 LOWDATA
                TCONS_00008552 = ENSRNOT00000055837 XLOC_004745 Bcas3 TSS5831 10:73617528-74078192 2542 - 0.366576 0.366576 0.366576 LOWDATA 0 0 0 LOWDATA
                TCONS_00008553 = ENSRNOT00000004657 XLOC_004745 Bcas3 TSS5831 10:73617528-74078192 2605 - 1.63596 0 3.33679 LOWDATA 1.07684 0 2.37032 LOWDATA
                TCONS_00008554 j ENSRNOT00000004657 XLOC_004745 Bcas3 TSS5831 10:73617528-74078192 3774 - 0.954419 0 2.05837 OK 0 0 0 OK
                TCONS_00008555 j ENSRNOT00000004657 XLOC_004745 Bcas3 TSS5831 10:73617528-74078192 3651 - 0 0 0 OK 0 0 0 OK
                TCONS_00008556 j ENSRNOT00000004657 XLOC_004745 Bcas3 TSS5831 10:73617528-74078192 3568 - 0.000420794 0 0.140056 OK 0 0 0 OK
                TCONS_00008557 j ENSRNOT00000004657 XLOC_004745 Bcas3 TSS5831 10:73617528-74078192 3724 - 1.24194 0 2.54516 OK 1.10249 0 2.21521 OK
                TCONS_00008558 j ENSRNOT00000004657 XLOC_004745 Bcas3 TSS5831 10:73617528-74078192 3522 - 0 0 0 OK 5.85637 3.16317 8.54957 OK

                Comment


                • #23
                  Hi Starr,
                  Can you post the commands that you are using that fail to plot? Or if there is another error message that is being produced? Sorry that this is causing so much consternation. I'm working on a 'getFeatures' method (similar to getGenes) that should help resolve a bunch of this.

                  Cheers,
                  Loyal

                  Comment


                  • #24
                    So now I can get gene_short_names from significant TSS ids-- I just cannot plot them

                    The plot commands I am using are straight out of the cummeRbund manual except that I put them into a for-next loop. Please note that this works fine on getSig output set derived from significant genes set but hangs on isoform and splicing sets.

                    I examine the ith entry in the set then getGene on that id and make several plots.

                    I then have a PDF with content or not. The PDF has no messages.

                    I pipe the output to a text file but that just tells me how far I have gotten.Here are the last lines of output:

                    > for(i in 1:x) {
                    + print(mySigSplicing[i])
                    + print(mySplicingGeneShortName[i])
                    + print(mySplicingIDs[i])
                    + myGeneID<-mySplicingGeneShortName[i]
                    + #myGeneID<-mySplicingIDs[i]
                    + print(myGeneID)
                    + myGene<-getGene(cuff,myGeneID)
                    + print(myGene)
                    + head(fpkm(myGene))
                    + head(fpkm(splicing(myGene)))
                    + gl<-expressionPlot(myGene)
                    + print(gl)
                    + gb<-expressionBarplot(myGene)
                    + print(gb)
                    + igb<-expressionBarplot(splicing(myGene))
                    + print(igb)
                    +
                    + }
                    [1] "TSS10291"
                    [1] "Tmem6"
                    NULL
                    [1] "Tmem6"
                    CuffGene instance for gene Tmem6
                    Short name: Tmem6
                    Slots:
                    annotation
                    fpkm
                    diff
                    isoforms CuffFeature instance of size 2
                    TSS CuffFeature instance of size 1
                    CDS CuffFeature instance of size 0



                    There is a screen echo that says:

                    Error in 'colnames<-'*tmp*,value = "tracking_id"0 :
                    'names/ attribute[1] must be the same length as the vector [0]
                    Calls: expressionBarplot-> expressionBarplot->.local ->colnames<-
                    Execution halted
                    Warning message:
                    RS-DBI driver warning (closing pending result sets before closing this connection)

                    Here is the plotting code again:

                    x<-length(Sham_vs_Expt.sigIsoforms)
                    x
                    print(x)


                    for(i in 1:x) {
                    print(Sham_vs_Expt.sigIsoforms[i])
                    myGeneID<-Sham_vs_Expt.sigIsoforms[i]
                    myGeneID
                    myGene<-getGene(cuff,myGeneID)
                    print(myGene)
                    head(fpkm(myGene))
                    head(fpkm(isoforms(myGene)))
                    gl<-expressionPlot(myGene)
                    print(gl)
                    gb<-expressionBarplot(myGene)
                    print(gb)
                    igb<-expressionBarplot(isoforms(myGene))
                    print(igb)

                    }

                    Comment


                    • #25
                      Originally posted by Starr_Hazard View Post
                      There is a screen echo that says:

                      Error in 'colnames<-'*tmp*,value = "tracking_id"0 :
                      'names/ attribute[1] must be the same length as the vector [0]
                      Calls: expressionBarplot-> expressionBarplot->.local ->colnames<-
                      Execution halted
                      Warning message:
                      RS-DBI driver warning (closing pending result sets before closing this connection)

                      }
                      Hi Starr,
                      Can you confirm that a) this plot for this particular gene works when not inside the loop and b) that the returned CuffGene object is not empty?

                      Right now, getGene() will not die if you give it a gene_id that does not exist in the database, but rather returns an 'empty' CuffGene object that has some less than desirable behavior. I am working on making this more robust but I haven't had the opportunity to yet.

                      If you can confirm that an empty CuffGene object is created, perhaps you can remove those gene_ids that are not in fact in the full list of 'features()'?

                      -Loyal

                      Comment


                      • #26
                        OK so plotting routine fails on the gene AFTER the last one in the output
                        When I run the commands outside of the loop here's what happens

                        > library(cummeRbund)
                        Loading required package: RSQLite
                        Loading required package: DBI
                        Loading required package: ggplot2
                        Loading required package: reshape2
                        > cuff<-readCufflinks()
                        > cuff
                        CuffSet instance with:
                        2 samples
                        31740 genes
                        57676 isoforms
                        38842 TSS
                        15941 CDS
                        31740 promoters
                        38842 splicing
                        13596 relCDS
                        > myGene<-getGene(cuff,"Bcas3")
                        > print(myGene)
                        CuffGene instance for gene Bcas3
                        Short name:
                        Slots:
                        annotation
                        fpkm
                        diff
                        isoforms CuffFeature instance of size 0
                        TSS CuffFeature instance of size 0
                        CDS CuffFeature instance of size 0
                        > head(fpkm(myGene))
                        [1] gene_id sample_name fpkm conf_hi conf_lo
                        [6] quant_status
                        <0 rows> (or 0-length row.names)
                        > head(fpkm(isoforms(myGene)))
                        [1] isoform_id sample_name fpkm conf_hi conf_lo
                        [6] quant_status
                        <0 rows> (or 0-length row.names)
                        > gl<-expressionPlot(myGene)
                        > gl
                        Error in eval(expr, envir, enclos) : object 'sample_name' not found
                        > gb<-expressionBarplot(myGene)
                        > gb
                        Error in data.frame(..., check.names = FALSE) :
                        arguments imply differing number of rows: 1, 0
                        > igb<-expressionBarplot(isoforms(myGene))
                        > igb
                        Error in data.frame(..., check.names = FALSE) :
                        arguments imply differing number of rows: 1, 0
                        >

                        So this begs the question, how come this gene got retrieved as a gene with Significant results? Here's how I getSig
                        Saline_vs_Cocaine.sigIsoforms<-getSig(cuff,x='Saline',y='Cocaine',alpha=0.01,level='isoforms')
                        > head(Saline_vs_Cocaine.sigIsoforms,length(Saline_vs_Cocaine.sigIsoforms))
                        Here's the features retrieval
                        sigIsoformFeatures<-allIsoformFeatures[allIsoformFeatures$isoform_id %in% Saline_vs_Cocaine.sigIsoforms,]

                        Here's getting the short name

                        myGeneShortName<-sigIsoformFeatures$gene_short_name

                        Bcas3 is in the feature list, has a TCONS number, and XLOC number an ENSRN number and I can view the gene in IGV or NCBI MapView or SeqMonk or DNANexus view.

                        So a gene, Bcas3, is in my getSig output set but has no results?

                        Comment


                        • #27
                          Hi Starr_HAzard, how did you create the list of genes you want to work with?
                          I would like to select a certain number of genes out of cuffdiff among the significant ones.

                          I am new to R and linux...any help is appreciated.

                          Best,
                          Ibseq

                          Comment


                          • #28
                            Hi Loyal,
                            I am trying to use cummerbund to analyse a specific set of transcripts within the significant ones...I can do this on Excel on wondows, but as I am not very experience on using R and Unix, this is quite challenging for me.

                            What kind of command do I need to type to select about 20 signficant transcripts from my differentially expressed transcripts?

                            I will really appreciate some help!

                            best,
                            ibseq

                            Comment


                            • #29
                              Hi Starr

                              I am facing problem with cummeRbund, I searched manually RBBP8 in cuffdiff results and is there but cummeRbund can not find it when I try mygene<-getGene(cuff_data,'RBBP8"). Other genes are found but not few only.

                              ????



                              Originally posted by Starr_Hazard View Post
                              OK so plotting routine fails on the gene AFTER the last one in the output
                              When I run the commands outside of the loop here's what happens

                              > library(cummeRbund)
                              Loading required package: RSQLite
                              Loading required package: DBI
                              Loading required package: ggplot2
                              Loading required package: reshape2
                              > cuff<-readCufflinks()
                              > cuff
                              CuffSet instance with:
                              2 samples
                              31740 genes
                              57676 isoforms
                              38842 TSS
                              15941 CDS
                              31740 promoters
                              38842 splicing
                              13596 relCDS
                              > myGene<-getGene(cuff,"Bcas3")
                              > print(myGene)
                              CuffGene instance for gene Bcas3
                              Short name:
                              Slots:
                              annotation
                              fpkm
                              diff
                              isoforms CuffFeature instance of size 0
                              TSS CuffFeature instance of size 0
                              CDS CuffFeature instance of size 0
                              > head(fpkm(myGene))
                              [1] gene_id sample_name fpkm conf_hi conf_lo
                              [6] quant_status
                              <0 rows> (or 0-length row.names)
                              > head(fpkm(isoforms(myGene)))
                              [1] isoform_id sample_name fpkm conf_hi conf_lo
                              [6] quant_status
                              <0 rows> (or 0-length row.names)
                              > gl<-expressionPlot(myGene)
                              > gl
                              Error in eval(expr, envir, enclos) : object 'sample_name' not found
                              > gb<-expressionBarplot(myGene)
                              > gb
                              Error in data.frame(..., check.names = FALSE) :
                              arguments imply differing number of rows: 1, 0
                              > igb<-expressionBarplot(isoforms(myGene))
                              > igb
                              Error in data.frame(..., check.names = FALSE) :
                              arguments imply differing number of rows: 1, 0
                              >

                              So this begs the question, how come this gene got retrieved as a gene with Significant results? Here's how I getSig
                              Saline_vs_Cocaine.sigIsoforms<-getSig(cuff,x='Saline',y='Cocaine',alpha=0.01,level='isoforms')
                              > head(Saline_vs_Cocaine.sigIsoforms,length(Saline_vs_Cocaine.sigIsoforms))
                              Here's the features retrieval
                              sigIsoformFeatures<-allIsoformFeatures[allIsoformFeatures$isoform_id %in% Saline_vs_Cocaine.sigIsoforms,]

                              Here's getting the short name

                              myGeneShortName<-sigIsoformFeatures$gene_short_name

                              Bcas3 is in the feature list, has a TCONS number, and XLOC number an ENSRN number and I can view the gene in IGV or NCBI MapView or SeqMonk or DNANexus view.

                              So a gene, Bcas3, is in my getSig output set but has no results?

                              Comment


                              • #30
                                Similar Problem

                                Hi everyone,

                                I hope someone is still checking this thread. I am having a very similar problem. I have tried all of the troubleshoots posted here. I keep getting the following:
                                tracking_id gene_short_name
                                1 XLOC_000010 <NA>
                                2 XLOC_000011 <NA>
                                3 XLOC_000013 <NA>
                                4 XLOC_000014 <NA>
                                5 XLOC_000018 <NA>
                                6 XLOC_000022 <NA>
                                Where all my gene_short_names are NA with the exception of a handful. When I check the file "genes.fpkm_tracking" in my cuffdiff output folder I have the following:

                                LOC_000010 - - XLOC_000010 GDH3 TSS15 I:31567-32941 - - 6.88081 0 17.4366 OK 0 0 0 OK 59.5503 32.1558 86.9447 OK 16.1091 0 39.818 OK 12.1495 0 33.8913 OK 8.74331 0 26.3261 OK
                                XLOC_000011 - - XLOC_000011 BDH2 TSS16 I:33448-34702 - - 7.16722 0 18.5636 OK 1318.72 1267.41 1370.04 OK 525.033 470.782 579.285 OK 16.9807 0 37.6977 OK 10.7936 0 26.9461 OK 7.9163 0 20.9762 OK
                                XLOC_000012 - - XLOC_000012 BDH1 TSS17 I:35155-36304 - - 8.65006 0 22.6596 OK 456.226 434.267 478.185 OK 812.57 739.725 885.414 OK 20.4021 0 49.5953 OK 16.3412 0 52.9038 OK 15.2207 0 37.045 OK
                                XLOC_000013 - - XLOC_000013 ECM1 TSS18 I:36496-37148 - - 302.668 243.291 362.045 OK 0 0 0 OK 127.871 38.9611 216.78 OK 9.66232 0 27.5273 OK 4.65527 0 14.9566 OK 5.77583 0 18.0408 OK

                                Where clearly the XLOC_** numbers are being linked with a gene_short_name. Furthermore, in my merged.gtf file I have the following:

                                I Cufflinks exon 31568 32941 . + . gene_id "XLOC_000010"; transcript_id "TCONS_00000015"; exon_number "1"; gene_name "GDH3"; oId "YAL062W"; nearest_ref "YAL062W"; class_code "="; tss_id "TSS15"; p_id "P11";
                                I Cufflinks exon 33449 34702 . + . gene_id "XLOC_000011"; transcript_id "TCONS_00000016"; exon_number "1"; gene_name "BDH2"; oId "YAL061W"; nearest_ref "YAL061W"; class_code "="; tss_id "TSS16"; p_id "P12";
                                I Cufflinks exon 35156 36304 . + . gene_id "XLOC_000012"; transcript_id "TCONS_00000017"; exon_number "1"; gene_name "BDH1"; oId "YAL060W"; nearest_ref "YAL060W"; class_code "="; tss_id "TSS17"; p_id "P13";
                                I Cufflinks exon 36510 37148 . + . gene_id "XLOC_000013"; transcript_id "TCONS_00000018"; exon_number "1"; gene_name "ECM1"; oId "YAL059W"; nearest_ref "YAL059W"; class_code "="; tss_id "TSS18"; p_id "P14";

                                I have upgraded my cummeRbund so that is most likely not the issue. Any suggestions are greatly appreciated!

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Strategies for Sequencing Challenging Samples
                                  by seqadmin


                                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                  03-22-2024, 06:39 AM
                                • seqadmin
                                  Techniques and Challenges in Conservation Genomics
                                  by seqadmin



                                  The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                  Avian Conservation
                                  Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                  03-08-2024, 10:41 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Yesterday, 06:37 PM
                                0 responses
                                8 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, Yesterday, 06:07 PM
                                0 responses
                                8 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-22-2024, 10:03 AM
                                0 responses
                                49 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 03-21-2024, 07:32 AM
                                0 responses
                                66 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X