  • Help with cuffdiff and cummeRbund?

    Hi all!

    Sorry to bother you with a simple question. I have read through all the cummeRbund posts and tutorials, but I seem to be stuck right at the start!

    I have run RNA-seq analyses on Galaxy online: tophat, cufflinks, cuffmerge, and cuffdiff. I would now like to visualize the results in cummeRbund. I downloaded the cuffdiff files (11 each for two groups) from Galaxy, and they are in .tabular format. I installed cummeRbund and ran the following. It does not work. Could the issue be that the files should be in .db format? I don't know where the cuffData.db file came from; it appeared before I had even downloaded the cuffdiff files.

    > source("http://bioconductor.org/biocLite.R")
    > biocLite("cummeRbund")
    > getwd()
    > setwd("C:/Users/caetano1/Downloads/SEDENTARYDFF")
    > list.files()
    > library(cummeRbund)
    > cuff= readCufflinks (dbFile = "cuffData.db",
    + geneFPKM = "CuffdiffSEDENTARY__gene_FPKM_tracking.tabular",
    + geneDiff = "CuffdiffSEDENTARY__gene_differential_expression_testing.tabular",
    + isoformFPKM = "CuffdiffSEDENTARY__transcript_FPKM_tracking.tabular",
    + isoformDiff = "CuffdiffSEDENTARY__transcript_differential_expression_testing.tabular",
    + TSSFPKM = "CuffdiffSEDENTARY__TSS_groups_FPKM_tracking.tabular",
    + TSSDiff = "CuffdiffSEDENTARY__TSS_groups_differential_expression_testing.tabular",
    + CDSFPKM = "CuffdiffSEDENTARY__CDS_FPKM_tracking.tabular",
    + CDSExpDiff = "CuffdiffSEDENTARY__CDS_FPKM_differential_expression_testing.tabular",
    + CDSDiff = "CuffdiffSEDENTARY__CDS_overloading_diffential_expression_testing.tabular",
    + promoterFile = "CuffdiffSEDENTARY__promoters_differential_expression_testing.tabular",
    + splicingFile = "CuffdiffSEDENTARY__splicing_differential_expression_testing.tabular",
    + rebuild = T)

    I think I'm missing something really obvious here!

    Thank you so much!

    Kelsey

  • #2
    Hi kelseyca,

    cuffData.db is the database file that cummeRbund creates to store all the results from cuffdiff in an easy-to-access format for the cummeRbund commands in R.

    So if you run the readCufflinks (dbFile = "cuffData.db",....) command even before copying the cuffdiff files into the directory, a default cuffData.db file will be created.

    Hope this helps

    Thanks
    --
    Muthu


    • #3
      Muthu,

      Thanks for your reply! So, how can I get R to read my cuffdiff files? Are they in the wrong format?

      Kelsey


      • #4
        Output files should look like this:

        genes.read_group_tracking
        genes.fpkm_tracking
        genes.count_tracking
        gene_exp.diff

        I guess you should remove the ".tabular" extension and rename the files so that they match the names above.
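
A minimal sketch of the renaming suggested here, assuming the Galaxy downloads follow the CuffdiffSEDENTARY__*.tabular pattern shown in this thread. The target names are the ones a local cuffdiff run writes. The helper name copy_with_cuffdiff_names and its prefix argument are illustrative and not part of cummeRbund. It copies instead of renaming, so the original downloads stay untouched.

```r
# Galaxy download suffix -> file name written by a local cuffdiff run.
# The Galaxy suffixes (including the "diffential" spelling) are copied from
# the file listing in this thread.
galaxy_to_cuffdiff <- c(
  "gene_FPKM_tracking"                            = "genes.fpkm_tracking",
  "gene_differential_expression_testing"          = "gene_exp.diff",
  "transcript_FPKM_tracking"                      = "isoforms.fpkm_tracking",
  "transcript_differential_expression_testing"    = "isoform_exp.diff",
  "TSS_groups_FPKM_tracking"                      = "tss_groups.fpkm_tracking",
  "TSS_groups_differential_expression_testing"    = "tss_group_exp.diff",
  "CDS_FPKM_tracking"                             = "cds.fpkm_tracking",
  "CDS_FPKM_differential_expression_testing"      = "cds_exp.diff",
  "CDS_overloading_diffential_expression_testing" = "cds.diff",
  "promoters_differential_expression_testing"     = "promoters.diff",
  "splicing_differential_expression_testing"      = "splicing.diff"
)

# Hypothetical helper: copy each Galaxy file in 'dir' to its cuffdiff name.
copy_with_cuffdiff_names <- function(dir, prefix = "CuffdiffSEDENTARY__",
                                     mapping = galaxy_to_cuffdiff) {
  from <- file.path(dir, paste0(prefix, names(mapping), ".tabular"))
  to <- file.path(dir, unname(mapping))
  found <- file.exists(from)
  if (any(!found)) {
    warning("Not found: ", paste(basename(from[!found]), collapse = ", "))
  }
  # Copy rather than rename so the original downloads are kept.
  copied <- file.copy(from[found], to[found], overwrite = TRUE)
  # Return the names of the copies that were written.
  basename(to[found][copied])
}
```

Calling copy_with_cuffdiff_names("C:/Users/caetano1/Downloads/SEDENTARYDFF") would then leave both sets of names in that folder. The count_tracking and read_group_tracking files listed above are not among the downloads shown in this thread, so the sketch does not create them; whether readCufflinks needs them may depend on the cummeRbund version.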


        • #5
          Kelsey,

          As Sazz mentioned, the output files from cuffdiff do not have the .tabular extension.
          Please check your cuffdiff output files: if their names do not match the names expected by the readCufflinks command, the files will not be recognized in R.

          The simplest approach is to copy all the output files from cuffdiff into one directory and run the following command.

          cuff= readCufflinks (dbFile = "cuffData.db",dir="C:/Users/caetano1/Downloads/SEDENTARYDFF",
          gtfFile='DIRPATH/gtffile', genome='genomename',rebuild = T)

          This command finds all the files it needs to build the database in that directory. You need not specify them individually.

          The GTF file is needed for some visualization commands in cummeRbund.

          Hope this is helpful

          Thanks
          --
          Muthu


          • #6
            Hi Muthu,

            One last question. Sorry if I am missing something very obvious here and wasting your time. Thank you so much for being so patient and for all of your help.

            I cannot figure out how to export cuffdiff files from Galaxy online in any format other than .tabular. I am just clicking "download" under the cuffdiff run. All the manuals and FAQs I have been reading cover running the tuxedo suite offline.

            Also, R cannot find the function "readCufflinks".

            Kelsey


            • #7
              > cuff= readCufflinks (dbFile = "cuffData.db",dir="C:/Users/caetano1/Downloads/SEDENTARYDFF",
              + gtfFile='DIRPATH/gtffile', genome='genomename',rebuild = T)
              Creating database C:/Users/caetano1/Downloads/SEDENTARYDFF/cuffData.db
              Reading GTF file
              Error in import(FileForFormat(con), ...) :
              error in evaluating the argument 'con' in selecting a method for function 'import': Error in FileForFormat(con) : Format 'DIRPATH/gtffile' unsupported
              >


              • #8
                Hi Kelsey,

                Not a problem.

                If that's the case (Galaxy only offers .tabular output), then you can rename the files after you download them to remove the .tabular extension.

                If R cannot find the function 'readCufflinks', it means the 'cummeRbund' library is not loaded in the current R session.
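
A small sketch for checking this, using base R only. The function name package_status is illustrative. It reports whether a package is installed and whether it is attached in the current session, without loading anything itself.

```r
# Hypothetical helper: is 'pkg' installed, and is it attached right now?
package_status <- function(pkg) {
  c(
    # system.file() returns "" when the package is not installed.
    installed = nzchar(system.file(package = pkg)),
    # Attached packages appear on the search path as "package:<name>".
    attached = paste0("package:", pkg) %in% search()
  )
}
```

If package_status("cummeRbund") reports installed but not attached, running library(cummeRbund) again in the same session should make readCufflinks visible. If it is not installed, the installation step needs to be repeated first.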

                Thanks
                --
                Muthu


                • #9
                  Kelsey,

                  Right now it is throwing an error because it cannot find the directory 'DIRPATH' or the GTF file.

                  I wrote 'DIRPATH' as a placeholder for the directory that holds the .gtf file you used to run cufflinks.
                  You could copy the XXX.gtf file to the same working directory, 'C:/Users/caetano1/Downloads/SEDENTARYDFF', then replace DIRPATH/gtffile in the command with 'C:/Users/caetano1/Downloads/SEDENTARYDFF/XXX.gtf' and 'genomename' with the name of the genome you are working with, e.g. 'hg19', 'hg18', 'pt03', 'mm9', 'mm10', etc.

                  Your readCufflinks command should then work without this error.
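
As a rough sketch of the substitution described above, the paths can be built and checked in base R before readCufflinks is called, so that a leftover placeholder such as 'DIRPATH/gtffile' fails with a clearer message. The function name check_cuffdiff_inputs is illustrative and not part of cummeRbund.

```r
# Hypothetical helper: build the GTF path and stop early if anything is missing.
check_cuffdiff_inputs <- function(cuffdiff_dir, gtf_name) {
  gtf_path <- file.path(cuffdiff_dir, gtf_name)
  if (!dir.exists(cuffdiff_dir)) stop("Directory not found: ", cuffdiff_dir)
  if (!file.exists(gtf_path)) stop("GTF file not found: ", gtf_path)
  list(dir = cuffdiff_dir, gtfFile = gtf_path)
}

# Intended use (not run here; requires cummeRbund and the real files):
# args <- check_cuffdiff_inputs("C:/Users/caetano1/Downloads/SEDENTARYDFF", "XXX.gtf")
# cuff <- readCufflinks(dbFile = "cuffData.db", dir = args$dir,
#                       gtfFile = args$gtfFile, genome = "mm10", rebuild = TRUE)
```

This only confirms that the paths exist. It does not check that the GTF content is well formed or that the genome name matches the annotation.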

                  thanks
                  --
                  Muthu


                  • #10
                    > source("http://bioconductor.org/biocLite.R")
                    Bioconductor version 2.12 (BiocInstaller 1.10.2), ?biocLite for help
                    > biocLite("cummeRbund")
                    BioC_mirror: http://bioconductor.org
                    Using Bioconductor version 2.12 (BiocInstaller 1.10.2), R version 3.0.1.
                    Installing package(s) 'cummeRbund'
                    trying URL 'http://bioconductor.org/packages/2.12/bioc/bin/windows/contrib/3.0/cummeRbund_2.2.0.zip'
                    Content type 'application/zip' length 2600163 bytes (2.5 Mb)
                    opened URL
                    downloaded 2.5 Mb

                    package ‘cummeRbund’ successfully unpacked and MD5 sums checked

                    The downloaded binary packages are in
                    C:\Users\caetano1\AppData\Local\Temp\RtmpQTqdVW\downloaded_packages
                    Warning message:
                    installed directory not writable, cannot update packages 'class', 'foreign',
                    'MASS', 'mgcv', 'nnet', 'spatial'
                    > getwd()
                    [1] "\\\\ansci-alpha/Homes/Grads/caetano1/Documents"
                    > setwd("C:/Users/caetano1/Downloads/SEDENTARYDFF")
                    > list.files()
                    [1] "cuffData.db"
                    [2] "CuffdiffSEDENTARY__CDS_FPKM_differential_expression_testing.tabular"
                    [3] "CuffdiffSEDENTARY__CDS_FPKM_tracking.tabular"
                    [4] "CuffdiffSEDENTARY__CDS_overloading_diffential_expression_testing.tabular"
                    [5] "CuffdiffSEDENTARY__gene_differential_expression_testing.tabular"
                    [6] "CuffdiffSEDENTARY__gene_FPKM_tracking.tabular"
                    [7] "CuffdiffSEDENTARY__promoters_differential_expression_testing.tabular"
                    [8] "CuffdiffSEDENTARY__splicing_differential_expression_testing.tabular"
                    [9] "CuffdiffSEDENTARY__transcript_differential_expression_testing.tabular"
                    [10] "CuffdiffSEDENTARY__transcript_FPKM_tracking.tabular"
                    [11] "CuffdiffSEDENTARY__TSS_groups_differential_expression_testing.tabular"
                    [12] "CuffdiffSEDENTARY__TSS_groups_FPKM_tracking.tabular"
                    [13] "mm10.gtf"
                    > library(cummeRbund)
                    Loading required package: BiocGenerics
                    Loading required package: parallel

                    Attaching package: ‘BiocGenerics’

                    The following objects are masked from ‘package:parallel’:

                    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
                    clusterExport, clusterMap, parApply, parCapply, parLapply,
                    parLapplyLB, parRapply, parSapply, parSapplyLB

                    The following object is masked from ‘package:stats’:

                    xtabs

                    The following objects are masked from ‘package:base’:

                    anyDuplicated, as.data.frame, cbind, colnames, duplicated, eval,
                    Filter, Find, get, intersect, lapply, Map, mapply, match, mget,
                    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
                    rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table,
                    tapply, union, unique, unlist

                    Loading required package: RSQLite
                    Loading required package: DBI
                    Loading required package: ggplot2
                    Loading required package: reshape2
                    Loading required package: fastcluster

                    Attaching package: ‘fastcluster’

                    The following object is masked from ‘package:stats’:

                    hclust

                    Loading required package: rtracklayer
                    Loading required package: GenomicRanges
                    Loading required package: IRanges
                    Loading required package: Gviz
                    Loading required package: grid

                    Attaching package: ‘cummeRbund’

                    The following object is masked from ‘package:GenomicRanges’:

                    promoters

                    The following object is masked from ‘package:IRanges’:

                    promoters

                    > cuff= readCufflinks (dbFile = "cuffData.db",dir="C:/Users/caetano1/Downloads/SEDENTARYDFF",
                    + gtfFile="C:/Users/caetano1/Downloads/SEDENTARYDFF/mm10.gtf", genome='mm10',rebuild = T)
                    Creating database C:/Users/caetano1/Downloads/SEDENTARYDFF/cuffData.db
                    Reading GTF file
                    Error in .parse_attrCol(attrCol, file, colnames) :
                    Some attributes do not conform to 'tag value' format
                    >
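
The last error says that some entries in the GTF attribute column are not in 'tag value' form. One way to look for such lines, as a base-R sketch: the function name find_bad_gtf_attributes is illustrative, and the check is a simplification that assumes the GTF convention of semicolon-separated `tag "value"` pairs in column 9. It is not the parser's own validation, and it would misjudge values that themselves contain a semicolon.

```r
# Hypothetical helper: return the line numbers of records whose attribute
# column (column 9) has an entry that is not "tag<whitespace>value".
find_bad_gtf_attributes <- function(gtf_lines) {
  # Skip comment lines and blank lines.
  is_record <- !grepl("^#", gtf_lines) & nzchar(trimws(gtf_lines))
  bad <- vapply(seq_along(gtf_lines), function(i) {
    if (!is_record[i]) return(FALSE)
    fields <- strsplit(gtf_lines[i], "\t", fixed = TRUE)[[1]]
    if (length(fields) < 9) return(TRUE)  # fewer than nine tab-separated columns
    attrs <- trimws(strsplit(fields[9], ";", fixed = TRUE)[[1]])
    attrs <- attrs[nzchar(attrs)]
    # Each attribute needs a tag, then whitespace, then a value.
    length(attrs) == 0 || !all(grepl("^\\S+\\s+\\S", attrs))
  }, logical(1))
  which(bad)
}
```

For example, head(find_bad_gtf_attributes(readLines("mm10.gtf"))) would list the first suspicious line numbers. GFF3-style attributes written as tag=value are flagged by this check, so a GFF3 file saved with a .gtf extension is one possible cause worth ruling out.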


                    • #11
                      Please try this simpler approach.
                      Note: keep your "diff_out" folder inside the cuff_data folder, and change the working directory to cuff_data:
                      > cuff_data <- readCufflinks('diff_out', rebuild = TRUE)
