  • GSEA: None of the gene sets passed the size thresholds

    I am trying to use GSEA for some RNA-Seq data. I've used it previously for microarray data and it worked fine. My guess is that using well-defined probe set IDs helped.

    For my data, I tried using the provided gene symbols as well as symbols exactly matching what I have. It seems that no matter what I do, I end up with the "None of the gene sets passed the size thresholds" error.

    Using GenePattern (which I assume should be the safest option), I get the following output:
    Code:
    1286 [INFO ] Begun importing: Chip from: /xchip/gpprod-upload/servers/genepattern/users/uploads/tmp/run4907393233174326312.tmp/chip.platform.file/1/same.chip
    1334 [WARN ] Missing chip file: >/xchip/gpprod-upload/servers/genepattern/users/uploads/tmp/run4907393233174326312.tmp/chip.platform.file/1/same.chip<	at edu.mit.broad.vdb.chip.FileInMemoryChip.initHere(?:?)
    1502 [INFO ] Parsed from dotchip : 21862
    1350 [WARN ] Missing chip file: >/xchip/gpprod-upload/servers/genepattern/users/uploads/tmp/run4907393233174326312.tmp/chip.platform.file/1/same.chip<	at edu.mit.broad.vdb.chip.FileInMemoryChip.initHere(?:?)
    1738 [INFO ] Collapsing dataset was done. Original: 21862x2 (ann: 21862,2,same.chip) collapsed: 860x2 (ann: 860,2,GENE_SYMBOL)
    to parse>c5.all.v4.0.symbols.gmt< got: [c5.all.v4.0.symbols.gmt]
    1763 [INFO ] Begun importing: GeneSetMatrix from: c5.all.v4.0.symbols.gmt
    2110 [INFO ] Got gsets: 1454 now preprocessing them ... min: 3 max: 500
    Done removeGeneSetsSmallerThan: 3 for: 501 / 1454
    Done removeGeneSetsSmallerThan: 3 for: 1001 / 1454
    2259 [INFO ] Done preproc for smaller than: 3
    2428 [INFO ] Renaming rpt dir on error to: error_.
    2276 [WARN ] Could not rename for error to: error_.	at edu.mit.broad.genome.reports.api.ToolReport.setErroredOut(?:?)
    Those warnings look questionable, but they are not exactly informative. Why would it say "Missing chip file" when the chip file is obviously present?
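One way to narrow this down outside GSEA is to check how many genes of each gene set actually occur among the dataset's identifiers, since the size thresholds (min 3, max 500 in the log above) are applied after a gene set is restricted to genes present in the dataset. The following is a minimal sketch, not GSEA's own code: the function names and the toy data are hypothetical, and it assumes a standard tab-delimited GMT file (name, description, then genes).

```python
def parse_gmt(text):
    """Parse GMT text: set name <tab> description <tab> gene1 <tab> gene2 ..."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]
        gene_sets[name] = {g.strip() for g in genes if g.strip()}
    return gene_sets


def sets_passing_thresholds(gene_sets, dataset_genes, min_size=3, max_size=500):
    """Return {set name: overlap size} for sets whose overlap with the
    dataset identifiers falls within [min_size, max_size]."""
    dataset = set(dataset_genes)
    passing = {}
    for name, genes in gene_sets.items():
        overlap = len(genes & dataset)
        if min_size <= overlap <= max_size:
            passing[name] = overlap
    return passing


# Toy data (hypothetical), just to show the effect of mismatched identifiers.
toy_gmt = "SET_A\tna\tTP53\tBRCA1\tMYC\tEGFR\nSET_B\tna\tGAPDH\tACTB\n"
toy_sets = parse_gmt(toy_gmt)
print(sets_passing_thresholds(toy_sets, ["TP53", "BRCA1", "MYC"]))
print(sets_passing_thresholds(toy_sets, ["tp53", "brca1", "myc"]))
```

The comparison is exact and case-sensitive, so identifiers that differ from the gene set symbols in case, whitespace, or naming scheme give an empty overlap. If nearly every set ends up with an overlap below the minimum, the problem is most likely in the identifiers rather than in the thresholds.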

  • #2
    Hi!

    I just started encountering the same error ('Renaming rpt dir on error to: error_.').
    Unlike you, I'm still working on microarray data. I also tried the collapse option, as well as symbols that exactly match my dataset probes.

    Were you able to resolve the problem? Does anyone have any advice?

    Thanks!

    P.S. I'm sure that the threshold is well above the number of genes in my gene set.
    Last edited by snaporaz; 10-23-2014, 11:35 AM. Reason: clarification



    • #3
      I haven't used the Broad's GSEA in a while, but for RNA-Seq data you could try a Bioconductor package:
      The package generally provides methods for gene set enrichment analysis of high-throughput RNA-Seq data by integrating differential expression and splicing. It uses negative binomial distribution to model read count data, which accounts for sequencing biases and biological variation. Based on permutation tests, statistical significance can also be achieved regarding each gene's differential expression and splicing, respectively.



      • #4
        Set 'Collapse dataset to gene symbols' to false; that might help.
        Last edited by jamesjcai; 03-08-2017, 08:51 AM.
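To illustrate why turning off collapsing might help, here is a simplified sketch of what a collapse step does conceptually. Row identifiers are mapped to gene symbols through the chip annotation, rows without a mapping are dropped, and multiple rows for the same symbol are merged (here by taking the per-sample maximum). This is an illustration under those assumptions, not GSEA's actual implementation; `collapse_to_symbols` and the toy data are hypothetical.

```python
def collapse_to_symbols(expression, chip):
    """Collapse rows to gene symbols.

    expression: {row_id: [value per sample]}
    chip:       {row_id: gene_symbol}
    Rows whose id is absent from the chip mapping are dropped; rows sharing
    a symbol are merged by taking the per-sample maximum.
    """
    collapsed = {}
    for row_id, values in expression.items():
        symbol = chip.get(row_id)
        if symbol is None:
            continue  # no annotation for this id: the row is lost
        if symbol in collapsed:
            collapsed[symbol] = [max(a, b) for a, b in zip(collapsed[symbol], values)]
        else:
            collapsed[symbol] = list(values)
    return collapsed


# Toy data (hypothetical): two probes map to TP53, one row has no mapping.
toy_expression = {"p1": [1.0, 5.0], "p2": [3.0, 2.0], "GENE_X": [4.0, 4.0]}
toy_chip = {"p1": "TP53", "p2": "TP53"}
print(collapse_to_symbols(toy_expression, toy_chip))
```

In the log from the first post, 21862 rows were collapsed to 860, which would be consistent with most row identifiers not being found in the chip annotation. When the dataset already uses gene symbols that match the gene sets, skipping the collapse step keeps all rows as they are.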
