  • hji
    Member
    • Nov 2008
    • 13

    CisGenome -- an integrated tool for ChIP-seq data analysis

    I just found this great website. I would like to say thank you to the administrator(s), as you have provided a really useful resource for the next-gen seq community.

    I want to introduce to the community a tool we have developed for ChIP-seq data analysis. The tool is called CisGenome and can be downloaded from http://www.biostat.jhsph.edu/~hji/cisgenome/. The paper describing the tool is published in this month's Nature Biotechnology, Ji et al., 2008, 26:1293 - 1300.


    I realized that ECO has already included CisGenome into the ChIP-seq software lists (thanks!). What I want to do here is to highlight several critical features of CisGenome.

    1. New statistics:

    When a ChIP-seq experiment involves only a ChIP'd sample and no control samples, we use a truncated negative binomial model that we developed to estimate the false discovery rate (FDR). Most existing algorithms for handling this type of data use a Poisson model or Monte Carlo simulation to provide the background model, which carries the underlying assumption that the read (tag) sampling rate is constant across the genome. Our own experience shows that this is a poor assumption and in most cases will lead to overstating the statistical significance. The negative binomial model we use in CisGenome provides a simple but much better model to describe the variation of the read sampling rate across the genome. Also, it does not require users to provide an ad hoc number for the "fraction of alignable genome".
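To illustrate the point about overstated significance, here is a minimal sketch comparing the upper tail of a Poisson background with that of a negative binomial background that has the same mean. It is not the truncated negative binomial fit used in CisGenome; the mean, the dispersion value `size`, and the function names are assumptions chosen only for illustration.

```python
import math

def poisson_sf(k, mean):
    """P(X >= k) for a Poisson count with the given mean."""
    cdf = sum(
        math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))
        for i in range(k)
    )
    return max(0.0, 1.0 - cdf)

def negbin_sf(k, mean, size):
    """P(X >= k) for a negative binomial count.

    Parameterised by its mean and a dispersion 'size'; the variance is
    mean + mean**2 / size, so a small 'size' means strong overdispersion.
    """
    p = size / (size + mean)

    def log_pmf(i):
        return (
            math.lgamma(i + size) - math.lgamma(size) - math.lgamma(i + 1)
            + size * math.log(p) + i * math.log(1.0 - p)
        )

    cdf = sum(math.exp(log_pmf(i)) for i in range(k))
    return max(0.0, 1.0 - cdf)

# Hypothetical background: 2 reads per window on average.
mean_reads = 2.0
size = 1.0  # assumed dispersion, not estimated from any real data
for k in (5, 8, 10):
    print(k, poisson_sf(k, mean_reads), negbin_sf(k, mean_reads, size))
```

For the same read count, the Poisson tail probability is much smaller than the negative binomial one, so a window can look far more significant under a constant-rate model than it does once the rate is allowed to vary.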

    When the ChIP-seq experiment involves both a ChIP'd sample and a negative control sample, we use a conditional binomial model to detect peaks. The model automatically takes into account the difference between the total number of reads in the ChIP sample and the number of reads in the control sample. In other words, normalization is done naturally by the statistical model. To estimate the false discovery rate, our model does NOT require that the number of ChIP reads matches the number of control reads (i.e., it is fine to have 2 million ChIP reads and 1 million control reads, or 1 million ChIP reads vs. 2 million control reads). As a comparison, some previous methods compute the FDR by switching the ChIP & control labels; these types of methods usually require you to have approx. the same number of ChIP & control reads. Some other methods, like QuEST, compare two negative controls to get an FDR estimate, but in order to do so, you have to double the control reads in your experiment (i.e., to compute the FDR for a comparison between 1 million ChIP reads and 1 million control reads, you need to have another 1 million control reads; you estimate the FDR by comparing control vs. control).
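A rough sketch of how a conditional binomial test can absorb unequal library sizes: given the total number of reads in a window, the ChIP count is treated as binomial, with a null success probability equal to the ChIP share of all reads. This is the textbook form of the idea, not necessarily the exact computation inside CisGenome, and it returns a per-window p-value rather than an FDR. The function name and the read counts are made up for illustration.

```python
import math

def conditional_binomial_pvalue(chip_in_window, control_in_window,
                                chip_total, control_total):
    """Upper-tail p-value for ChIP enrichment in one window.

    Conditional on n = chip + control reads in the window, the ChIP count
    is Binomial(n, p0) under the null, where p0 is the fraction of all
    reads that come from the ChIP library.
    """
    n = chip_in_window + control_in_window
    p0 = chip_total / (chip_total + control_total)
    return sum(
        math.comb(n, k) * p0 ** k * (1.0 - p0) ** (n - k)
        for k in range(chip_in_window, n + 1)
    )

# Same window (20 ChIP reads, 10 control reads), two sequencing depths.
balanced = conditional_binomial_pvalue(20, 10, 1_000_000, 1_000_000)
unbalanced = conditional_binomial_pvalue(20, 10, 2_000_000, 1_000_000)
print(balanced, unbalanced)
```

With equal library sizes, a 20-vs-10 window looks enriched. With twice as many ChIP reads overall, the same 2:1 ratio is just what the null expects, so the p-value is large. No separate normalization step is needed.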

    Finally, many existing tools provide p-values instead of FDR. It is well known that p-value is not a good error rate measure to use in the context of multiple testing. CisGenome provides FDR estimates instead of p-values for both one-sample (only ChIP'd sample is available) and two-sample (both ChIP'd and control samples are available) ChIP-seq analyses.

    2. Graphical user interface & visualization

    If you don't have programming experience, we have a graphical user interface designed for you. If you are an experienced programmer, you can always use our core functions as command line programs (i.e., you can easily incorporate them into your shell files and prepare batch jobs).

    In addition to the GUI, we have a CisGenome browser (pretty much like the UCSC browser but with fewer functions). The browser runs locally on your computer, and you can visualize raw data and peak signals in it. In the same browser, you can also visualize gene structures, cross-species conservation, DNA sequences, motif logos, etc. You can also add custom tracks. Remember, this is a light-weight browser running on your own computer, so you don't need to upload anything to web servers (as you would in order to use UCSC). It is a tool designed to save some time in large-scale interactive analyses, since it avoids uploading large data sets to web servers.

    3. Motif analysis, gene annotation, sequence retrieval, etc.

    ChIP-seq peak detection is not the only function of CisGenome. Indeed, you can use CisGenome to do a bunch of downstream analyses, including de novo motif discovery, mapping motifs to the genome or to any set of genomic regions, adding gene annotations, retrieving DNA sequences, and getting summary statistics about the distribution of your peaks (e.g., x% are in exons, y% are in 1kb promoters, etc.). You can also use CisGenome to analyze ChIP-chip data.

    Of course, any software will have bugs. We are not surprised if you encounter bugs in CisGenome. When you find bugs, just kindly let us know. We will try to fix them. We hope that you will find CisGenome useful in your own work.
  • xuer
    Member
    • Sep 2008
    • 17

    #2
    I have tested the CisGenome browser. Though I have not used all of its functions, it looks quite good!

    • frankyue50
      Member
      • Nov 2008
      • 34

      #3
      Sounds really promising. I'll check it out.

      • erikarner
        Junior Member
        • Oct 2008
        • 1

        #4
        Hi hji,

        I saw the article in NBT the other day and it certainly looks really useful. I have a few questions, if you don't mind.

        1. Does the peak detection algorithm in ChIP-seq adjust for variable number of potential single mapping sites in different regions? I am assuming that the algorithm only uses uniquely mapping reads. A few tags in a region mostly consisting of repeats can be more significant than many tags in a unique region - is this accounted for?

        2. My understanding is that the GUI is only available for Windows. Is all functionality available in the Linux version, and can analysis results obtained on the Linux platform be transferred to a Windows computer for viewing and further analysis? I guess what I'm asking is how decoupled the GUI is from core functionality, file formats etc.

        Best regards,
        Erik

        • hji
          Member
          • Nov 2008
          • 13

          #5
          Erik,

          Re your first question: "Does the peak detection algorithm in ChIP-seq adjust for variable number of potential single mapping sites in different regions? I am assuming that the algorithm only uses uniquely mapping reads. A few tags in a region mostly consisting of repeats can be more significant than many tags in a unique region - is this accounted for?"

          If you are using two-sample analysis, this is automatically adjusted for, since the same bias should apply to both the ChIP'd and control samples (correct me if I'm wrong).

          If you are using one-sample analysis, the answer is no, we haven't adjusted for it in the current version. You raised a very good point, and we will try to incorporate this into the next version of our peak detection algorithm if it tests well.

          Re your second question: "My understanding is that the GUI is only available for Windows. Is all functionality available in the Linux version, and can analysis results obtained on the Linux platform be transferred to a Windows computer for viewing and further analysis? I guess what I'm asking is how decoupled the GUI is from core functionality, file formats etc."

          You are right, the GUI is currently only for Windows. But all core algorithms can be run on Linux. The Windows GUI uses the same core algorithms as the Linux version and yields the same results in the same formats. So you can transfer results from Linux to a Windows machine and perform further analysis from there.

          • What_Da_Seq
            Member
            • Jul 2008
            • 28

            #6
            Any suggestions for using CisGenome for MeDIP-CHIP without input controls (only treated vs. notTreated)? I am still waiting for the normalization of my 63 .cel files to finish, so I have not had a chance to explore the TileMap interface. Any suggestions for starting conditions are appreciated.

            Frank

            • hji
              Member
              • Nov 2008
              • 13

              #7
              I'm not quite sure what your data structure is, but it looks like a typical two-sample comparison should work.

              • What_Da_Seq
                Member
                • Jul 2008
                • 28

                #8
                Sorry I did not make this clearer. Now that I have done a couple of analyses, I can tell you that I am not getting any peaks using HMM and 2 samples when comparing (treatment > control), and only like 20 peaks for (control > treatment). I have not used the UMS settings yet.
                Since I am looking for single-base events (CpG or MeCpG) and not TF binding, I was wondering what my most relaxed (least stringent) HMM setting for peak detection would be. I can identify 3000+ regions via MA(300) for (treatment > control), but only 5 of these regions are FDR 0.0000000 and the next group of peaks is 0.10000000.
                I also have no good grasp of why the FDR numbers in the COD files are grouped instead of continuous (e.g., 5 peaks at FDR=0.0000000, next peak group at 0.1000000).

                I greatly appreciate your input. I am just trying to work my way through the 2005 TILEMAP paper. If only my statistical comprehension were better. But the program so far is very nice, especially since my boss always wanted some sort of FDR calculation incorporated into tiling analysis.

                Thanks again

                • hji
                  Member
                  • Nov 2008
                  • 13

                  #9
                  In that case, I suggest you look at the raw data first. You can import the fc.bar and ma.bar files into the CisGenome browser and look at the top peaks. Ask yourself the question: do they look like something real? This will help you understand whether the FDR makes sense or not.

                  Regarding why the FDR values are always grouped: it is because the FDR is forced to be monotone. Your peaks are ranked, and the raw FDR is computed as (# peaks in the left tail)/(# peaks in the right tail). Suppose the raw FDR is 0.01; 0.02; 0.00; 0.06; 0.05; 0.07 ... Then the reported FDR will be 0.00; 0.00; 0.00; 0.05; 0.05; 0.07 ... In this example, each peak is assigned the smallest raw FDR found at its own rank or further down the list. This is somewhat like the Benjamini-Hochberg procedure.
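The monotone adjustment in the example above can be reproduced with a running minimum taken from the bottom of the ranked list upward. This is only a sketch that matches the six numbers quoted in the post; the function name is made up, and the actual CisGenome code may differ in its details.

```python
def monotone_fdr(raw_fdr):
    """Force FDR values to be non-decreasing along the ranked peak list.

    Each entry becomes the minimum of itself and every entry ranked
    after it, computed with one pass from the last peak to the first.
    """
    adjusted = list(raw_fdr)
    for i in range(len(adjusted) - 2, -1, -1):
        adjusted[i] = min(adjusted[i], adjusted[i + 1])
    return adjusted

raw = [0.01, 0.02, 0.00, 0.06, 0.05, 0.07]
print(monotone_fdr(raw))  # [0.0, 0.0, 0.0, 0.05, 0.05, 0.07]
```

Because a single low raw value pulls down every peak ranked above it, runs of identical FDR values (such as a block at 0.0 followed by a block at 0.1) are the expected outcome.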

                  • tfcheng
                    Junior Member
                    • Apr 2009
                    • 3

                    #10
                    Hi HJI,
                    I am trying to analyze my ChIP-seq results, and I am hoping that CisGenome can help me. I have two sets of data, experimental and control, both in WIG and BED formats. I need to know the difference between the two. Being a rookie in the ChIP-seq field, could I ask you to tell me whether CisGenome is the right tool for me and, if so, how I should use it? Thank you!!

                    • hji
                      Member
                      • Nov 2008
                      • 13

                      #11
                      I just added a function to convert a BED file to an ALN file. You can then use the ALN file to detect peaks and perform subsequent analysis. You are certainly welcome to try CisGenome.

                      BTW, we have also recently added support for C. elegans, yeast and chicken.

                      • schandri
                        Junior Member
                        • May 2009
                        • 3

                        #12
                        cisGenome trouble shooting?

                        Hello,

                        I am trying to use cisGenome to "find closest gene" to TF binding sites identified using ChIP-Seq. I have downloaded the human genome database (hg18) and have converted the enriched sites into the COD file format. I was able to load the genome database and COD file into the cisGenome browser. Then I choose “Genome > Annotate with … > Closest Gene”. From here I indicate a save-to location and hit "OK". A new window flashes (too fast for me to read), and then no file is saved and no further COD is added to the project. I don't know what I am doing wrong. I would be EXTREMELY grateful for any advice.

                        Best regards,
                        Sanjay

                        • hji
                          Member
                          • Nov 2008
                          • 13

                          #13
                          schandri

                          First, check whether you have set the CisGenome.ini file. In that file, you should give the CisGenome installation path.

                          Second, check whether any of your folder or file paths/names contain blank characters, such as "C:\My Document\". If so, move (or rename) your data to folders that do not contain blank characters. CisGenome should also be installed in a folder that does not contain blank characters.

                          Try and see if this solves your problem.
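If you have many files, a quick script can flag the paths that contain blank characters before you load them. This is just a convenience sketch and not part of CisGenome; the function name and the example file names are hypothetical.

```python
def paths_with_blanks(paths):
    """Return the paths that contain any whitespace character."""
    return [p for p in paths if any(ch.isspace() for ch in p)]

# Hypothetical locations for the installation, the data and the .ini file.
candidates = [
    r"C:\My Document\peaks.cod",
    r"C:\CisGenome\data\peaks.cod",
    r"C:\CisGenome\CisGenome.ini",
]
for bad in paths_with_blanks(candidates):
    print("contains blanks:", bad)
```

Any path it prints should be moved or renamed before you retry the annotation step.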

                          • schandri
                            Junior Member
                            • May 2009
                            • 3

                            #14
                            Thanks for your post, hji. Your suggestions fixed the problem! I had installed cisGenome in a path that did not have any spaces, but had made two other mistakes. First, the path in the .ini file was slightly off and second, my .COD data file was in a location that had a file path containing spaces. Now it seems to be working great!

                            Thanks again.
                            Sanjay

                            • seidel
                              Junior Member
                              • Mar 2008
                              • 3

                              #15
                              convert bar to wig

                              Anyone know of a utility to convert .bar files to .wig files?

                              I'd be happy to write a program to do it - but any pointers for the .bar format would be helpful. I'm sure I'm not the only one who would be interested in seeing cisGenome output in the UCSC genome browser (which doesn't read .bar last I checked).
