SEQanswers

Old 11-11-2008, 11:00 AM   #1
hji
Member
 
Location: Baltimore

Join Date: Nov 2008
Posts: 13
CisGenome -- an integrated tool for ChIP-seq data analysis

I just found this great website. I would like to say thank you to the administrator(s) for providing a really useful resource for the next-gen sequencing community.

I want to introduce to the community a tool we have developed for ChIP-seq data analysis. The tool is called CisGenome and can be downloaded from http://www.biostat.jhsph.edu/~hji/cisgenome/. The paper describing the tool is published in this month's Nature Biotechnology, Ji et al., 2008, 26:1293 - 1300.


I realized that ECO has already included CisGenome in the ChIP-seq software list (thanks!). What I want to do here is highlight several critical features of CisGenome.

1. New statistics:

When a ChIP-seq experiment involves only a ChIP'd sample and no control sample, we use a truncated negative binomial model to estimate the false discovery rate (FDR). Most existing algorithms for this type of data use a Poisson model or Monte Carlo simulation as the background model, which implicitly assumes that the read (tag) sampling rate is constant across the genome. In our experience this is a poor assumption, and in most cases it leads to overstating the statistical significance. The negative binomial model used in CisGenome is a simple but much better description of how the read sampling rate varies across the genome. Also, it does not require users to provide an ad hoc number for the "fraction of alignable genome".
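To illustrate why the choice of background model matters, here is a minimal Python sketch (not CisGenome's implementation) that compares the upper-tail probability of a window read count under a Poisson model and under a negative binomial model with the same mean. The mean of 2 reads per window and the dispersion value are made-up numbers for illustration, and the sketch does not perform the truncated fit that CisGenome uses to estimate these parameters.

```python
import math


def poisson_tail(k, mean):
    """P(X >= k) for a Poisson count with the given mean."""
    below = sum(math.exp(-mean) * mean ** j / math.factorial(j) for j in range(k))
    return 1.0 - below


def negbin_pmf(j, mean, size):
    """Negative binomial pmf with the given mean and dispersion 'size'.

    Smaller 'size' means more variation in the sampling rate across
    windows; as 'size' grows the distribution approaches the Poisson.
    """
    log_p = (
        math.lgamma(j + size) - math.lgamma(size) - math.lgamma(j + 1)
        + size * math.log(size / (size + mean))
        + j * math.log(mean / (size + mean))
    )
    return math.exp(log_p)


def negbin_tail(k, mean, size):
    """P(X >= k) under the negative binomial model."""
    return 1.0 - sum(negbin_pmf(j, mean, size) for j in range(k))


# Hypothetical numbers: 2 reads per window on average, a window with 10 reads.
mean_reads, observed, dispersion = 2.0, 10, 1.0
p_poisson = poisson_tail(observed, mean_reads)
p_negbin = negbin_tail(observed, mean_reads, dispersion)
print(f"Poisson tail: {p_poisson:.2e}")
print(f"Negative binomial tail: {p_negbin:.2e}")
```

With the same mean, the same window looks far more significant under the Poisson model than under the overdispersed model, which is the overstatement described above.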

When the ChIP-seq experiment involves both a ChIP'd sample and a negative control sample, we use a conditional binomial model to detect peaks. The model automatically takes into account the difference between the total number of reads in the ChIP sample and in the control sample. In other words, normalization is done naturally by the statistical model. To estimate the false discovery rate, our model does NOT require that the number of ChIP reads match the number of control reads (e.g., it is fine to have 2 million ChIP reads and 1 million control reads, or 1 million ChIP reads vs. 2 million control reads). By comparison, some previous methods compute FDR by switching the ChIP and control labels; these methods usually require approximately the same number of ChIP and control reads. Other methods, such as QuEST, compare two negative controls to get an FDR estimate, but to do so you have to double the control reads in the experiment (i.e., to compute FDR for a comparison between 1 million ChIP reads and 1 million control reads, you need another 1 million control reads, and you estimate FDR by comparing control vs. control).
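As a rough illustration of how conditioning handles unequal sequencing depth, the Python sketch below tests one window by conditioning on its total read count. It assumes, for illustration only, that under the null the chance that a read in the window comes from the ChIP sample equals the ChIP share of all reads; how CisGenome actually estimates this proportion and converts the result to an FDR is not shown here. The function names and read counts are hypothetical.

```python
import math


def null_chip_fraction(total_chip_reads, total_control_reads):
    """Assumed null probability that a read in a window is a ChIP read."""
    return total_chip_reads / (total_chip_reads + total_control_reads)


def conditional_binomial_tail(chip_in_window, control_in_window, p0):
    """P(X >= chip_in_window) for X ~ Binomial(n, p0), n = window total."""
    n = chip_in_window + control_in_window
    return sum(
        math.comb(n, x) * p0 ** x * (1.0 - p0) ** (n - x)
        for x in range(chip_in_window, n + 1)
    )


# Hypothetical library sizes: 2 million ChIP reads, 1 million control reads.
p0 = null_chip_fraction(2_000_000, 1_000_000)

# A 20 vs. 10 window only mirrors the 2:1 depth ratio, so it is unremarkable.
print(conditional_binomial_tail(20, 10, p0))
# A 30 vs. 0 window is far more lopsided than depth alone explains.
print(conditional_binomial_tail(30, 0, p0))
```

Because the depth ratio enters only through the null proportion, the same function works unchanged when the control library is the larger one.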

Finally, many existing tools provide p-values instead of FDR. It is well known that a p-value is not a good error rate measure in the context of multiple testing. CisGenome provides FDR estimates instead of p-values for both one-sample (only the ChIP'd sample is available) and two-sample (both ChIP'd and control samples are available) ChIP-seq analyses.

2. Graphical user interface & visualization

If you don't have programming experience, we have a graphical user interface designed for you. If you are an experienced programmer, you can always use our core functions as command-line programs (e.g., you can easily incorporate them into your shell scripts and prepare batch jobs).

In addition to the GUI, we have a CisGenome browser (much like the UCSC browser, but with fewer functions). The browser runs locally on your computer, and you can visualize raw data and peak signals in it. In the same browser, you can also visualize gene structures, cross-species conservation, DNA sequences, motif logos, etc. You can also add custom tracks. Remember, this is a lightweight browser running on your own computer, so you don't need to upload anything to a web server (as you would in order to use UCSC). It is designed to save time in large-scale interactive analyses, since it avoids uploading large data sets to web servers.

3. Motif analysis, gene annotation, sequence retrieval, etc.

ChIP-seq peak detection is not the only function of CisGenome. You can also use CisGenome for a range of downstream analyses, including de novo motif discovery, mapping motifs to the genome or to any set of genomic regions, adding gene annotations, retrieving DNA sequences, and getting summary statistics about the distribution of your peaks (e.g., x% are in exons, y% are in 1 kb promoters, etc.). You can also use CisGenome to analyze ChIP-chip data.

Of course, any software will have bugs. We will not be surprised if you encounter bugs in CisGenome. When you find one, please let us know and we will try to fix it. We hope that you will find CisGenome useful in your own work.
Old 11-11-2008, 01:15 PM   #2
xuer
Member
 
Location: germany

Join Date: Sep 2008
Posts: 17
Default

I have tested the CisGenome browser. Though I have not used all of its functions, it looks quite good!
Old 12-08-2008, 10:41 AM   #3
frankyue50
Member
 
Location: CA

Join Date: Nov 2008
Posts: 34
Default

Sounds really promising. I'll check it out.
Old 12-08-2008, 08:57 PM   #4
erikarner
Junior Member
 
Location: Japan

Join Date: Oct 2008
Posts: 1
Default

Hi hji,

I saw the article in NBT the other day and it certainly looks really useful. I have a few questions, if you don't mind.

1. Does the peak detection algorithm in ChIP-seq adjust for variable number of potential single mapping sites in different regions? I am assuming that the algorithm only uses uniquely mapping reads. A few tags in a region mostly consisting of repeats can be more significant than many tags in a unique region - is this accounted for?

2. My understanding is that the GUI is only available for Windows. Is all functionality available in the Linux version, and can analysis results obtained on the Linux platform be transferred to a Windows computer for viewing and further analysis? I guess what I'm asking is how decoupled the GUI is from core functionality, file formats etc.

Best regards,
Erik
Old 12-12-2008, 05:47 AM   #5
hji
Member
 
Location: Baltimore

Join Date: Nov 2008
Posts: 13
Default

Erik,

Re your first question: "Does the peak detection algorithm in ChIP-seq adjust for variable number of potential single mapping sites in different regions? I am assuming that the algorithm only uses uniquely mapping reads. A few tags in a region mostly consisting of repeats can be more significant than many tags in a unique region - is this accounted for?"

If you are using two-sample analysis, this is automatically adjusted for, since the same bias should apply to both the ChIP'd and control samples (correct me if I'm wrong).

If you are using one-sample analysis, the answer is no; we haven't adjusted for it in the current version. You raise a very good point, and we will try to incorporate this into the next version of the peak detection algorithm if it tests well.

Re your second question: "My understanding is that the GUI is only available for Windows. Is all functionality available in the Linux version, and can analysis results obtained on the Linux platform be transferred to a Windows computer for viewing and further analysis? I guess what I'm asking is how decoupled the GUI is from core functionality, file formats etc."

You are right, the GUI is currently only for Windows. But all core algorithms can be run on Linux. The Windows GUI uses the same core algorithms as the Linux version and yields the same results in the same formats. So you can transfer results from Linux to a Windows machine and perform further analysis from there.
Old 02-09-2009, 12:31 PM   #6
What_Da_Seq
Member
 
Location: RTP

Join Date: Jul 2008
Posts: 28
Default

Any suggestions for using CisGenome for MeDIP-CHIP without input controls (only treated vs. notTreated)? I am still waiting for the normalization of my 63 .cel files to finish, so I have not had a chance to explore the TileMap interface. Any suggestions for starting conditions are appreciated.

Frank
Old 02-12-2009, 09:22 AM   #7
hji
Member
 
Location: Baltimore

Join Date: Nov 2008
Posts: 13
Default

I'm not quite sure how your data are structured, but it looks like a typical two-sample comparison should work.
Old 02-12-2009, 12:02 PM   #8
What_Da_Seq
Member
 
Location: RTP

Join Date: Jul 2008
Posts: 28
Default

Sorry I did not make this clearer. Now that I have done a couple of analyses, I can tell you that I am not getting any peaks using HMM and 2 samples when comparing (treatment > control), and only about 20 peaks for (control > treatment). I have not used the UMS settings yet.
Since I am looking for single-base events (CpG or MeCpG) and not TF binding, I was wondering what my most relaxed (least stringent) HMM setting for peak detection would be. I can identify 3000+ regions via MA(300) for (treatment > control), but only 5 of these regions have FDR 0.0000000, and the next group of peaks is at 0.10000000.
I also have no good grasp of why the FDR numbers in the COD files are grouped instead of continuous (e.g., 5 peaks at FDR=0.0000000, next peak group at 0.1000000).

I greatly appreciate your input. I am just trying to work my way through the 2005 TileMap paper. If only my statistical comprehension were better. But the program so far is very nice, especially since my boss always wanted some sort of FDR calculation incorporated into tiling analysis.

Thanks again
Old 02-12-2009, 05:45 PM   #9
hji
Member
 
Location: Baltimore

Join Date: Nov 2008
Posts: 13
Default

In that case, I suggest you look at the raw data first. You can import the fc.bar and ma.bar files into the CisGenome browser and look at the top peaks. Ask yourself: do they look like something real? This will help you understand whether the FDR makes sense or not.

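Regarding why the FDR values are always grouped: it is because the FDR is forced to be monotone. Your peaks are ranked, and the raw FDR is computed as (# peaks in the left tail)/(# peaks in the right tail). Suppose the raw FDR values are: 0.01; 0.02; 0.00; 0.06; 0.05; 0.07 ... then the reported FDR values will be 0.00; 0.00; 0.00; 0.05; 0.05; 0.07 ... That is, each reported value is the smallest raw FDR found at that peak or at any peak ranked after it, so neighboring peaks often share the same value. This is somewhat like the Benjamini-Hochberg procedure.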
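The monotone adjustment in the example above can be reproduced with a running minimum taken from the bottom of the ranked list upward. This is a minimal Python sketch based only on the numbers in this post, treating the six listed values as the whole list; it is not CisGenome's code, and the function name is made up.

```python
def monotone_fdr(raw_fdr):
    """Make FDR values non-decreasing along the ranked peak list.

    Each reported value is the minimum of the raw value at that rank
    and every raw value ranked after it.
    """
    reported = []
    running_min = float("inf")
    for value in reversed(raw_fdr):
        running_min = min(running_min, value)
        reported.append(running_min)
    reported.reverse()
    return reported


raw = [0.01, 0.02, 0.00, 0.06, 0.05, 0.07]
print(monotone_fdr(raw))
```

For the six values above this gives 0.00; 0.00; 0.00; 0.05; 0.05; 0.07, which is why several top peaks end up sharing one FDR value.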
Old 04-21-2009, 09:16 AM   #10
tfcheng
Junior Member
 
Location: MD

Join Date: Apr 2009
Posts: 3
Default

Hi HJI,
I am trying to analyze my ChIP-seq results, and I am hoping that CisGenome can help me. I have two sets of data, experimental and control, both in WIG and BED formats. I need to know the difference between the two. As a rookie in the ChIP-seq field, may I ask whether CisGenome is the right tool for me and, if so, how I should use it? Thank you!!
Old 04-22-2009, 01:35 PM   #11
hji
Member
 
Location: Baltimore

Join Date: Nov 2008
Posts: 13
Default

I just added a function to convert a BED file to an ALN file. You can then use the ALN file to detect peaks and perform subsequent analyses. You are certainly welcome to try CisGenome.

BTW, we have also recently added support for C. elegans, yeast and chicken.
Old 05-21-2009, 04:46 PM   #12
schandri
Junior Member
 
Location: San Francisco

Join Date: May 2009
Posts: 3
cisGenome troubleshooting?

Hello,

I am trying to use cisGenome to "find closest gene" for TF binding sites identified using ChIP-Seq. I have downloaded the human genome database (hg18) and have converted the enriched sites into the COD file format. I was able to load the genome database and COD file into the cisGenome browser. Then I chose “Genome > Annotate with … > Closest Gene”. From there I specified a save location and hit "OK". A new window flashes (too fast for me to read), and then no file is saved and no further COD is added to the project. I don't know what I am doing wrong. I would be EXTREMELY grateful for any advice.

Best regards,
Sanjay
Old 05-21-2009, 06:32 PM   #13
hji
Member
 
Location: Baltimore

Join Date: Nov 2008
Posts: 13
Default schandri

First, check whether you have set up the CisGenome.ini file. In that file, you should give the CisGenome installation path.

Second, check whether any of your folder or file paths/names contain blank characters, such as "C:\My Document\". If so, move (or rename) your data to folders whose paths do not contain blank characters. CisGenome itself should also be installed in a folder whose path does not contain blank characters.

Try and see if this solves your problem.
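A quick way to check the second point before launching a job is to scan the paths you plan to use for whitespace. The Python sketch below is only a convenience check; the example paths are hypothetical, and it does not read or validate CisGenome.ini.

```python
def paths_with_blanks(paths):
    """Return the paths that contain any whitespace character."""
    return [p for p in paths if any(ch.isspace() for ch in p)]


candidate_paths = [
    r"C:\My Document\peaks.cod",    # contains a blank, so it is flagged
    r"C:\cisgenome\data\peaks.cod",  # no blanks
]
for bad in paths_with_blanks(candidate_paths):
    print("Move or rename:", bad)
```

The check covers the whole path, so a blank in any parent folder is caught as well as one in the file name.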
Old 05-22-2009, 09:30 AM   #14
schandri
Junior Member
 
Location: San Francisco

Join Date: May 2009
Posts: 3
Smile

Thanks for your post, hji. Your suggestions fixed the problem! I had installed cisGenome in a path that did not have any spaces, but had made two other mistakes. First, the path in the .ini file was slightly off and second, my .COD data file was in a location that had a file path containing spaces. Now it seems to be working great!

Thanks again.
Sanjay
Old 08-31-2009, 08:37 PM   #15
seidel
Junior Member
 
Location: Missouri

Join Date: Mar 2008
Posts: 3
convert bar to wig

Anyone know of a utility to convert .bar files to .wig files?

I'd be happy to write a program to do it - but any pointers for the .bar format would be helpful. I'm sure I'm not the only one who would be interested in seeing cisGenome output in the UCSC genome browser (which doesn't read .bar last I checked).
Old 09-07-2009, 05:12 AM   #16
tebuffer
Member
 
Location: Bethesda, USA

Join Date: Jun 2009
Posts: 13
motif files created but not showing up in project menu

In Windows, I load the genome database, genome region and coordinates and run "Annotate with -> Closest Gene". The output is listed under the Project Explorer. Perfect so far.

When I run "Get sequence", the FASTA file is generated, but it does not show up in the Project Explorer. Likewise, when I run "New Motif Discovery" with the previously generated FASTA sequence file, motifs are discovered and stored in the location that I specified, but they do not appear under the Project Explorer.

I would appreciate it if anyone could point me toward getting this right.

Thanks,
TEB
Old 09-07-2009, 10:22 PM   #17
hon
Junior Member
 
Location: san fran

Join Date: Sep 2009
Posts: 9
Default

How do I convert the eland_extended.txt format to ALN format in order to use the CisGenome program?
Old 11-20-2009, 05:53 AM   #18
zhlyang
Junior Member
 
Location: md

Join Date: Nov 2009
Posts: 7
Genome database file not listed on the website

Hi Hji, I am new to NGS analysis and am going to try CisGenome. I need to use a small viral genome that is not listed in the CisGenome genome database on the website. Would you mind letting me know how to convert a GenBank genome to a CisGenome database file? Thank you.
Old 01-14-2010, 07:32 PM   #19
ljul
Junior Member
 
Location: Ottawa

Join Date: Jan 2010
Posts: 1
Default

Hi all, I'm new here and am hoping someone may be able to help me out with a little CisGenome issue I'm having. I've used CisGenome successfully on a PC running Windows XP, but I'm now trying to run the program on my MacBook Pro using Parallels, which allows me to run Windows XP. I'm able to open the program through XP on my Mac and can load files as well, but when I try to run an application (right now I'm attempting Gibbs sampling) I get the quickly flashing black window that then disappears, and the program won't continue running. I have checked the file paths and file names, and there are no blank characters anywhere. Also, the cisgenome.ini file does specify the correct CisGenome installation path, so I'm at a loss as to what the problem is. Does anyone have experience running CisGenome on Windows XP through Parallels on a Mac? Or perhaps it isn't possible to run the program this way. Thanks in advance; any assistance will be greatly appreciated.
Old 02-10-2010, 06:35 AM   #20
tec
Member
 
Location: germany

Join Date: Apr 2008
Posts: 14
Cisgenome Browser - file doesn't exist

Hello,

I have a problem concerning the CisGenome browser and the visualization of ChIP-Seq data already analyzed with the Linux version of CisGenome.
When I want to visualize *.bar, *.genefile, *.cod, ... files, I always get the message "file doesn't exist"!

When I call peaks with the Windows version of CisGenome and then double-click on a peak, the browser opens and the data are shown in the browser. But adding additional data files is not possible: "file doesn't exist"!?

Are there any helpful suggestions?

Thanks! tec
All times are GMT -8.