Erange
Dear all,
I'm trying to get the Ali Mortazavi and Wold lab software package up and running for analyzing mRNA-Seq data. It's been a struggle, and I'm wondering if anyone has tips for getting it going. Many thanks in advance. John |
I was struggling to get it running recently as well, but I think I finally have it working now. At which part of the installation do you get stuck? |
Also struggling with ERANGE...
Not sure if this thread is still going, but I'll give it a shot.
I'm also trying to get ERANGE (v2.1) off the ground and having some major problems. I'm running through the shell on Mac OS X 10.5.6 with Python 2.5.1. I installed all of the necessary prereqs, including Cistematic, as per the instructions on the Wold lab site. For starters, I've been trying to test ERANGE on the Wold Liver sample dataset. I get stuck right at the beginning:
    python geneMrnacounts.py mouse proj/genome/SAMPLEDATA/mm9Liver1.uniqs.bed mm9Liver1.uniqs.count mm9Liver1.nomatch.bed
    geneMrnacounts.py: version 3.3
    Traceback (most recent call last):
      File "geneMrnacounts.py", line 16, in <module>
        from cistematic.genomes import Genome
    ImportError: No module named cistematic.genomes
    ..$ python
    Python 2.5.1 (r251:54863, Jul 23 2008, 11:00:16)
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
The "cistematic.genomes" module also seems to be missing when I try running other scripts. I've had some very talented Python folks take a look and they're stumped as well. Any advice on this? Has anyone run into similar problems? Any help would be VERY MUCH appreciated. THANKS! |
Re: Also struggling with ERANGE...
Hi,
The error message is about not finding cistematic (cistematic.caltech.edu), which is a required package to use ERANGE for RNA-seq! If you've already downloaded cistematic (and the appropriate genome directories, e.g. H_sapiens or M_Musculus) and saved them into a directory such as /my/favorite/dir, then simply set (assuming bash syntax):
    export PYTHONPATH=/my/favorite/dir
    export CISTEMATIC_ROOT=/my/favorite/dir
I hope this helps! Ali |
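A quick way to confirm that the two variables took effect is to ask the same interpreter that runs ERANGE whether it can import the module. The following is a minimal diagnostic sketch, not part of ERANGE or Cistematic; the helper name `diagnose` is hypothetical, and it deliberately sticks to old-style syntax so that it should also work on older Python versions.

```python
import os
import sys


def diagnose(module_name, env_vars=("PYTHONPATH", "CISTEMATIC_ROOT")):
    """Report whether module_name can be imported and which settings are in effect.

    Hypothetical helper for troubleshooting; it is not shipped with ERANGE.
    """
    try:
        __import__(module_name)
        importable = True
        # __file__ shows which copy of the package was actually picked up.
        location = getattr(sys.modules[module_name], "__file__", None)
    except ImportError:
        importable = False
        location = None
    return {
        "module": module_name,
        "importable": importable,
        "location": location,
        # None means the variable is not set in this shell.
        "env": dict((name, os.environ.get(name)) for name in env_vars),
        "search_path": list(sys.path),
    }


if __name__ == "__main__":
    report = diagnose("cistematic.genomes")
    for key in ("module", "importable", "location", "env"):
        print("%s: %s" % (key, report[key]))
```

If `importable` comes back False while `PYTHONPATH` is set, the directory it points to probably does not contain the `cistematic` package directly.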
Hi ALL,
I want to practice some ChIP-seq and ChIP-chip analysis, and I would like to use the Wold lab's ERANGE for it. However, I can't work out how to use it properly. If anybody has tried ERANGE before, please advise me on how to get started. Thanks in advance, ~Vivek |
Hi All,
I am new to ChIP-seq analysis and ERANGE. I am trying to run the findall.py script as
    python /ERANGEPATH/findall.py PEAK unique.rds unique.region.txt -listPeak -revbackground
where unique.rds is the input RDS file and unique.region.txt is the output region file. The script runs for a very long time (20+ hrs) and then exits with this error:
    Traceback (most recent call last):
      File "/root/Desktop/genotypic/commoncode/findall.py", line 397, in <module>
        hitDict = mockRDS.getReadsDict(fullChrom=True, chrom=achrom, withWeight=True, doMulti=useMulti, findallOptimize=True)
    NameError: name 'mockRDS' is not defined
Has anybody run into a similar error? It would be great if somebody could suggest how to get rid of it, and any other precautions for running the script without errors. With thanks, Vivek |
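A NameError like this usually means the variable is only assigned on some code paths. One plausible trigger, assumed here rather than confirmed from the ERANGE source, is asking for -revbackground without supplying a control file. The sketch below is a hypothetical reproduction of that pattern plus an up-front check; none of these functions exist in ERANGE.

```python
def run_unguarded(control_path=None, rev_background=False):
    """Hypothetical stand-in for a script that opens the control only when given."""
    if control_path is not None:
        mock_rds = "opened:" + control_path  # stand-in for opening the control RDS
    if rev_background:
        # Fails when no control was given: mock_rds was never assigned.
        # Inside a function this is UnboundLocalError, a subclass of NameError.
        return mock_rds
    return None


def validate_options(control_path=None, rev_background=False):
    """Reject the inconsistent option combination before any long-running work."""
    if rev_background and control_path is None:
        raise ValueError("-revbackground needs a control RDS file (-control)")
    return True


if __name__ == "__main__":
    try:
        run_unguarded(rev_background=True)
    except NameError as err:
        print("unguarded run failed: %s" % err)
    try:
        validate_options(rev_background=True)
    except ValueError as err:
        print("caught early: %s" % err)
```

Checking the options first turns a failure after many hours into an immediate, readable message.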
ERANGE performance, etc...
I can see how my rather sparse documentation could lead people astray.
Performance-wise, you need to make sure that you have 3 things under control:
1) You need to allocate as much cache as possible to your RDS file. This is a sqlite parameter that needs to be set once per file, but can be overridden in most of the scripts. It should be about 2/3 of the maximum amount of RAM that you want to use. If you have 2-4 GB, a value of 1 million would be appropriate. You can set it up with the following command:
    python $ERANGEPATH/rdsmetadata.py myfavorite.rds -defaultcache 1000000
2) You need to make sure that your RDS file is indexed (if findall.py tells you that the file is not indexed, just control-C and fix it). If you forgot to do so when loading the last lane, you can force it with the following command:
    python $ERANGEPATH/rdsmetadata.py myfavorite.rds -index
You could have combined 1 & 2 in one command, i.e.
    python $ERANGEPATH/rdsmetadata.py myfavorite.rds -index -defaultcache 1000000
3) sqlite can be *unbearably* slow over NFS. If you cannot store the RDS file on a local drive, then you need to force local caching. For ChIP-seq, that means explicitly giving the "-cache someValue" option to trigger copying to a local temp drive (/tmp by default, but it can be redirected to anywhere pointed to by the environment variable CISTEMATIC_TEMP). If someValue is below the defaultcache size, it will ignore the value but still copy locally. For RNA-seq, you really should do #1 and #2, and use the shell script runStandardAnalysisNFS.sh (or at least use the command line arguments in there).
FYI, if everything is optimal, findall.py for ChIP-seq should be done within 30 minutes max, and an RNA-seq analysis should take from a couple of hours to overnight, depending on the size of the dataset.
For vix_z, you should run findall.py without the -revbackground option, since you never specified a control file, e.g.
    python $ERANGEPATH/findall.py PEAK unique.rds unique.region.txt -listPeak -cache 1000000
If you have a background (aka "control") library, then you can specify it this way:
    python $ERANGEPATH/findall.py PEAK unique.rds unique.region.txt -control myControl.rds -listPeak -revbackground -cache 1000000
I hope this helps! Ali |
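Since an RDS file is a sqlite database, the three points above can be illustrated with Python's standard `sqlite3` module. This is only a sketch of the general idea, not what rdsmetadata.py or findall.py do internally: the function names, the table name `reads`, and its columns are hypothetical, and the temp directory falls back to the system default rather than a hard-coded /tmp.

```python
import os
import shutil
import sqlite3
import tempfile


def local_copy(rds_path):
    """Copy a sqlite file to a local temp directory and return the new path.

    Honors CISTEMATIC_TEMP (named in the post above); otherwise uses the
    system temp directory.
    """
    temp_dir = os.environ.get("CISTEMATIC_TEMP", tempfile.gettempdir())
    local_path = os.path.join(temp_dir, os.path.basename(rds_path))
    if os.path.abspath(local_path) != os.path.abspath(rds_path):
        shutil.copy2(rds_path, local_path)
    return local_path


def open_tuned(path, cache_pages=1000000):
    """Open the database and set the page cache size for this connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA cache_size = %d" % int(cache_pages))
    return conn


def ensure_index(conn, table, columns):
    """Create an index on the given columns if it does not exist yet."""
    name = "%s_%s_idx" % (table, "_".join(columns))
    conn.execute(
        "CREATE INDEX IF NOT EXISTS %s ON %s (%s)" % (name, table, ", ".join(columns))
    )
    conn.commit()
    return name
```

A typical sequence would be `conn = open_tuned(local_copy("/nfs/share/myfavorite.rds"))` followed by `ensure_index(conn, "reads", ("chrom", "start"))`, where the NFS path and the table layout are placeholders.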
Hi Alim,
Thanks a lot for your reply. It is really helpful to know about the importance of the cache for RDS files and how to handle it. Is RAM the only thing that makes findall.py run fast, or are there other ways to speed it up? I would also like to understand the procedure ERANGE uses for ChIP-seq analysis. Can you suggest some documentation that explains the algorithm? I am new to this field, so any reply will be highly appreciated. With many thanks! ~Vix |
Hi Vix,
Honestly, compared to the RNA-seq pipeline, the ChIP-seq pipeline is pretty fast (but maybe I'm not objective about this!) The basis of the original algorithm is still as described in the NRSF ChIP-seq Science paper from 2007. Just look for pileups of reads using a greedy algorithm, and check that they are enriched compared to the same region in the control. It's fundamentally a region caller rather than a "summit"-caller. However, many of the details have changed & are continuing to change. As I come across increasingly more datasets, I've introduced various parameters to filter out false positive peaks. As I finally gave in and started reporting a summit for peaks, I've come across datasets that have shifts so large that they require an explicit shift. Version 3.1 of ERANGE will introduce that any day now. One thing I will say about the ChIP-seq capabilities of ERANGE is that calling regions and summits is a beginning, not an end. The other scripts in the package (which depend on Cistematic) are actually designed to find motifs in the regions, find the genes associated with the regions & do a GO analysis of these genes for example. Ali |
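To make the "greedy pileup plus control enrichment" idea concrete, here is a deliberately simplified sketch. It is not ERANGE's implementation: it works on read start positions for a single chromosome, ignores library-size (RPM) normalization, strand, multireads, and summits, and the function name and default thresholds are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right


def call_regions(sample, control, max_spacing=50, min_reads=4, min_ratio=3.0):
    """Greedily merge nearby reads into regions, then keep the enriched ones.

    sample, control: lists of read start positions on one chromosome.
    Returns a list of (start, stop, read_count, ratio) tuples.
    """
    sample = sorted(sample)
    control = sorted(control)

    # Greedy pass: extend the current cluster while the next read is close enough.
    clusters = []
    for pos in sample:
        if clusters and pos - clusters[-1][-1] < max_spacing:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])

    regions = []
    for cluster in clusters:
        if len(cluster) < min_reads:
            continue
        start, stop = cluster[0], cluster[-1]
        # Count control reads falling inside the same interval.
        background = bisect_right(control, stop) - bisect_left(control, start)
        # Treat an empty background as one read to avoid dividing by zero.
        ratio = len(cluster) / float(max(background, 1))
        if ratio >= min_ratio:
            regions.append((start, stop, len(cluster), ratio))
    return regions


if __name__ == "__main__":
    chip = [100, 110, 120, 130, 140, 150, 1000, 5000, 5010, 5020, 5030]
    ctrl = [5005, 5015]
    print(call_regions(chip, ctrl))
```

In this toy input the first pileup is kept, the lone read at 1000 is too sparse, and the pileup near 5000 is rejected because the control has reads in the same interval.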
ERANGE error message
Hi all-
I've successfully used ERANGE 3.0.1 in the past for some RNA-seq analysis. I'm now running into some problems getting through the RunStandardAnalysis script. My reads are Bowtie-aligned (single-end) and built into an appropriate RDS file. I'm working with the mouse genome. When running RunStandardAnalysis.sh, the first few steps (geneMrnacounts.py, normalizeExpandedExonic.py) go without any problems. However, geneMrnaCountsWeighted.py starts off fine but then starts pouring out errors, as shown below:
    /proj/genome/commoncode3.0.1/geneMrnaCountsWeighted.py: version 3.7
    dataset Sample.bowtie.rds
    metadata:
    bowtie_mapped True
    dataType RNA
    genome mm9
    rdsVersion 1.1
    readsize 36
    9756706 unique reads, 1097314 spliced reads and 3706354 multireads
    default cache size is 2000000 pages
    found index
    1
    read 100000 read 200000 read 300000 read 400000 read 500000
    10
    read 600000 read 700000 read 800000 read 900000 read 1000000
    11
    read 1100000 read 1200000 read 1300000 read 1400000 read 1500000 read 1600000 read 1700000 read 1800000 read 1900000
    12
    read 2000000 read 2100000 read 2200000 read 2300000 read 2400000
    13
    read 2500000 read 2600000
    14
    read 2700000 read 2800000 read 2900000
    Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    15
    read 3000000 read 3100000 read 3200000 read 3300000
    gid 12419 not in gidReadDict
    Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    gid 12419 not in gidReadDict
    Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    gid 12419 not in gidReadDict
    Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    gid 12419 not in gidReadDict
    Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    gid 12419 not in gidReadDict
    Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    gid 12419 not in gidReadDict
    Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    gid 12419 not in gidReadDict
These go on for pages and pages, with different values for "gid" as it goes on. Has anyone seen this problem? I can't figure out what I've done differently compared to my past successful runs. Any assistance would be MUCH appreciated. Thanks! |
Hi,
So I would take the error message to heart: malloc (the Unix memory allocation call) is failing, which implies that you are running out of memory. Could you (or someone else) be using up significant amounts of memory at the same time as you are running ERANGE? Or are you running this on another machine with less RAM? By the way, I highly recommend upgrading to ERANGE 3.1 to pick up the fixes for some of the other bugs I have found over the last 3 months! You won't have to rebuild the RDS files or anything. Ali |
Thanks Ali -
I'm running ERANGE locally on a MacBook Pro with 4 GB of RAM, and not running ANYTHING else during the analysis. I've got the cache set to 2000000 for the RDS file. This has been more than enough memory in the past, though this is more reads than I've tried previously. I'll try the upgrade as well. Thanks! |
4 GB of RAM should be enough for the number of reads you have, especially if you have enough virtual memory (could you also be running low on disk space?)
If it won't work with this much RAM, then simply drop the cache size down to a smaller value (e.g. 1000000), which will free up some extra RAM. Ali |
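As a rough back-of-the-envelope check, the cache setting is a number of sqlite pages, so its memory footprint is pages times page size. The sketch below assumes a page size of 1024 bytes, which was sqlite's default for files created by older versions; the real value for a given RDS file can be read with `PRAGMA page_size`. The helper names are hypothetical.

```python
import sqlite3


def cache_gib(pages, page_size=1024):
    """Approximate memory used by a sqlite page cache, in GiB."""
    return pages * page_size / float(2 ** 30)


def page_size_of(path):
    """Read the actual page size of a sqlite file instead of assuming it."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    for pages in (2000000, 1000000):
        print("%d pages -> about %.2f GiB" % (pages, cache_gib(pages)))
```

Under that assumption, 2000000 pages is close to 1.9 GiB and 1000000 pages is close to 0.95 GiB, so halving the cache frees roughly 1 GiB for everything else the script holds in memory.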
Some queries for chip-seq analysis
Hi Alim,
Thanks for your previous help regarding ChIP-seq analysis. I have a few more queries on the same topic:
How does the RAM requirement vary with the number of reads (or data size) in a ChIP-seq analysis?
Does it keep increasing, or does it saturate at some point, so that adding more RAM no longer helps in processing the reads?
How can I view the contents of RDS files?
Can you suggest some links to sample data of various sizes that can be used for ChIP-seq analysis?
Looking forward to your reply. With thanks! Vix |
Hey Ali-
Just wanted to say thanks for your help. Reducing the cache size seemed to help get through the analysis without memory errors, although it took a bit longer. I've got some additional datasets with way more reads that unfortunately can't be handled with 4 GB of RAM. Looks like I'm going to need to find a bigger box. Thanks again for providing so much support. |
Hello,
If I can resurrect this thread briefly, I've been having a little trouble with ERANGE 3.1. When I try to either annotate the genes or use the getfasta.py script to get the fasta sequence for MEME, I always end up with a blank file. The annotation script gives no errors, and the getfasta.py script only says there is a problem for each peak. Everything is "installed" properly, and my $PATH variable is set, so that isn't the issue. In looking through the code, I see a call to "Genome", but I haven't been able to locate that anywhere in the code. This might be an issue more with cistematic than with ERANGE, or it's just me missing the obvious fix. Either way, any suggestions would be wonderful. I've already written Perl scripts for some of this, but I'd like to save time and use the built-in scripts if possible. Thanks so much! Daniel
EDIT: It was the result of a difference in the input file. The eland files I was using were slightly different from normal; everything works as it should now! |
ERANGE problem
Hello Everyone,
I am trying to run ERANGE 3.1 (http://woldlab.caltech.edu/rnaseq/) with my own ChIP-Seq dataset. I have eland_extended files for lanes 1 & 2. I convert them serially to RDS using the makerdsfromeland2.py script. Then I run ERANGE with the following command:
    python ./findall.py EXP2vsEXP1 EXP2.rds ./output/EXP2vsEXP1.txt -control EXP1.rds -listPeak -revbackground -ratio 3 -nodirectionality
It runs for about 10 minutes and finally succeeds. The EXP2vsEXP1.txt file has the following information:
    #enriched sample: EXP2.rds (7.8 M reads)
    #control sample: EXP1.rds (7.1 M reads)
    #enforceDirectionality=False listPeak=True nomulti=False cache=False
    #spacing<50 minimum>4.0 ratio>3.0 minPeak=0.5 trimmed=10%
    #minPlus=0.25 maxPlus=0.75 leftPlus=0.40 shift=0 pvalue=back
    #regionID chrom start stop RPM fold multi% peakPos peakHeight pValue
    EXP2vsEXP11 ref_chr9 83452778 83452862 7.3 3.1 57.3 83452819 7.0 2.5e-17
    #stats: 7.3 RPM in 1 regions
    #8 regions (37.6 RPM) found in background (FDR = 100.00 percent)
But I am pretty sure that there is more than one peak in this dataset (judging from the peak files I received from the contractor company). Could someone help me work out where exactly I am going wrong? Regards, PJ |
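For anyone who wants to inspect such a region file programmatically, here is a small parsing sketch. It assumes the column layout shown in the "#regionID" header of this particular output (produced with -listPeak) and whitespace-separated fields; the `Region` type and `parse_regions` function are hypothetical helpers, not part of ERANGE.

```python
from collections import namedtuple

# Field names follow the "#regionID chrom start stop ..." header in the output above.
Region = namedtuple(
    "Region",
    "region_id chrom start stop rpm fold multi_pct peak_pos peak_height p_value",
)


def parse_regions(lines):
    """Parse region rows, skipping blank lines and '#' comment/header lines."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()
        regions.append(
            Region(
                f[0], f[1], int(f[2]), int(f[3]),
                float(f[4]), float(f[5]), float(f[6]),
                int(f[7]), float(f[8]), float(f[9]),
            )
        )
    return regions


if __name__ == "__main__":
    sample = [
        "#regionID chrom start stop RPM fold multi% peakPos peakHeight pValue",
        "EXP2vsEXP11 ref_chr9 83452778 83452862 7.3 3.1 57.3 83452819 7.0 2.5e-17",
        "#stats: 7.3 RPM in 1 regions",
    ]
    for region in parse_regions(sample):
        print(region.chrom, region.start, region.stop, region.fold)
```

With a real file, passing an open file object works the same way, e.g. `parse_regions(open("EXP2vsEXP1.txt"))`.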
empty splicefa file
Does anyone happen to know why I would end up with an empty file after running the "getsplicefa.py" script? I have knowngene.txt as well as the correct length - 4, but still no results are produced.
Thanks in advance. |
Hello,
I have a question related to "cistematic" for RNA-seq analysis. The reads were aligned using bowtie. After that I built the RDS file using makerdsfrombowtie.py. Then I tried running geneMrnaCounts.py but received the error below.
    #Run script
    python geneMrnaCounts.py Mouse mm9.mybowtie.rds mm9.mybowtie.uniqs.count -markNM
    #Error
    Traceback (most recent call last):
      File "geneMrnaCounts.py", line 16, in <module>
        from cistematic.genomes import Genome
      File "cistematic/cistematic/genomes/__init__.py", line 907, in <module>
        geneDB[genome] = eval(genome+'.geneDB')
      File "<string>", line 1, in <module>
    AttributeError: 'module' object has no attribute 'geneDB'
Does anyone have any insight? Thank you very much for your help! Best regards, C C |