SEQanswers

Old 12-10-2008, 05:05 AM   #1
MaloneJH
Junior Member
 
Location: Bethesda, MD

Join Date: Aug 2008
Posts: 2
Default Erange

Dear all,

I'm trying to get the software package from Ali Mortazavi and the Wold lab up and running for analyzing mRNA-Seq data. It's been a struggle, and I'm wondering whether anyone has some tips for getting it going.

Many thanks in advance.

John
Old 12-12-2008, 04:49 AM   #2
florian
Junior Member
 
Location: Edinburgh, UK

Join Date: Nov 2008
Posts: 7
Default

I was struggling to get it running recently as well, but I think I finally have it working now. At which part of the installation do you get stuck?
Old 02-20-2009, 04:49 PM   #3
griffon42
Member
 
Location: New York

Join Date: Jan 2009
Posts: 23
Default Also struggling with ERANGE...

Not sure if this thread is still going, but I'll give it a shot.

I'm also trying to get ERANGE (v2.1) off the ground and having some major problems.

I'm running through the shell on Mac OS X 10.5.6 with Python 2.5.1. I installed all of the necessary prereqs, including Cistematic as per the instructions on the Wold lab site.

For starters, I've been trying to test ERANGE on the Wold Liver sample dataset. I get stuck right at the beginning:

python geneMrnacounts.py mouse proj/genome/SAMPLEDATA/mm9Liver1.uniqs.bed mm9Liver1.uniqs.count mm9Liver1.nomatch.bed
geneMrnacounts.py: version 3.3
Traceback (most recent call last):
File "geneMrnacounts.py", line 16, in <module>
from cistematic.genomes import Genome
ImportError: No module named cistematic.genomes
..$ python
Python 2.5.1 (r251:54863, Jul 23 2008, 11:00:16)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

It seems to be missing this "cistematic.genomes" module if I try running other scripts as well. I've had some very talented python folks take a look and they're stumped as well.

Any advice on this? Has anyone run into similar problems?

Any help would be VERY MUCH appreciated. THANKS!
Old 02-23-2009, 01:57 PM   #4
alim
Member
 
Location: pasadena, ca

Join Date: Jun 2008
Posts: 14
Default Re: Also struggling with ERANGE...

Hi,

The error message is about Python not finding cistematic (cistematic.caltech.edu), which is a required package for using ERANGE for RNA-seq!

If you've already downloaded cistematic (and the appropriate genome directories, e.g. H_sapiens or M_Musculus) and saved them into a directory such as /my/favorite/dir, then simply set the following (assuming bash syntax):

export PYTHONPATH=/my/favorite/dir
export CISTEMATIC_ROOT=/my/favorite/dir
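
One way to check that the two variables above took effect is to look for the package directory named in the traceback under each PYTHONPATH entry. The sketch below is only an illustration and is not part of ERANGE or Cistematic: the helper name `find_package_dir` is made up here, and it does a plain filesystem check (it does not import anything, so it ignores zip imports, .pth files and similar mechanisms).

```python
import os


def find_package_dir(dotted_name, search_paths):
    """Return the first directory or .py file matching dotted_name, else None.

    Hypothetical helper: it only mirrors the simple case of Python's import
    lookup, where "a.b" lives in <path>/a/b/__init__.py or <path>/a/b.py.
    """
    relative = os.path.join(*dotted_name.split("."))
    for base in search_paths:
        candidate = os.path.join(base, relative)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return candidate
        if os.path.isfile(candidate + ".py"):
            return candidate + ".py"
    return None


if __name__ == "__main__":
    # Split PYTHONPATH the same way the interpreter does, skipping empty entries.
    entries = [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    print("PYTHONPATH entries:", entries)
    print("CISTEMATIC_ROOT:", os.environ.get("CISTEMATIC_ROOT"))
    print("cistematic.genomes found at:", find_package_dir("cistematic.genomes", entries))
```

If the last line prints `None`, the directory given in PYTHONPATH is probably not the one that contains the `cistematic` folder.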

I hope this helps!

Ali
Old 03-17-2009, 02:28 AM   #5
VIX_Z
Member
 
Location: INDIA

Join Date: Mar 2009
Posts: 11
Default

Hi ALL,

I want to try some ChIP-seq and ChIP-chip analysis, and I would like to use the Wold lab's ERANGE for it, but I can't work out how to use it properly. If anybody has tried ERANGE before, please advise me on how to get started.

Thanks in advance,

~Vivek
Old 03-30-2009, 10:46 PM   #6
VIX_Z
Member
 
Location: INDIA

Join Date: Mar 2009
Posts: 11
Question

Hi All,
I am new to ChIP-seq analysis and ERANGE. I am trying to run the findall.py script as
"python /ERANGEPATH/findall.py PEAK unique.rds unique.region.txt -listPeak -revbackground"
where unique.rds is the input RDS file and unique.region.txt is the output region file.
The script runs for a very long time (20+ hrs) and then exits with the following error:
------------------------------------------
Traceback (most recent call last):
File "/root/Desktop/genotypic/commoncode/findall.py", line 397, in <module>
hitDict = mockRDS.getReadsDict(fullChrom=True, chrom=achrom, withWeight=True, doMulti=useMulti, findallOptimize=True)
NameError: name 'mockRDS' is not defined
------------------------------------------

Has anybody run into a similar error?
It would be great if anybody could suggest a way to get rid of this error, and any other precautions for running the script without errors.

With Thanks,
Vivek
Old 04-14-2009, 09:18 AM   #7
alim
Member
 
Location: pasadena, ca

Join Date: Jun 2008
Posts: 14
Default ERANGE performance, etc...

I can see how my rather sparse documentation could lead people astray.

Performance-wise, you need to make sure that you have 3 things under control:
1) You need to allocate as much cache as possible to your RDS file. This is a sqlite parameter that needs to be set once per file, but it can be overridden in most of the scripts. It should be about 2/3 of the maximum amount of RAM that you want to use. If you have 2-4 GB, a value of 1 million would be appropriate. You can set it with the following command:

python $ERANGEPATH/rdsmetadata.py myfavorite.rds -defaultcache 1000000

2) You need to make sure that your RDS file is indexed (if findall.py tells you that the file is not indexed, just control-C and fix it). If you forgot to do so when loading the last lane, you can force it with the following command:

python $ERANGEPATH/rdsmetadata.py myfavorite.rds -index

You could also combine 1 and 2 in one command, i.e.

python $ERANGEPATH/rdsmetadata.py myfavorite.rds -index -defaultcache 1000000

3) sqlite can be *unbearably* slow over NFS. If you cannot store the RDS file on a local drive, then you need to force local caching. For ChIP-seq, that means explicitly giving the "-cache someValue" option to trigger copying to a local temp drive (/tmp by default, but it can be redirected to any location pointed to by the environment variable CISTEMATIC_TEMP). If someValue is below the defaultcache size, the value is ignored but the file is still copied locally.
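
For point 1, the relation between a RAM budget and a cache value can be worked out with a little arithmetic, given that the cache is counted in sqlite pages. The sketch below is an illustration under stated assumptions, not ERANGE code: the function names are made up, the 1024-byte page size in the example is an assumed value, and sqlite uses somewhat more memory per cached page than the page size alone, so the result is best read as an upper bound. The value of 1 million suggested above for 2-4 GB is more conservative than this arithmetic.

```python
import sqlite3


def page_size_of(db_path):
    """Read the page size (in bytes) that sqlite reports for a database file."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()


def cache_pages_for_ram(ram_bytes, page_size, numerator=2, denominator=3):
    """Number of pages that fit in numerator/denominator of ram_bytes.

    Integer arithmetic is used throughout so the result is exact.
    """
    budget = ram_bytes * numerator // denominator
    return budget // page_size


if __name__ == "__main__":
    gib = 1024 ** 3
    for ram_gib in (2, 4):
        # 1024 bytes is an assumed page size, not a value taken from the thread.
        pages = cache_pages_for_ram(ram_gib * gib, 1024)
        print(ram_gib, "GiB of RAM ->", pages, "pages at most")
```

`page_size_of` can be pointed at an existing RDS file to replace the assumed page size with the real one.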

For RNA-seq, you really should do #1, #2, and use the shell script runStandardAnalysisNFS.sh (or at least use the command line arguments in there)
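
The local caching described in point 3 amounts to copying the RDS file off the network drive before sqlite touches it. A minimal sketch of that idea follows; it is not ERANGE's actual implementation, and the function name `copy_to_local_temp` is hypothetical. The only details taken from the post are the /tmp default and the CISTEMATIC_TEMP environment variable.

```python
import os
import shutil


def copy_to_local_temp(rds_path):
    """Copy rds_path into a local scratch directory and return the new path.

    The scratch directory is CISTEMATIC_TEMP if set, otherwise /tmp.
    """
    temp_dir = os.environ.get("CISTEMATIC_TEMP", "/tmp")
    local_path = os.path.join(temp_dir, os.path.basename(rds_path))
    if os.path.abspath(local_path) == os.path.abspath(rds_path):
        # The file already lives in the scratch directory; nothing to copy.
        return local_path
    shutil.copyfile(rds_path, local_path)
    return local_path
```

The caller would then open the returned path instead of the original one and delete the copy when the analysis is finished.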

Fyi, if everything is optimal, findall.py for ChIP-seq should be done within 30 minutes max, and an RNA-seq analysis should take from a couple of hours to overnight, depending on the size of the dataset.

For vix_z, you should run findall.py without the -revbackground option, since you never did specify a control file, e.g.

python $ERANGEPATH/findall.py PEAK unique.rds unique.region.txt -listPeak -cache 1000000

If you have a background (aka "control") library, then you can specify it this way:

python $ERANGEPATH/findall.py PEAK unique.rds unique.region.txt -control myControl.rds -listPeak -revbackground -cache 1000000
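
The rule in the two commands above (only pass -revbackground when a control library is given) can be captured in a small wrapper. This is a sketch for illustration: `build_findall_args` is a made-up helper, and it uses only the options that appear in this thread (-control, -listPeak, -revbackground, -cache).

```python
def build_findall_args(label, sample_rds, out_file, control_rds=None,
                       cache_pages=None, findall_path="findall.py"):
    """Build the argument list for findall.py as used in the commands above.

    -revbackground is added only together with -control, because it has
    nothing to work on when no control library is specified.
    """
    args = ["python", findall_path, label, sample_rds, out_file]
    if control_rds is not None:
        args += ["-control", control_rds]
    args.append("-listPeak")
    if control_rds is not None:
        args.append("-revbackground")
    if cache_pages is not None:
        args += ["-cache", str(cache_pages)]
    return args


if __name__ == "__main__":
    print(" ".join(build_findall_args("PEAK", "unique.rds", "unique.region.txt",
                                      cache_pages=1000000)))
    print(" ".join(build_findall_args("PEAK", "unique.rds", "unique.region.txt",
                                      control_rds="myControl.rds",
                                      cache_pages=1000000)))
```

The returned list can be handed to `subprocess.run` as is, which avoids shell quoting problems with file names.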

I hope this helps!

Ali
Old 04-14-2009, 10:09 PM   #8
VIX_Z
Member
 
Location: INDIA

Join Date: Mar 2009
Posts: 11
Default

Hi Alim,

Thanks a lot for your reply.
It is really helpful to know about the importance of the cache for RDS files and how to handle it.
Is RAM the only thing that makes findall.py run fast, or is there some other way of speeding it up?

I would also like to understand the procedure ERANGE uses for ChIP-seq analysis. Can you suggest some documentation that explains the algorithm?

I am new to this field, any reply in this regard will be highly appreciated.

With lot of thanks!!!!!

~Vix
Old 04-16-2009, 03:36 PM   #9
alim
Member
 
Location: pasadena, ca

Join Date: Jun 2008
Posts: 14
Default

Hi Vix,

Honestly, compared to the RNA-seq pipeline, the ChIP-seq pipeline is pretty fast (but maybe I'm not objective about this!)

The basis of the original algorithm is still as described in the NRSF ChIP-seq Science paper from 2007. Just look for pileups of reads using a greedy algorithm, and check that they are enriched compared to the same region in the control. It's fundamentally a region caller rather than a "summit"-caller.
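
To make the "greedy pileup plus control check" idea concrete, here is a toy sketch. It is emphatically not ERANGE's code: the function name, the defaults (borrowed loosely from the spacing, minimum and ratio settings that findall.py prints in its output header) and the way enrichment is computed are all assumptions, and it works on raw read counts for a single chromosome without any normalization for library size.

```python
def call_regions(sample_positions, control_positions,
                 spacing=50, min_reads=4, ratio=3.0):
    """Toy greedy region caller over read start positions on one chromosome.

    Reads closer together than `spacing` are merged into one region. A region
    is kept if it has at least `min_reads` reads and at least `ratio` times
    as many reads as the control has in the same interval.
    """
    regions = []
    positions = sorted(sample_positions)
    if not positions:
        return regions

    def keep_if_enriched(start, stop, count):
        control_count = sum(1 for p in control_positions if start <= p <= stop)
        # max(..., 1) avoids dividing by zero when the control is empty here.
        if count >= min_reads and count / max(control_count, 1) >= ratio:
            regions.append((start, stop, count, control_count))

    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev < spacing:
            count += 1          # still inside the current pileup
        else:
            keep_if_enriched(start, prev, count)
            start, count = pos, 1
        prev = pos
    keep_if_enriched(start, prev, count)
    return regions


if __name__ == "__main__":
    sample = [100, 110, 120, 130, 500, 1000, 1010, 1020, 1030, 1040]
    control = [1005, 1015]
    print(call_regions(sample, control))
```

In this example only the first pileup survives: the single read at 500 is below the minimum, and the pileup at 1000-1040 is not enriched enough over the two control reads.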

However, many of the details have changed and are continuing to change. As I come across more and more datasets, I've introduced various parameters to filter out false positive peaks. Since I finally gave in and started reporting a summit for peaks, I've come across datasets that have shifts so large that they require an explicit shift. Version 3.1 of ERANGE will introduce that any day now.

One thing I will say about the ChIP-seq capabilities of ERANGE is that calling regions and summits is a beginning, not an end. The other scripts in the package (which depend on Cistematic) are actually designed to find motifs in the regions, find the genes associated with the regions & do a GO analysis of these genes for example.

Ali
Old 04-24-2009, 10:24 AM   #10
griffon42
Member
 
Location: New York

Join Date: Jan 2009
Posts: 23
Default ERANGE error message

Hi all-

I've successfully used ERANGE 3.0.1 in the past for some RNA-seq analysis. I'm now running into some problems getting through the RunStandardAnalysis script.

My reads are Bowtie-aligned (single-end) and built into an appropriate RDS file. I'm working with the mouse genome.

When running RunStandardAnalysis.sh, the first few steps (geneMrnacounts.py, normalizeExpandedExonic.py) go without any problems. However, geneMrnaCountsWeighted.py starts off fine but then starts pouring out errors, as shown below:


/proj/genome/commoncode3.0.1/geneMrnaCountsWeighted.py: version 3.7
dataset Sample.bowtie.rds
metadata:
bowtie_mapped True
dataType RNA
genome mm9
rdsVersion 1.1
readsize 36

9756706 unique reads, 1097314 spliced reads and 3706354 multireads
default cache size is 2000000 pages
found index

1 read 100000 read 200000 read 300000 read 400000 read 500000
10 read 600000 read 700000 read 800000 read 900000 read 1000000
11 read 1100000 read 1200000 read 1300000 read 1400000 read 1500000 read 1600000 read 1700000 read 1800000 read 1900000
12 read 2000000 read 2100000 read 2200000 read 2300000 read 2400000
13 read 2500000 read 2600000
14 read 2700000 read 2800000 read 2900000
Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
15 read 3000000 read 3100000 read 3200000 read 3300000 gid 12419 not in gidReadDict
Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
gid 12419 not in gidReadDict
Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
gid 12419 not in gidReadDict
Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
gid 12419 not in gidReadDict
Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
gid 12419 not in gidReadDict
Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
gid 12419 not in gidReadDict
Python(3391,0xa02bb720) malloc: *** mmap(size=100663296) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
gid 12419 not in gidReadDict


These go on for pages and pages, with different values for "gid" as it goes on.

Has anyone seen this problem? I can't figure out what I've done differently compared to my past successful runs. Any assistance would be MUCH appreciated.

Thanks!
Old 04-24-2009, 10:36 AM   #11
alim
Member
 
Location: pasadena, ca

Join Date: Jun 2008
Posts: 14
Default

Hi,

So I would take the error message to heart: malloc (the Unix memory allocation call) is failing, which implies that you are running out of memory. Could you (or someone else) be using up significant amounts of memory at the same time as you are running ERANGE? Or are you running this on another machine with less RAM?

By the way, I highly recommend upgrading to ERANGE 3.1 to pick up the fixes for some other bugs that I have made over the last 3 months! You won't have to rebuild the RDS files or anything.

Ali
Old 04-24-2009, 12:28 PM   #12
griffon42
Member
 
Location: New York

Join Date: Jan 2009
Posts: 23
Default

Thanks Ali -

I'm running ERANGE locally on a MacBook Pro with 4 GB RAM, and not running ANYTHING else during the analysis. I've got the cache set to 2000000 for the RDS file.

This has been more than enough memory in the past... though this is more reads than I've tried previously.

I'll try the upgrade as well.

Thanks!
Old 04-24-2009, 01:36 PM   #13
alim
Member
 
Location: pasadena, ca

Join Date: Jun 2008
Posts: 14
Default

4 GB of RAM should be enough for the number of reads you have, especially if you have enough virtual memory (could you also be running low on disk space?).

If it won't work with this much RAM, then simply drop the cache size down to a smaller value (e.g. 1000000), which will free up some extra RAM.

Ali
Old 04-26-2009, 10:24 PM   #14
VIX_Z
Member
 
Location: INDIA

Join Date: Mar 2009
Posts: 11
Question Some queries for chip-seq analysis

Hi Alim,

Thanks for your previous help with ChIP-seq analysis.
I have a few more questions on the same topic:
Can you tell me how the RAM requirement varies with the number of reads (or data size) in ChIP-seq analysis?
Does it keep increasing, or does it saturate at some point, beyond which more RAM will not help in processing the reads?
How can I view the contents of RDS files?
Also, can you suggest some links to sample data of various sizes that can be used for ChIP-seq analysis?

Looking for your reply.
With THANKS!!
Vix
Old 04-30-2009, 01:11 PM   #15
griffon42
Member
 
Location: New York

Join Date: Jan 2009
Posts: 23
Default

Hey Ali-

Just wanted to say thanks for your help. Reducing the cache size seemed to help get through the analysis without memory errors, despite taking a bit longer.

I've got some additional datasets with far more reads that unfortunately can't be handled with 4 GB of RAM. Looks like I'm going to need to find a bigger box.

Thanks again for providing so much support.
Old 06-08-2009, 01:48 PM   #16
dnewkirk
Junior Member
 
Location: Los Angeles

Join Date: Mar 2009
Posts: 8
Default

Hello,
If I can resurrect this thread briefly, I've been having a little trouble with ERANGE 3.1. When I try to either annotate the genes or use the getfasta.py script to get the fasta sequence for meme, I always end up with a blank file. The annotation script gives no errors, and the getfasta.py script only says there is a problem for each peak. Everything is "installed" properly, and my $PATH variable is set, so that isn't the issue. In looking through the code, I see a call to "Genome", but I haven't been able to locate that anywhere in the code. This might be an issue more with cistematic than with ERANGE, or it's just me missing the obvious fix. Either way, any suggestions would be wonderful. I've already written Perl scripts for some of this, but I'd like to save time and use the built-in scripts if possible. Thanks so much!

Daniel

EDIT: It was a result of a difference in the input file. The eland files I was using were slightly different than normal; everything works as it should now!

Last edited by dnewkirk; 06-10-2009 at 12:24 PM.
Old 09-15-2009, 02:40 PM   #17
joshibros
Junior Member
 
Location: Storrs, CT

Join Date: Sep 2009
Posts: 1
Default ERANGE problem

Hello Everyone,

I am trying to run ERANGE 3.1 (http://woldlab.caltech.edu/rnaseq/) with my own ChIP-Seq dataset. I have eland_extended files for lanes 1 and 2. I convert them serially to RDS using the makerdsfromeland2.py script. Then I run ERANGE with the following command.

python ./findall.py EXP2vsEXP1 EXP2.rds ./output/EXP2vsEXP1.txt -control EXP1.rds -listPeak -revbackground -ratio 3 -nodirectionality

It runs for about 10 minutes and finally succeeds. The EXP2vsEXP1.txt file has the following information.

#enriched sample: EXP2.rds (7.8 M reads)
#control sample: EXP1.rds (7.1 M reads)
#enforceDirectionality=False listPeak=True nomulti=False cache=False
#spacing<50 minimum>4.0 ratio>3.0 minPeak=0.5 trimmed=10%
#minPlus=0.25 maxPlus=0.75 leftPlus=0.40 shift=0 pvalue=back
#regionID chrom start stop RPM fold multi% peakPos peakHeight pValue
EXP2vsEXP11 ref_chr9 83452778 83452862 7.3 3.1 57.3 83452819 7.0 2.5e-17
#stats: 7.3 RPM in 1 regions
#8 regions (37.6 RPM) found in background (FDR = 100.00 percent)
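
When comparing an output file like the one above against another peak list, it can help to load it into a structure first. The sketch below is an illustration, not an ERANGE utility: the function name is made up, and it assumes that columns are separated by whitespace and that the column names come from the "#regionID" header line, as in the output shown. Values are left as strings.

```python
def parse_regions(lines):
    """Turn findall.py-style region output into a list of dicts.

    Lines starting with '#' are treated as comments, except the one that
    begins with '#regionID', which supplies the column names.
    """
    header = None
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#regionID"):
            header = line.lstrip("#").split()
            continue
        if line.startswith("#"):
            continue
        if header is None:
            raise ValueError("data row seen before the #regionID header")
        rows.append(dict(zip(header, line.split())))
    return rows


if __name__ == "__main__":
    example = [
        "#enriched sample: EXP2.rds (7.8 M reads)",
        "#regionID chrom start stop RPM fold multi% peakPos peakHeight pValue",
        "EXP2vsEXP11 ref_chr9 83452778 83452862 7.3 3.1 57.3 83452819 7.0 2.5e-17",
        "#stats: 7.3 RPM in 1 regions",
    ]
    for row in parse_regions(example):
        print(row["chrom"], row["start"], row["stop"], row["fold"])
```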


But I am pretty sure that there is more than one peak in this data set (judging from the peak files I received from the contractor company). Could someone help me work out what exactly I am doing wrong?

Regards,
PJ

Last edited by joshibros; 09-18-2009 at 08:49 AM. Reason: Naming Issues
Old 11-16-2009, 04:41 AM   #19
mcknowable
Junior Member
 
Location: westcoast

Join Date: Nov 2009
Posts: 2
Default empty splicefa file

Does anyone happen to know why I would end up with an empty file after running the "getsplicefa.py" script? I have the knowngene.txt file as well as the correct length - 4, but still no results are produced.

Thanks in advance.
Old 12-17-2009, 01:44 PM   #20
ww236c
Junior Member
 
Location: USA

Join Date: Dec 2009
Posts: 1
Default

Hello,

I have a question related to "cistematic" for RNA-seq analysis. The reads were aligned using bowtie. After that I built the RDS file using makerdsfrombowtie.py. Then I tried running geneMrnaCounts.py but received the error below.

#Run script
python geneMrnaCounts.py Mouse mm9.mybowtie.rds mm9.mybowtie.uniqs.count -markNM

#Error
Traceback (most recent call last):
File "geneMrnaCounts.py", line 16, in <module>
from cistematic.genomes import Genome
File "cistematic/cistematic/genomes/__init__.py", line 907, in <module>
geneDB[genome] = eval(genome+'.geneDB')
File "<string>", line 1, in <module>
AttributeError: 'module' object has no attribute 'geneDB'

Does anyone have any insight? Thank you very much for your help!
Best regards,
C C

Last edited by ww236c; 12-17-2009 at 01:47 PM. Reason: about Cistematic