I have some questions about http://woldlab.caltech.edu/erange/README.build-rds:
0. Is "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" (under "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/") the right table to give to getsplicefa.py for hg19?
1. The README says "Download the chromosomes from UCSC". Which exact file should I download?
1.1 Is it fine to use bowtie-inspect to "output a FASTA file containing the sequences of the original references (with all non-A/C/G/T characters converted to Ns)" (http://bowtie-bio.sourceforge.net/ma...ndex-inspector) from the pre-built index at ftp://ftp.cbcb.umd.edu/pub/data/bowt.../hg19.ebwt.zip (listed under http://bowtie-bio.sourceforge.net/index.shtml), and then use the result as "the chromosomes from UCSC" for input to getsplicefa.py?
1.2 http://woldlab.caltech.edu/erange/RN...lysisSteps.txt has a sample command line: "python $ERANGEPATH/getsplicefa.py hsapiens /my/path/to/human/knownGene.txt hg18splice32.fa 28"
However, when I tried it, it failed: "
xu@linux18> python getsplicefa.py Human.txt knownGene.txt expandedgenome_spacer2maxBorder27.fa 27
psyco not running
getsplicefa.py: version 3.5
72402
655896
10002 genes
20003 genes
30004 genes
40005 genes
50006 genes
60007 genes
70008 genes
3624 splices too short to be seen
555 splices will be under-reported"
It seems that Human.txt (the FASTA recovered from the ftp://ftp.cbcb.umd.edu/pub/data/bowt.../hg19.ebwt.zip index mentioned above) was not recognized by getsplicefa.py at all.
1.3 I am also completely lost on how to set up H_sapiens (a set of chromo1.bin to chromo22.bin and chromoX/Y.bin files, plus hsapiens.genedb) so that the Cistematic core and ERANGE's getsplicefa.py are aware of it.
2. Where should the downloaded "chromosomes from UCSC" be placed relative to the scripts under ../ERANGE3.2.1/commoncode?
3. "http://woldlab.caltech.edu/erange/README.rna-seq" mentions "build expanded genomes with splices and spikes", and there is a "mm9splices_spikes.tgz (the files for building the expanded genomes and remapping splices)". My questions are:
3.1 What is the relationship between mm9splices_spikes.tgz and knownGene.txt?
3.2 Is such a file, separate from knownGene.txt, necessary to build the expanded genome for hg19? If so, where can I get it?
4. For Cistematic 3.0: "You will need to download the following packages: * cistematic3.0.tgz * db2.0.tgz"
However, where should the files in db2.0.tgz be put? In .../cistematic3.0/db, the folder that comes in cistematic3.0.tgz?
5. There are many unresolved questions at http://seqanswers.com/ about how to set CISTEMATIC_ROOT, ERANGEPATH, PYTHONPATH and CISTEMATIC_TEMP.
Would you please post a working setup, with a complete and detailed picture of your variable/path settings and of how the files of ERANGE, Cistematic, the "chromosomes from UCSC" and knownGene.txt are organized?
6. My RNA-seq reads vary in length (from 13 nt to 31 nt). How should I set spacer and maxBorder for hg19 with such varying read lengths?
7. Once the expanded genome from getsplicefa.py is ready, may I immediately run ./bowtie-build on it to generate the index and then map with ./bowtie?
8. How should the spliced mapping results then be fed to a peak finder? I have no experience with the workflow that follows spliced mapping.
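To make question 1.1 more concrete: one thing I would like to rule out is that the sequence names in a FASTA dumped by bowtie-inspect differ from the chromosome names used in knownGene.txt. Below is a minimal Python 3 sketch of that check. It is only my own helper, not part of ERANGE or Cistematic, and it assumes that the chromosome name is the second tab-separated column of knownGene.txt and that the sequence name is the first token after ">" in each FASTA header. The function names and the file names in the usage note are hypothetical.

```python
import gzip


def fasta_names(path):
    """Collect the sequence names (first token of each '>' header line)."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def known_gene_chroms(path):
    """Collect chromosome names, assumed to be column 2 of knownGene.txt."""
    opener = gzip.open if path.endswith(".gz") else open
    chroms = set()
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) > 1:
                chroms.add(fields[1])
    return chroms


def compare_names(fasta_path, known_gene_path):
    """Return (names only in the gene table, names only in the FASTA)."""
    in_fasta = fasta_names(fasta_path)
    in_table = known_gene_chroms(known_gene_path)
    return sorted(in_table - in_fasta), sorted(in_fasta - in_table)
```

For example, compare_names("Human.txt", "knownGene.txt") would return two lists; if both are empty, the names agree. This says nothing about whether getsplicefa.py accepts a FASTA file at all, which is part of what I am asking.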
Sorry for so many questions, but they seem to be common among newcomers to ERANGE. Please give me some guidance if possible.
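In case it helps whoever answers question 5, here is a small Python 3 sketch I could run to report my current settings. It only checks whether the four variables named above are set and whether the directories they point to exist; it does not know what values ERANGE or Cistematic actually expect. The function name describe_settings is my own, and treating PYTHONPATH as a list of directories joined by the platform path separator is the only assumption beyond the variable names.

```python
import os

# The four variables discussed in question 5.
ERANGE_VARS = ("CISTEMATIC_ROOT", "ERANGEPATH", "PYTHONPATH", "CISTEMATIC_TEMP")


def describe_settings(environ=None):
    """Map each variable to 'not set', 'ok', or the directories that are missing."""
    environ = os.environ if environ is None else environ
    report = {}
    for name in ERANGE_VARS:
        value = environ.get(name)
        if value is None:
            report[name] = "not set"
            continue
        # PYTHONPATH may hold several directories; the others hold one path.
        if name == "PYTHONPATH":
            parts = [p for p in value.split(os.pathsep) if p]
        else:
            parts = [value]
        missing = [p for p in parts if not os.path.isdir(p)]
        if missing:
            report[name] = "missing directory: " + ", ".join(missing)
        else:
            report[name] = "ok"
    return report


# Print one line per variable for the current shell environment.
for name, status in describe_settings().items():
    print(name, status)
```

I can post the output of this together with my directory listing if that makes it easier to spot what is wrong.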
Best,
jie