SEQanswers

10-26-2010, 10:13 AM   #1
jiexu
Junior Member
 
Location: NYC

Join Date: Oct 2010
Posts: 3
Asking for help on "BUILDING EXPANDED GENOMES" in Caltech ERANGE

I have several questions about http://woldlab.caltech.edu/erange/README.build-rds:

0. Is "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" (under "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/") the right table to pass to getsplicefa.py for hg19?
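
To make question 0 concrete, here is a minimal sketch of reading a knownGene table and listing the exon-exon junctions it implies. It assumes the usual UCSC knownGene column order (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, proteinID, alignID). The function names are hypothetical and this is not how getsplicefa.py itself reads the file.

```python
import gzip


def parse_known_gene_line(line):
    """Parse one tab-separated knownGene row into a small dict.

    Assumption: columns follow the usual UCSC knownGene order.
    """
    fields = line.rstrip("\n").split("\t")
    # exonStarts / exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    return {
        "name": fields[0],
        "chrom": fields[1],
        "strand": fields[2],
        "exon_count": int(fields[7]),
        "exons": list(zip(starts, ends)),
    }


def introns(exons):
    """Return (intron_start, intron_end) pairs between consecutive exons."""
    return [(left[1], right[0]) for left, right in zip(exons, exons[1:])]


def summarize(path):
    """Count transcripts and exon-exon junctions in a knownGene file."""
    opener = gzip.open if path.endswith(".gz") else open
    transcripts = 0
    junctions = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.strip():
                continue
            record = parse_known_gene_line(line)
            transcripts += 1
            junctions += len(introns(record["exons"]))
    return transcripts, junctions
```

Calling summarize("knownGene.txt.gz") on the downloaded table would give a transcript count and a junction count that can be compared with what getsplicefa.py prints.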

1. The README says "Download the chromosomes from UCSC". Which exact file should I download?
1.1 Is it fine to use bowtie-inspect to "output a FASTA file containing the sequences of the original references (with all non-A/C/G/T characters converted to Ns)" (http://bowtie-bio.sourceforge.net/ma...ndex-inspector)
from the pre-built index at ftp://ftp.cbcb.umd.edu/pub/data/bowt.../hg19.ebwt.zip (linked from http://bowtie-bio.sourceforge.net/index.shtml), and then use the result as "the chromosomes from UCSC" as input to getsplicefa.py?

1.2 http://woldlab.caltech.edu/erange/RN...lysisSteps.txt has a sample command line: "python $ERANGEPATH/getsplicefa.py hsapiens /my/path/to/human/knownGene.txt hg18splice32.fa 28"

However, when I tried a similar command, it did not seem to work: "
xu@linux18> python getsplicefa.py Human.txt knownGene.txt expandedgenome_spacer2maxBorder27.fa 27
psyco not running
getsplicefa.py: version 3.5
72402
655896
10002 genes
20003 genes
30004 genes
40005 genes
50006 genes
60007 genes
70008 genes
3624 splices too short to be seen
555 splices will be under-reported"

It seems that Human.txt (the FASTA file recovered by reverse-engineering ftp://ftp.cbcb.umd.edu/pub/data/bowt.../hg19.ebwt.zip, as mentioned above) is not recognized by getsplicefa.py at all.

1.3 I am also completely lost on how to make H_sapiens (a set of chromo1.bin to chromo22.bin and chromoX/Y.bin files, plus hsapiens.genedb) visible to the Cistematic core and to ERANGE's getsplicefa.py.
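
For the FASTA file recovered in 1.1 and 1.2, a quick sanity check is to list the sequence names and lengths it contains. The sketch below is plain Python with no ERANGE or Cistematic calls; the function name is hypothetical, and it only shows what is in the file, not what getsplicefa.py expects.

```python
def fasta_lengths(path):
    """Return {sequence_name: length} for a FASTA file, in file order."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Keep only the first word of the header as the name.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths
```

For example, looping over fasta_lengths("Human.txt").items() and printing each pair shows whether the headers look like chr1 to chr22, chrX, and chrY, and whether the lengths are plausible.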


2. Where should the downloaded "chromosomes from UCSC" be placed relative to the scripts under ../ERANGE3.2.1/commoncode?

3. "http://woldlab.caltech.edu/erange/README.rna-seq" mentions "build expanded genomes with splices and spikes",
and there is a "mm9splices_spikes.tgz (the files for building the expanded genomes and remapping splices)". My questions are:
. What is the relationship between mm9splices_spikes.tgz and knownGene.txt?
. Is such a file, in addition to knownGene.txt, necessary to build the expanded genome for hg19? If so, where can I get it?

4. For Cistematic 3.0: "You will need to download the following packages: * cistematic3.0.tgz * db2.0.tgz"
However, where should the files in db2.0.tgz be put? In .../cistematic3.0/db, the folder that comes from cistematic3.0.tgz?

5. There are many unresolved threads at http://seqanswers.com/ on how to set CISTEMATIC_ROOT, ERANGEPATH, PYTHONPATH, and CISTEMATIC_TEMP.
Could you please post a working setup, with a detailed overall picture of your variable/path settings and of how the files of ERANGE, Cistematic, the "chromosomes from UCSC", and knownGene.txt are organized?
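
When comparing setups, it helps to report exactly what each variable is set to. Here is a minimal sketch that checks whether the four variables named above are set and point to existing directories. It assumes each value is a directory, or for PYTHONPATH a list of directories; the function name is hypothetical, and it says nothing about which values ERANGE or Cistematic actually require.

```python
import os

# Variable names taken from the question above.
REQUIRED_VARS = ("CISTEMATIC_ROOT", "ERANGEPATH", "PYTHONPATH", "CISTEMATIC_TEMP")


def check_environment(environ=None):
    """Return {variable: status} for each variable in REQUIRED_VARS.

    Assumption: every value is a directory, or an os.pathsep-separated
    list of directories (as is usual for PYTHONPATH).
    """
    env = os.environ if environ is None else environ
    report = {}
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            report[name] = "not set"
            continue
        missing = [p for p in value.split(os.pathsep) if p and not os.path.isdir(p)]
        if missing:
            report[name] = "missing directory: " + ", ".join(missing)
        else:
            report[name] = "ok"
    return report
```

Printing the result of check_environment() in the same shell that runs getsplicefa.py gives a short report that can be pasted into a reply.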

6. My RNA-seq reads vary in length (from 13 nt to 31 nt). How should I set spacer and maxBorder for hg19 in this case?
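
Before picking values, it may help to see how the reads are distributed over the 13 to 31 nt range. The following is a minimal sketch, assuming the reads are in a plain FASTQ file with four lines per record. The function names and any cutoff passed to them are hypothetical, and the code does not compute spacer or maxBorder.

```python
from collections import Counter


def read_length_histogram(fastq_path):
    """Count reads per length in a four-line-per-record FASTQ file."""
    counts = Counter()
    with open(fastq_path) as handle:
        for index, line in enumerate(handle):
            # The sequence is the second line of each four-line record.
            if index % 4 == 1:
                counts[len(line.strip())] += 1
    return counts


def fraction_at_least(counts, min_length):
    """Fraction of reads whose length is at least min_length."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    kept = sum(n for length, n in counts.items() if length >= min_length)
    return kept / total
```

The histogram shows how many reads would be lost if only reads above some minimum length were used for the spliced mapping.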

7. Once the expanded genome has been produced by getsplicefa.py, can I immediately run ./bowtie-build to generate the index and then map with ./bowtie?

8. How should the resulting spliced mappings be fed to a peak finder? I have no experience with the workflow that follows spliced mapping.

Sorry for so many questions, but they seem to be common among newcomers to ERANGE. Any guidance would be much appreciated.

Best,
jie