
  • Asking for help on "BUILDING EXPANDED GENOMES" by Caltech ERANGE

    I have several questions about http://woldlab.caltech.edu/erange/README.build-rds:

    0. Is "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" (under "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/") the right table to pass to getsplicefa.py for hg19?
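
One way to sanity-check the table before handing it to getsplicefa.py is to parse a row and list its exon-exon junctions. The sketch below assumes the usual UCSC knownGene column order (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...), which should be confirmed against knownGene.sql in the same directory. The helper names and the example row are made up for illustration and are not part of ERANGE.

```python
from typing import List, NamedTuple, Tuple


class KnownGeneRow(NamedTuple):
    name: str
    chrom: str
    strand: str
    exon_starts: List[int]
    exon_ends: List[int]


def parse_known_gene_line(line: str) -> KnownGeneRow:
    """Parse one tab-separated knownGene.txt row (assumed column order)."""
    fields = line.rstrip("\n").split("\t")
    exon_count = int(fields[7])
    # exonStarts / exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    if len(starts) != exon_count or len(ends) != exon_count:
        raise ValueError(f"exon count mismatch in row {fields[0]!r}")
    return KnownGeneRow(fields[0], fields[1], fields[2], starts, ends)


def splice_junctions(row: KnownGeneRow) -> List[Tuple[int, int]]:
    """Pair the end of each exon with the start of the next one."""
    return list(zip(row.exon_ends[:-1], row.exon_starts[1:]))


# Made-up example row with three exons (not a real gene model).
EXAMPLE = (
    "ucExample.1\tchr1\t+\t100\t500\t150\t450\t3\t"
    "100,250,400,\t150,300,500,\tP00000\tucExample.1"
)

if __name__ == "__main__":
    row = parse_known_gene_line(EXAMPLE)
    print(row.chrom, row.strand, splice_junctions(row))
```

If the rows of the downloaded file parse cleanly this way, the table at least has the layout this sketch expects; whether getsplicefa.py accepts it is a separate question.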

    1. The README says "Download the chromosomes from UCSC". Exactly which file should I download?
    1.1 Is it fine to use bowtie-inspect to "output a FASTA file containing the sequences of the original references (with all non-A/C/G/T characters converted to Ns)" (http://bowtie-bio.sourceforge.net/ma...ndex-inspector) from the pre-built index at ftp://ftp.cbcb.umd.edu/pub/data/bowt.../hg19.ebwt.zip (linked from http://bowtie-bio.sourceforge.net/index.shtml), and then use the result as "the chromosomes from UCSC" for input to getsplicefa.py?

    1.2 http://woldlab.caltech.edu/erange/RN...lysisSteps.txt has a sample command line: "python $ERANGEPATH/getsplicefa.py hsapiens /my/path/to/human/knownGene.txt hg18splice32.fa 28"

    However, my own attempt failed: "
    xu@linux18> python getsplicefa.py Human.txt knownGene.txt expandedgenome_spacer2maxBorder27.fa 27
    psyco not running
    getsplicefa.py: version 3.5
    72402
    655896
    10002 genes
    20003 genes
    30004 genes
    40005 genes
    50006 genes
    60007 genes
    70008 genes
    3624 splices too short to be seen
    555 splices will be under-reported"

    It seems that Human.txt (the result of reverse-engineering the ftp://ftp.cbcb.umd.edu/pub/data/bowt.../hg19.ebwt.zip index mentioned above) is not recognized by getsplicefa.py at all.

    1.3 I am also completely lost on how to set up H_sapiens (a set of chromo1.bin to chromo22.bin and chromoX/Y.bin files, plus hsapiens.genedb) so that the Cistematic core and ERANGE's getsplicefa can find it.


    2. Where should the downloaded "chromosomes from UCSC" be placed relative to the scripts under ../ERANGE3.2.1/commoncode?

    3. http://woldlab.caltech.edu/erange/README.rna-seq mentions "build expanded genomes with splices and spikes", and there is a "mm9splices_spikes.tgz (the files for building the expanded genomes and remapping splices)". My questions are:
    - What is the relationship between mm9splices_spikes.tgz and knownGene.txt?
    - Is such a file, separate from knownGene.txt, necessary to build the expanded genome for hg19? If so, where can I get it?

    4. For Cistematic 3.0, the instructions say "You will need to download the following packages: * cistematic3.0.tgz * db2.0.tgz".
    However, where should the files in db2.0.tgz go? Into .../cistematic3.0/db, the folder that comes with cistematic3.0.tgz?

    5. There are many unresolved questions on http://seqanswers.com/ about how to set CISTEMATIC_ROOT, ERANGEPATH, PYTHONPATH, and CISTEMATIC_TEMP.
    Would you please post a working setup, with a detailed overall picture of your variable/path settings and of how the files for ERANGE, Cistematic, the "chromosomes from UCSC", and knownGene.txt are organized?
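
While waiting for a known-good layout, a small diagnostic can at least show which of these variables are set and whether they point at existing directories. This is only a sketch: the four variable names come from the question above, but the function name and the "every entry should be an existing directory" check are assumptions for illustration, not something ERANGE or Cistematic prescribes.

```python
import os

# Variable names taken from the question; nothing else is ERANGE-specific.
VARIABLES = ("CISTEMATIC_ROOT", "ERANGEPATH", "PYTHONPATH", "CISTEMATIC_TEMP")


def check_paths(env=None):
    """Return {variable: status} for each variable in VARIABLES."""
    env = os.environ if env is None else env
    report = {}
    for name in VARIABLES:
        value = env.get(name)
        if not value:
            report[name] = "not set"
            continue
        # A value may hold several entries separated by os.pathsep
        # (typical for PYTHONPATH); check each entry on its own.
        missing = [p for p in value.split(os.pathsep)
                   if p and not os.path.isdir(p)]
        report[name] = "ok" if not missing else "missing: " + ", ".join(missing)
    return report


if __name__ == "__main__":
    for name, status in check_paths().items():
        print(f"{name}: {status}")
```

Running it in the same shell that launches getsplicefa.py shows what the script actually inherits, which is often different from what a login profile is believed to set.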

    6. My RNA-seq reads vary in length (from 13 nt to 31 nt). How should I set spacer and maxBorder for hg19 given this range of read lengths?
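
The numbers in this thread hint at a simple rule, though it is an inference and not something the documentation quoted here confirms: the sample command pairs a file named hg18splice32.fa with a maxBorder of 28, and the attempt above uses a maxBorder of 27 (with "spacer2" in the output file name) for reads of up to 31 nt. Both fit maxBorder = read length - 2 x spacer. The hypothetical helper below only does that arithmetic so the two ends of the 13 to 31 nt range can be compared.

```python
def max_border(read_length: int, spacer: int = 2) -> int:
    """maxBorder under the ASSUMED rule: read_length - 2 * spacer."""
    if read_length <= 2 * spacer:
        raise ValueError("read is too short to span a junction with this spacer")
    return read_length - 2 * spacer


if __name__ == "__main__":
    # Read lengths: the sample command's apparent 32 nt, plus the
    # longest (31 nt) and shortest (13 nt) reads from the question.
    for length in (32, 31, 13):
        print(length, max_border(length))
```

Under this assumed rule a single maxBorder cannot suit both ends of the range, so the choice is between building for the longest reads or splitting the reads by length; which one ERANGE expects is exactly what the question asks.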

    7. Once the expanded genome from getsplicefa.py is ready, can I immediately run ./bowtie-build to generate the index and then map with ./bowtie?

    8. How should the resulting spliced mappings be fed to a peak finder? I have no experience with the downstream workflow after spliced mapping.

    Sorry for so many questions, but they seem to be common among newcomers to ERANGE. Any guidance would be much appreciated.

    Best,
    jie
