
**GenoMax** · 02-07-2013, 09:21 AM

If you have the identifiers for the genes then you can use Table Browser (UCSC), BioMart (Ensembl) or other programmatic means to retrieve the UTR sequences.

See this post for links: http://seqanswers.com/forums/showthr...hlight=biomart

**Richard Finney** · 02-07-2013, 09:59 AM

This is a one-off so PLEASE check your results by hand (i.e. blat them and see if they are right).

Download genome (hg19 or whatever) files, then concatenate them into one file called "all.fa".
Get/compile/install samtools.

Use samtools faidx to index "all.fa".
Then get this file from UCSC:
ftp://hgdownload.cse.ucsc.edu/golden...refFlat.txt.gz
(or whatever genome/build you are interested in).

Gunzip "refFlat.txt.gz" and rename the result to refFlat.hg19.feb.3.2013.txt (the name the script below reads).
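The setup steps above could be scripted roughly like this. It is only a sketch: the `build_genome` and `prepare_refflat` function names and the directory of per-chromosome `.fa` files are my assumptions, and indexing is skipped if samtools is not found.

```shell
# Sketch only: paths and names are assumptions, adjust to your setup.
SAMT=${SAMT:-samtools}    # samtools binary, assumed to be on PATH unless SAMT is set

# Concatenate per-chromosome FASTA files from a (hypothetical) directory into all.fa.
# Keep that directory separate from the current one so all.fa is not its own input.
build_genome() {
    fasta_dir=$1
    cat "$fasta_dir"/*.fa > all.fa
    # Index only if samtools is actually installed; this writes all.fa.fai.
    if command -v "$SAMT" >/dev/null 2>&1; then
        "$SAMT" faidx all.fa
    fi
}

# Decompress the UCSC annotation under the name the script expects.
prepare_refflat() {
    gunzip -c refFlat.txt.gz > refFlat.hg19.feb.3.2013.txt
}
```

Typical use would be something like `build_genome /path/to/chromFa && prepare_refflat`, run in the directory where refFlat.txt.gz was downloaded.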

Make a script out of the code below. Edit the SAMT and GENOMEPLACE lines to your situation.
______begin code____

export SAMT=/h1/finneyr/samtools-0.1.18/samtools
export GENOMEPLACE=/TCGA/nextgensupport/hg19/all.fa

cat refFlat.hg19.feb.3.2013.txt | awk -v GENO="$GENOMEPLACE" \
'{if ($4=="+") print "$SAMT faidx "GENO" "$3":"$5"-"$7" > "$1"."$2".5p \n$SAMT faidx "GENO" "$3":"$8"-"$6" > "$1"."$2".3p\n";
else print "$SAMT faidx "GENO" "$3":"$5"-"$7" > "$1"."$2".3p \n$SAMT faidx "GENO" "$3":"$8"-"$6" > "$1"."$2".5p\n"}'
#note: field numbers:txStart=$5 txEnd=$6 cdsStart=$7 cdsEnd=$8
# chrom=$3 strand=$4
# 5' UTR: txStart-cdsStart on "+", cdsEnd-txEnd on "-"; the 3' UTR is the other end.
# "$SAMT" is printed literally; the bash that runs the generated commands expands it.

______ end code _____
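Two things worth checking when you blat the results: refFlat coordinates are 0-based, half-open (the UCSC convention), while samtools faidx regions are 1-based and inclusive, and a transcript with no UTR on one side gives an empty interval. Here is a rough sketch of a variant that adds 1 to each start and skips empty intervals. The `utr_regions` function name and the tab-separated "label, region" output are my own choices for illustration, not part of the script above. Like the original, it takes the genomic span, so introns inside a UTR are included.

```shell
# Sketch: turn refFlat rows into "label<TAB>region" lines, with regions in the
# 1-based inclusive form that samtools faidx expects.
# Assumes refFlat column order: geneName name chrom strand txStart txEnd cdsStart cdsEnd ...
utr_regions() {
    awk -F'\t' '
    function emit(label, chrom, start0, end0) {
        # refFlat intervals are 0-based, half-open; skip empty ones (no UTR there).
        if (end0 > start0) printf "%s\t%s:%d-%d\n", label, chrom, start0 + 1, end0
    }
    {
        left  = ($4 == "+") ? "5p" : "3p"   # txStart..cdsStart
        right = ($4 == "+") ? "3p" : "5p"   # cdsEnd..txEnd
        emit($1 "." $2 "." left,  $3, $5 + 0, $7 + 0)
        emit($1 "." $2 "." right, $3, $8 + 0, $6 + 0)
    }' "$@"
}
```

Each output line can then be fed to samtools, e.g. read `label` and `region` in a `while read` loop and run `$SAMT faidx $GENOMEPLACE "$region" > "$label"`. Noncoding (NR_) entries have cdsStart equal to cdsEnd, so the whole transcript still comes out under a single label, as in the example run below.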

Example run ... I call the script "job22" ...
-bash-3.00$ cat job22
export SAMT=/h1/finneyr/samtools-0.1.18/samtools
export GENOMEPLACE=/TCGA/nextgensupport/hg19/all.fa

cat refFlat.hg19.feb.3.2013.txt | awk -v GENO="$GENOMEPLACE" \
'{if ($4=="+") print "$SAMT faidx "GENO" "$3":"$5"-"$7" > "$1"."$2".5p \n$SAMT faidx "GENO" "$3":"$8"-"$6" > "$1"."$2".3p\n";
else print "$SAMT faidx "GENO" "$3":"$5"-"$7" > "$1"."$2".3p \n$SAMT faidx "GENO" "$3":"$8"-"$6" > "$1"."$2".5p\n"}'

-bash-3.00$ ./job22 | head -6
$SAMT faidx /TCGA/nextgensupport/hg19/all.fa chr15:62929370-62937380 > MGC15885.NR_026897.3p
$SAMT faidx /TCGA/nextgensupport/hg19/all.fa chr15:62937380-62937380 > MGC15885.NR_026897.5p

$SAMT faidx /TCGA/nextgensupport/hg19/all.fa chr19:76219-77690 > FAM138F.NR_026820.3p
$SAMT faidx /TCGA/nextgensupport/hg19/all.fa chr19:77690-77690 > FAM138F.NR_026820.5p

-bash-3.00$ ./job22 | head -6 | bash
-bash-3.00$ head MGC15885.NR_026897.3p
>chr15:62929370-62937380
GTTCACCTGGTCTTGACCTTCACTTTTATTTTTCTTCTATTTTTTTCTTGGAGCTGACCT
TTTACATTTCTATTGTATCCATTTTTGTAAACAATCTACTTTCAATCATTTGAATAAGTT
AATGTATAAAAGAATTCAAAGTCAGAGTTCAGTTTAGAGCCACCTTCTTTCTGAAGCTTG
TAACAAGAGGAGGAAAATAGCAGGACTGAAAGGTAGACTCCAAGAGGACTGAAATGTATG
GATGATTTATTCAGCTGTCTTGGCAACCACAGGGGAATAGTGAGATTGCTCGAGAGCTGA
CACAGCCTTCTTACGGTTCGACAAAAAACGACAGTATCTTCCACATACAGGCCAGGAATT
CATGTATCTTCCCAGAACCTCTGTTTTTATCTGTGGAAGGGGGGTGCCAAAAAATGCAAA
ATCCTTTTAGCTTTCCAGCCTATTGATCATATCCAGGGACAAGATATACATGGAAGCGCC
CTGGAGCACTTCATTGCTGAGTGGTCATCAGGTGATAGCATCTCCTGTTTGTTTCACTGG

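Get rid of the "head -6" clause to run the whole thing. You must pipe the output through "bash", because "job22" only generates the commands. Those commands refer to $SAMT, so SAMT also has to be exported in the shell where you run the pipeline, not just inside "job22".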
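One more thing to keep in mind when checking by hand: the generated commands pass plain regions to samtools faidx, so every file holds forward-strand genomic sequence, including for genes on the "-" strand. If you want those in transcript orientation, something like the following could be used. It is a minimal sketch, the `revcomp_fasta` name is made up here, and it assumes one record per file (which is what the commands above produce) and that `rev` is installed.

```shell
# Reverse-complement a single-record FASTA file; the header line is kept as-is.
revcomp_fasta() {
    in=$1
    grep '^>' "$in"
    # Join the sequence lines, reverse, complement, and re-wrap at 60 columns.
    grep -v '^>' "$in" | tr -d '\n' | rev | tr -d '\n' \
        | tr 'ACGTacgt' 'TGCAtgca' | fold -w 60
    echo    # final newline, since fold gets no trailing newline from the pipeline
}
```

For example, `revcomp_fasta MGC15885.NR_026897.3p > MGC15885.NR_026897.3p.rc` for a minus-strand entry. Characters other than A, C, G and T (such as N) are left unchanged.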

retreave UTRs for each gene by rna-seq

Comment

Comment

Latest Articles

ad_right_rmr

News