01-31-2014, 05:39 AM
Mapping RefSeq transcripts to the genome using UCSC - See more at: http://blog.avadis

Transcript annotations are used extensively in NGS data analysis. In RNA-Seq, they are used at every step of the pipeline: to map spliced reads against the genome, quantify expression, detect novel exons, and so on. In DNA-Seq, they are used to predict the effect of variants detected in the sample. Clearly, accurate transcript annotations are vital for NGS work.
Many researchers prefer to work with RefSeq transcripts because they are manually curated. But there is a problem. The RefSeq transcript project provides the transcript sequence and the location of the exons on that sequence, but it does not provide genomic coordinates for the exons. One common strategy is therefore to obtain the genomic coordinates from UCSC. The folks at UCSC routinely align the RefSeq transcript sequences against the genome using BLAT and make the results available as "refFlat" files on their download site.
Unfortunately, these BLAT alignments are sometimes wrong.
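The refFlat files mentioned above are plain tab-separated tables. The following is a minimal sketch of how one line can be parsed into exon intervals. It assumes the standard 11-column refFlat layout (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) with UCSC's 0-based, half-open coordinates. The record class, the function name and the toy line are hypothetical and were made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RefFlatRecord:
    gene_name: str
    transcript_id: str
    chrom: str
    strand: str
    tx_start: int                  # 0-based, half-open (UCSC convention)
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]   # (start, end) pairs in genomic order


def parse_refflat_line(line: str) -> RefFlatRecord:
    """Parse one tab-separated line of a UCSC refFlat table (11 columns)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}")
    (gene, name, chrom, strand, tx_start, tx_end,
     cds_start, cds_end, exon_count, exon_starts, exon_ends) = fields
    # exonStarts/exonEnds are comma-separated lists with a trailing comma
    starts = [int(x) for x in exon_starts.split(",") if x]
    ends = [int(x) for x in exon_ends.split(",") if x]
    if not (len(starts) == len(ends) == int(exon_count)):
        raise ValueError("exonCount does not match the exon coordinate lists")
    return RefFlatRecord(gene, name, chrom, strand, int(tx_start), int(tx_end),
                         int(cds_start), int(cds_end), list(zip(starts, ends)))


# Hypothetical toy line; these are NOT real TNNI3 coordinates.
TOY_LINE = "TOYGENE\tNM_TOY\tchr19\t-\t100\t500\t150\t420\t3\t100,200,400,\t130,260,500,\n"
record = parse_refflat_line(TOY_LINE)
print(record.exons)  # [(100, 130), (200, 260), (400, 500)]
```

Exons are kept in genomic (left-to-right) order, as in the file. Strand handling is deliberately left to later steps.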
Shown below is the transcript track for TNNI3, a gene on the negative strand of chromosome 19. Note that the coding region of the first exon in the "RefSeq Genes" track occupies 22 bp, while the UCSC track at the top shows only 11 bp.
[Figure: Exon 1 of TNNI3 in UCSC]

The RefSeq transcript that UCSC used for the alignment can be found by clicking on TNNI3 in the RefSeq Genes track: it is NM_000363.4. A portion of the transcript entry is shown below.
[Figure: TNNI3 RefSeq transcript details]

The RefSeq entry clearly indicates that only 11 bases (144-154) at the end of the first exon are coding. Moreover, the transcript has a CCDS entry, indicating that there is a genomic alignment that translates into the protein sequence shown.
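The comparison above can be automated. The RefSeq entry reports 11 coding bases in exon 1 (bases 144-154 inclusive, so 154 - 144 + 1 = 11), while the alignment-derived track shows 22.

This is a rough sketch that assumes exon and CDS bounds in refFlat style (0-based, half-open). It counts the coding bases in each exon and lists them in transcript order. On the minus strand, the transcript's first exon is the rightmost one in genomic coordinates. The function name and toy coordinates are hypothetical and are not the real TNNI3 values.

```python
from typing import List, Tuple


def coding_bases_per_exon(exons: List[Tuple[int, int]], cds_start: int,
                          cds_end: int, strand: str) -> List[int]:
    """Coding bases in each exon, listed in transcript (5'->3') order.
    Coordinates are 0-based half-open, as in UCSC tables."""
    # Overlap of each exon with the [cds_start, cds_end) interval
    counts = [max(0, min(end, cds_end) - max(start, cds_start))
              for start, end in exons]
    # On the minus strand the transcript's first exon is the rightmost one
    return counts[::-1] if strand == "-" else counts


# Hypothetical toy minus-strand transcript (NOT the real TNNI3 coordinates)
toy_exons = [(100, 130), (200, 260), (400, 500)]
toy_counts = coding_bases_per_exon(toy_exons, cds_start=150, cds_end=420, strand="-")

# RefSeq annotation: coding bases 144-154 of exon 1 (1-based, inclusive)
refseq_exon1_coding = 154 - 144 + 1

if toy_counts[0] != refseq_exon1_coding:
    print(f"exon 1 mismatch: alignment gives {toy_counts[0]} coding bases, "
          f"RefSeq says {refseq_exon1_coding}")
```

For the toy record, this flags exon 1: the alignment gives 20 coding bases against the 11 expected from the RefSeq entry. This is the same kind of discrepancy as the 22 bp versus 11 bp seen for TNNI3.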
To get a better understanding of the problem, we looked at the UCSC and the RefSeq transcripts in more detail in the Elastic Genome Browser. The introns have been compressed so that exonic and essential splice site sequences can be seen in more detail.

Some of the observations from the above picture are:
- the alignment of the RefSeq transcript leads to a premature stop codon very early on;
- the essential splice-site signals (the dinucleotides at each intron boundary, canonically GT...AG) are correct in the UCSC transcript but wrong in the RefSeq transcript alignment.
These are sanity checks that anyone using UCSC alignments of RefSeq transcripts should run before carrying out any analysis.
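Both checks are simple to script. The sketch below makes several assumptions:

- The spliced coding sequence (in transcript orientation) has already been extracted.
- The plus-strand chromosome sequence is available as a string.
- Exons are 0-based, half-open pairs sorted by genomic position.

All function names and toy sequences are hypothetical. The splice check accepts only the major GT...AG intron class, so genuine GC-AG or AT-AC introns would also be flagged and need a manual look.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def premature_stop_codons(cds: str) -> List[int]:
    """Codon indices of in-frame stop codons other than the final codon.
    `cds` is the spliced coding sequence in transcript (5'->3') orientation."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    return [k for k, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def intron_sequences(genome: str, exons: List[Tuple[int, int]], strand: str) -> List[str]:
    """Intron sequences in transcript orientation, taken from the gaps
    between consecutive exons (sorted by genomic position)."""
    introns = [genome[left_end:right_start]
               for (_, left_end), (right_start, _) in zip(exons, exons[1:])]
    if strand == "-":
        # Minus-strand introns read as CT...AC on the plus strand
        introns = [reverse_complement(s) for s in reversed(introns)]
    return introns


def canonical_splice_sites(intron: str) -> bool:
    """True only for the major GT...AG intron class."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


# Hypothetical toy data
toy_genome = "AAAACTGGGACTTTT"           # plus-strand sequence of a minus-strand gene
toy_introns = intron_sequences(toy_genome, [(0, 4), (11, 15)], "-")
print(toy_introns, [canonical_splice_sites(s) for s in toy_introns])  # ['GTCCCAG'] [True]
print(premature_stop_codons("ATGAAATAGCCCTGA"))  # [2]: TAG is the third codon
```

An early, in-frame stop codon or a non-canonical intron boundary does not prove that the alignment is wrong. However, either one is a strong signal that the transcript model deserves a closer look before it is used.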
And finally, the picture also suggests why this error happened. The incorrect extension of exon 1 in the RefSeq transcript alignment (GCATCACTCAC) is very similar to the sequence of the small exon 2 in the UCSC transcript (GCATCGCTGCTC). It is possible that BLAT is not well suited to detecting small internal exons, especially when a very similar alternative alignment exists.
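One way to put a number on "very similar" is the edit distance between the two sequences quoted above. Edit distance is the minimum number of single-base substitutions, insertions and deletions needed to turn one sequence into the other. This is a swapped-in illustrative measure, not how BLAT scores alignments. The function below is a minimal textbook sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-base substitutions, insertions and deletions
    turning a into b (classic dynamic programming, O(len(a) * len(b)))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j - 1] + (ca != cb),  # match or substitution
                            prev[j] + 1,               # deletion from a
                            curr[j - 1] + 1))          # insertion into a
        prev = curr
    return prev[-1]


exon1_extension = "GCATCACTCAC"   # incorrect exon 1 extension (RefSeq alignment)
small_exon2 = "GCATCGCTGCTC"      # small exon 2 (UCSC transcript)
distance = levenshtein(exon1_extension, small_exon2)
print(distance)  # 3
```

The two sequences share the prefix GCATC and differ by only three edits over 11-12 bases. This makes it plausible that an aligner would prefer extending exon 1 over placing a separate tiny exon.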
Posted by Strandlife
