Mapping RefSeq transcripts to the genome using UCSC - See more at: http://blog.avadis

Strandlife

strandlife

Join Date: May 2013

Posts: 67
- Share
- Tweet
#1

Mapping RefSeq transcripts to the genome using UCSC - See more at: http://blog.avadis

01-31-2014, 05:39 AM

Mapping RefSeq transcripts to the genome using UCSC - See more at: http://blog.avadis-ngs.com/#sthash.9KimlOwK.dpuf

Transcript annotations are extensively used in NGS data analysis. In RNA-Seq, they are used at every step of the pipeline – to map spliced reads against the genome, perform quantification, detect novel exons etc. In DNA-Seq, they are used to predict the effect of variants detected in the sample. Clearly accurate transcript annotations are vital for NGS work.
Many researchers prefer to work with RefSeq transcripts because they are manually curated. But there is a problem. The RefSeq transcript project provides the transcript sequence and the location of exons on the transcript sequence but does not provide the genomic coordinates for the exons. So one common strategy is to obtain the genomic coordinates from UCSC. The folks at UCSC routinely align the RefSeq transcript sequences against the genome using BLAT and make the results available as a “refFlat” files in their download site.
Unfortunately, these BLAT alignment are sometimes wrong.
Shown below is the transcript track for TNNI3 which is a gene on the negative strand of chromosome 19. Note that the coding region of the first exon in the “RefSeq genes” picture occupies 22bp while the USCC track at the top shows only 11bp.
Exon 1 of TNNI3 in UCSC

The RefSeq transcript that was used by UCSC for alignment can be obtained by clicking on the TNNI3 word in the RefSeq gene track and it is NM_000363.4. A portion of the transcript entry is shown below.
TNNI3 RefSeq transcript details

The RefSeq entry clearly indicates that only 11 bases (144-154) at the end of the first exon represent coding bases. Moreover, the transcript has a CCDS entry indicating that there is a genomic alignment which translates to the protein sequence shown.
To get a better understanding of the problem, we looked at the UCSC and the RefSeq transcripts in more detail in the Elastic Genome Browser. The introns have been compressed so that exonic and essential splice site sequences can be seen in more detail.
TNNI3 in EGB

Some of the observations from the above picture are:
the alignment for the RefSeq transcript leads to a premature stop-codon very early on,
the essential splice site signals are correct in the UCSC transcript but wrong in the RefSeq transcript alignment
These are sanity checks that any researcher using the UCSC alignments of RefSeq transcripts should incorporate before carrying out analysis.
And, finally, the picture also suggests why this error happened. The incorrect extension to exon 1 in the RefSeq transcript alignment (GCATCACTCAC) is very similar to the sequence of the small exon 2 present in the UCSC transcript (GCATCGCTGCTC). It is possible that the BLAT alignment is not well suited for detecting small intermediate exons especially if there is an alternate alignment which is very similar.
- See more at: http://blog.avadis-ngs.com/#sthash.9KimlOwK.dpuf
Tags: mapping refseq

Previous template Next

Advancing Precision Medicine for Rare Diseases in Children

by seqadmin

Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
- Channel: Articles
12-16-2024, 07:57 AM
Recent Advances in Sequencing Technologies

by seqadmin

Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

Long-Read Sequencing
Long-read sequencing has seen remarkable advancements,...
- Channel: Articles
12-02-2024, 01:49 PM

Topics	Statistics	Last Post
Evaluating Genome Sequencing for ECMO Patients in the NICU by seqadmin Started by seqadmin, 12-17-2024, 10:28 AM	0 responses 33 views 0 likes	Last Post by seqadmin 12-17-2024, 10:28 AM
New Genetic Toolkit Refines Studies on Gene Function and Disease by seqadmin Started by seqadmin, 12-13-2024, 08:24 AM	0 responses 49 views 0 likes	Last Post by seqadmin 12-13-2024, 08:24 AM
Study Links Brain Mechanism to Emotional Responses in Animals and Humans by seqadmin Started by seqadmin, 12-12-2024, 07:41 AM	0 responses 34 views 0 likes	Last Post by seqadmin 12-12-2024, 07:41 AM
Study Identifies Ribosomal RNA Fingerprints as Early Cancer Biomarkers by seqadmin Started by seqadmin, 12-11-2024, 07:45 AM	0 responses 46 views 0 likes	Last Post by seqadmin 12-11-2024, 07:45 AM

Seqanswers Leaderboard Ad

Announcement

Mapping RefSeq transcripts to the genome using UCSC - See more at: http://blog.avadis

Latest Articles

ad_right_rmr

News