SEQanswers

Old 03-13-2017, 07:27 AM   #1
reza.mozafari
Junior Member
 
Location: Italy, Milan

Join Date: Mar 2017
Posts: 5
How can I explain the difference in gene size between hg19 and NG genome coordinates?

I was looking at the gene coordinates of LDLR in the hg19 assembly and in the NG RefSeq record in order to convert between the two. While I understand that the coordinates of the gene may differ between them, I fail to understand why the length of the same gene should differ.
I would appreciate it if someone could clarify this for me.
Old 03-13-2017, 07:36 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,470
Default

It's likely that you're looking at the length of the gene in one case and the length of the spliced transcript in the other.
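
To illustrate the distinction, here is a minimal sketch of how the two lengths would be computed from an exon list. The exon coordinates are made up (not LDLR or KRAS), and the function names are only for illustration.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive


def genomic_span(exons: List[Exon]) -> int:
    """Length of the gene on the genome, introns included."""
    start = min(s for s, _ in exons)
    end = max(e for _, e in exons)
    return end - start + 1


def spliced_length(exons: List[Exon]) -> int:
    """Length of the mature transcript: exons only, introns removed."""
    return sum(e - s + 1 for s, e in exons)


# Hypothetical three-exon gene, for illustration only.
toy_exons = [(1000, 1199), (5000, 5099), (9000, 9499)]

print(genomic_span(toy_exons))    # 8500
print(spliced_length(toy_exons))  # 800
```

If one number is the genomic span and the other the spliced length, the gap is usually large (the total intron length), so a difference of only a few bases or a few hundred bases points to something else.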
Old 03-13-2017, 08:21 AM   #3
reza.mozafari
Junior Member
 
Location: Italy, Milan

Join Date: Mar 2017
Posts: 5
Default

Quote:
Originally Posted by dpryan View Post
It's likely that you're looking at the length of the gene in one case and the length of the spliced transcript in the other.
Thanks. Actually, I am developing an app to convert coordinates between RefSeqGene (NG_) records and the hg19 genome assembly, in both directions, but the gene lengths differ between hg19 and NG. Do you know of any tool that can handle this?
Old 03-13-2017, 08:27 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,411
Default

Can you post an example of such a difference?
Old 03-13-2017, 02:32 PM   #5
reza.mozafari
Junior Member
 
Location: Italy, Milan

Join Date: Mar 2017
Posts: 5
Default

Quote:
Originally Posted by GenoMax View Post
Can you post an example of such a difference?
For instance, gene KRAS

NG_007524.1 ng_start: 4990 ng_end: 51132 gene_length: 46142
hg_ hg_start: 25358180 hg_end: 25403870 gene_length 45690

As you can see, the gene is longer in the NG RefSeq coordinates than in the hg genome coordinates, which I think does not fit the explanation given above.
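
For reference, the lengths quoted above are simply end minus start. A quick check of the arithmetic follows; the `inclusive` option is an assumption about how some tools count (both endpoints included), not something stated by either database here.

```python
def span(start: int, end: int, inclusive: bool = False) -> int:
    """Length between two coordinates; add 1 when both endpoints are counted."""
    return end - start + (1 if inclusive else 0)


# Coordinates quoted in the post above.
ng_len = span(4990, 51132)
hg_len = span(25358180, 25403870)

print(ng_len, hg_len, ng_len - hg_len)    # 46142 45690 452
print(span(4990, 51132, inclusive=True))  # 46143
```

So the two sets of coordinates differ by 452 bases, and a further off-by-one can appear depending on whether the end position is counted.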
Old 03-14-2017, 12:33 AM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,470
Default

Compare that to the (generally superior) Ensembl annotation. KRAS in refGene is 46143 bases long; in Ensembl (even back in hg19) it's 46148. They just differ by some (likely intronic) indels.

Anyway, the sequences behind RefSeq genes don't have to come from the same individuals whose DNA was used to build the reference genome, so you'll find plenty of differences.
Old 03-14-2017, 02:37 AM   #7
reza.mozafari
Junior Member
 
Location: Italy, Milan

Join Date: Mar 2017
Posts: 5
Default

Quote:
Originally Posted by dpryan View Post
Compare that to the (generally superior) Ensembl annotation. KRAS in refGene is 46143 bases long; in Ensembl (even back in hg19) it's 46148. They just differ by some (likely intronic) indels.

Anyway, the sequences behind RefSeq genes don't have to come from the same individuals whose DNA was used to build the reference genome, so you'll find plenty of differences.
Is it true that the NG RefSeq category means the sequence of these regions is still being updated to fill in gaps (the Ns) and make corrections, or is the difference due to variation such as indels?

And is there any tool that can convert RefSeq coordinates to assembly coordinates?
Old 03-14-2017, 03:45 AM   #8
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,411
Default

The RefSeqGene project has a defined aim (see the "about" section): https://www.ncbi.nlm.nih.gov/refseq/rsg/about/ If your intended application does not match the aim of that project, you should probably use standard RefSeq (or the Ensembl annotation, as @Devon suggests).
Old 03-14-2017, 05:08 AM   #9
reza.mozafari
Junior Member
 
Location: Italy, Milan

Join Date: Mar 2017
Posts: 5
Default

Thanks for all of the above. I have a Perl script that converts the RefSeq (NG_) coordinates of a gene to the hg19 assembly and vice versa. The problem arises when the gene length differs between the two coordinate systems, as in the example I gave. I now wonder whether this difference is to be expected anyway. If so, the code should be modified to account for the ratio between source length and target length, similar to the NCBI Remapping Service.
Apart from that, it is still not clear to me what the exact application of such a conversion would be, or when we would need it. Where do these two coordinate systems (NG and hg) intersect in data analysis?
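
For the conversion problem described above, one possible approach (a sketch, not what any NCBI tool is known to do) is to map positions through gapless aligned blocks instead of scaling by a length ratio, since a ratio would shift every position slightly while indels only affect positions downstream of them. The `Block` type, the `ng_to_hg` function and the toy alignment are all hypothetical; real blocks would have to come from an alignment of the NG_ record against the chromosome.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """One gapless aligned stretch between the NG_ record and the chromosome."""
    ng_start: int  # first NG_ position in the block (1-based)
    hg_start: int  # chromosome position aligned to ng_start
    length: int    # number of aligned bases


def ng_to_hg(pos: int, blocks: List[Block], strand: int = 1) -> Optional[int]:
    """Map an NG_ position to a chromosome position, or None if it falls in a gap.

    strand=+1: chromosome coordinates increase along the NG_ record.
    strand=-1: they decrease (gene on the minus strand).
    """
    for b in blocks:
        offset = pos - b.ng_start
        if 0 <= offset < b.length:
            return b.hg_start + strand * offset
    return None


# Hypothetical alignment: the NG_ record has a 5-base insertion
# (positions 201-205) that the chromosome lacks. Not real KRAS data.
toy_blocks = [
    Block(ng_start=101, hg_start=5000, length=100),  # NG 101-200 -> hg 5000-5099
    Block(ng_start=206, hg_start=5100, length=50),   # NG 206-255 -> hg 5100-5149
]

print(ng_to_hg(150, toy_blocks))  # 5049
print(ng_to_hg(203, toy_blocks))  # None (inside the insertion)
print(ng_to_hg(210, toy_blocks))  # 5104
```

Positions inside an insertion have no counterpart on the other sequence, so returning None there is a deliberate choice; a script could instead report the nearest flanking position.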
Tags
assembly annotation, ngs, refseq id conversion
