  • How can I explain the difference in gene size between hg19 and NG genome coordinates?

    I was looking at the gene coordinates of LDLR in the hg19 assembly and in the NG RefSeq (RefSeqGene) record in order to convert between them. While I understand that the coordinates of the gene may differ between the two, I fail to understand why the length of the same gene should differ.
    I would appreciate it if someone could clarify this for me.

  • #2
    It's likely that you're looking at the length of the gene in one case and the length of the spliced transcript in the other.


    • #3
      Originally posted by dpryan View Post
      It's likely that you're looking at the length of the gene in one case and the length of the spliced transcript in the other.
      Thanks. Actually, I am developing an app to convert coordinates from RefSeqGene (NG_) to the hg19 genome assembly and vice versa, but the gene lengths differ between hg19 and NG. Do you know of any tool that can handle this?


      • #4
        Can you post an example of such a difference?


        • #5
          Originally posted by GenoMax View Post
          Can you post an example of such a difference?
          For instance, the KRAS gene:

          NG_007524.1 ng_start: 4990 ng_end: 51132 gene_length: 46142
          hg_ hg_start: 25358180 hg_end: 25403870 gene_length: 45690

          As you can see, the gene is longer in the NG RefSeq coordinates than in the hg genome coordinates, which I don't think can be explained by the gene-versus-spliced-transcript distinction raised above.
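The arithmetic behind the two lengths can be checked with a short sketch. The coordinates are the ones posted above; the `span` helper and its `inclusive` flag are assumptions added for illustration, since the post does not say whether the ends are counted inclusively.

```python
# Coordinates exactly as posted in this thread.
ng_start, ng_end = 4990, 51132          # NG_007524.1
hg_start, hg_end = 25358180, 25403870   # hg interval as posted


def span(start, end, inclusive=False):
    """Length of an interval; adds 1 when both ends are counted (1-based inclusive)."""
    return end - start + (1 if inclusive else 0)


ng_len = span(ng_start, ng_end)   # end - start, as in the post
hg_len = span(hg_start, hg_end)
diff = ng_len - hg_len            # bases present in NG but not in the hg interval

print(ng_len, hg_len, diff)
```

Whichever counting convention is used, the difference between the two intervals stays the same as long as it is applied to both.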


          • #6
            Compare that to the (generally superior) Ensembl annotation. KRAS in refgene is 46143, in Ensembl (even back in hg19) it's 46148. They just differ by some (likely intronic) indels.

            Anyway, genes in refseq don't have to come from the same people that were used to make the reference genome, so you'll find plenty of differences.


            • #7
              Originally posted by dpryan View Post
              Compare that to the (generally superior) Ensembl annotation. KRAS in refgene is 46143, in Ensembl (even back in hg19) it's 46148. They just differ by some (likely intronic) indels.

              Anyway, genes in refseq don't have to come from the same people that were used to make the reference genome, so you'll find plenty of differences.
              Is it true that the NG RefSeq category means the sequence for these regions is still being updated to fill in gaps (the Ns) and make corrections, or is the difference more related to variation such as indels?

              And is there any tool that can convert RefSeq coordinates to assembly coordinates?


              • #8
                The RefSeqGene project has a defined aim (see the "about" section): https://www.ncbi.nlm.nih.gov/refseq/rsg/about/ If your intended application does not match the aim of that project, you should probably use standard RefSeq (or the Ensembl annotation, as @Devon suggests).


                • #9
                  Thanks for all of the above. I have a Perl script that can convert the RefSeq (NG_) coordinates of a gene to the hg19 assembly and vice versa. The problem arises when the gene lengths implied by the two sets of coordinates differ, as in the example I gave. I now wonder whether this difference is to be expected anyway. If so, the code should be modified to take into account the ratio between source length and target length, similar to the NCBI Remapping Service.
                  Apart from that, it is still not clear to me what the exact application of such a conversion would be, or when we would need it. Where do these two coordinate systems (NG and hg) intersect in data analysis?
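If the length difference comes from indels, as suggested earlier in the thread, one possible alternative to a single global ratio is to map positions through ungapped aligned blocks. The following is only a minimal sketch of that idea, not the poster's Perl script and not how the NCBI Remapping Service works internally. The `Block` type, the `map_position` function, and the block values are all hypothetical and made up for illustration; real blocks would have to come from an alignment of the NG_ record to the assembly.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """One ungapped aligned segment (hypothetical structure for illustration)."""
    src_start: int  # first source (e.g. NG_) position in the block
    dst_start: int  # target (e.g. hg19) position aligned to src_start
    length: int     # number of aligned bases in the block


def map_position(pos: int, blocks: List[Block], strand: int = 1) -> Optional[int]:
    """Map a source position through blocks sorted by src_start.

    Returns None when pos falls between blocks (an indel) or outside them.
    strand=-1 means target coordinates decrease as source coordinates increase.
    """
    starts = [b.src_start for b in blocks]
    i = bisect_right(starts, pos) - 1   # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    offset = pos - block.src_start
    if offset >= block.length:          # past the end of the block: in a gap
        return None
    return block.dst_start + strand * offset


# Made-up example: the source has 10 extra bases (101..110) absent from the target,
# so the source interval is 10 bases longer than the target interval.
blocks = [Block(1, 1000, 100), Block(111, 1100, 50)]

print(map_position(100, blocks))  # last base of the first block
print(map_position(105, blocks))  # inside the insertion: no target position
print(map_position(111, blocks))  # first base of the second block
```

With this layout, positions inside an insertion have no counterpart in the other coordinate system, which is one reason the two gene lengths need not agree.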
