  • Translate coordinates between 2 references

    Hi,

    Are there any tools to translate coordinates between two reference FASTA files, such as hg18 and hg19? I need a tool that, given two references, an indel file, and a list of locations in one reference, returns the corresponding locations in the other reference.

    I would hate to have to write that myself.

    Thanks

  • #2
    UCSC liftOver



    • #3
      As long as there is a liftOver chain file (i.e. the "dictionary") for your two assemblies, you can use liftOver, either online or by downloading the binary and the "dictionary". It works with intervals (BED files), so if you need to translate a wiggle file, convert it to bedGraph first.
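Since liftOver works on intervals, a wiggle track has to become bedGraph first. A minimal, self-contained sketch, assuming a plain-text wiggle file made of fixedStep/variableStep sections with 1-based positions (as in the UCSC wiggle format); the function name is made up for illustration. Each data value becomes a 0-based, half-open bedGraph interval:

```python
def wig_to_bedgraph(lines):
    """Yield bedGraph lines (chrom, 0-based start, end, value) from wiggle text.

    Assumes fixedStep/variableStep sections with 1-based positions;
    track/browser/comment lines are skipped.
    """
    mode = None
    chrom = None
    start = step = span = 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith(("fixedStep", "variableStep")):
            fields = line.split()
            mode = fields[0]
            opts = dict(f.split("=", 1) for f in fields[1:])
            chrom = opts["chrom"]
            span = int(opts.get("span", 1))
            if mode == "fixedStep":
                start = int(opts["start"])      # 1-based start of first value
                step = int(opts.get("step", 1))
            continue
        if mode == "fixedStep":
            value = line
            bed_start = start - 1               # convert 1-based to 0-based
            start += step                       # next value's position
        elif mode == "variableStep":
            pos, value = line.split()
            bed_start = int(pos) - 1
        else:
            raise ValueError("data line before any fixedStep/variableStep header")
        yield f"{chrom}\t{bed_start}\t{bed_start + span}\t{value}"
```

The output can be written to a file and passed to liftOver like any other BED-style interval file.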

      d



      • #4
        To elaborate: to use 'liftOver' you need to download the executable and the right dictionary file (i.e. the one that corresponds to your current and target genome versions). Links:

        Liftover executable & Liftover files.

        Find your genome of interest, follow the appropriate 'LiftOver Files' link, then find the file that corresponds to the two genome builds of interest (e.g. hg18ToHg19.over.chain.gz).
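For intuition about what the chain "dictionary" contains, here is a simplified sketch of lifting one position through a chain file. It assumes the standard UCSC chain layout (a `chain` header line giving old-assembly "target" and new-assembly "query" coordinates, then `size dt dq` block lines ending with a lone `size`) and 0-based coordinates. The class and function names are hypothetical, and the real liftOver also handles intervals, minimum-match thresholds and other details this ignores, so use the actual tool for real work:

```python
from dataclasses import dataclass, field


@dataclass
class Chain:
    t_name: str
    t_start: int
    t_end: int
    q_name: str
    q_size: int
    q_strand: str
    q_start: int
    # Ungapped blocks as (target_start, query_start, size), 0-based.
    blocks: list = field(default_factory=list)


def parse_chains(lines):
    """Parse UCSC chain-format text into Chain objects (simplified)."""
    chains, current = [], None
    t = q = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            current = Chain(fields[2], int(fields[5]), int(fields[6]),
                            fields[7], int(fields[8]), fields[9], int(fields[10]))
            chains.append(current)
            t, q = current.t_start, current.q_start
            continue
        if current is None:
            raise ValueError("alignment line before any chain header")
        size = int(fields[0])
        current.blocks.append((t, q, size))
        if len(fields) == 3:  # "size dt dq": skip the gaps to the next block
            t += size + int(fields[1])
            q += size + int(fields[2])
    return chains


def lift_position(chains, chrom, pos):
    """Map a 0-based old-assembly position; returns [(chrom, pos, strand), ...]."""
    hits = []
    for chain in chains:
        if chain.t_name != chrom or not chain.t_start <= pos < chain.t_end:
            continue
        for t_block, q_block, size in chain.blocks:
            if t_block <= pos < t_block + size:
                q_pos = q_block + (pos - t_block)
                if chain.q_strand == "-":  # query coords count from the reverse end
                    q_pos = chain.q_size - 1 - q_pos
                hits.append((chain.q_name, q_pos, chain.q_strand))
                break
    return hits  # empty if the position falls in a gap (deleted in new)
```

A downloaded chain can be read directly with `gzip.open("hg18ToHg19.over.chain.gz", "rt")` and passed to `parse_chains`.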



        • #5
          I have a similar question. I aligned some sequences with TopHat/Bowtie against NCBI reference build 37 instead of UCSC hg19, so the output files (SAM, WIG, etc.) have IDs like NC_0000001 instead of chr1. Unfortunately, I then realized that these IDs don't work with IGV.
          I heard that NCBI v37.3 and UCSC hg19 are identical, so can I just use Perl to run a search-and-replace on these text files? (I guess it is similar to liftOver, but I did not see a dictionary for NCBI -> hg.)
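If the assemblies really are identical apart from the sequence names, renaming is enough. A minimal sketch for SAM text, assuming you build the `mapping` dictionary yourself from the assembly report; the single `NC_000001.10` -> `chr1` entry in the test is only an example, so check every entry for your build. It renames the `@SQ SN:` header field plus the RNAME and RNEXT columns, and leaves unknown names unchanged:

```python
def rename_sam_line(line, mapping):
    """Rename reference sequence names in one SAM line (header or alignment).

    `mapping` is a user-supplied dict such as {"NC_000001.10": "chr1", ...};
    names missing from it are left as they are.
    """
    fields = line.rstrip("\n").split("\t")
    if fields[0] == "@SQ":
        # Header: rewrite the SN:<name> tag of each reference sequence.
        fields = [
            "SN:" + mapping.get(f[3:], f[3:]) if f.startswith("SN:") else f
            for f in fields
        ]
    elif not fields[0].startswith("@") and len(fields) >= 11:
        fields[2] = mapping.get(fields[2], fields[2])          # RNAME
        if fields[6] not in ("=", "*"):
            fields[6] = mapping.get(fields[6], fields[6])      # RNEXT
    return "\t".join(fields)
```

Usage: loop over the input file and write `rename_sam_line(line, mapping) + "\n"` for every line. Other formats (e.g. the `chrom=` field in wiggle headers) would need their own small rule.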

          Thanks
          Heng



          • #6
            liftOver is primarily concerned with converting the coordinates of features on one genome build to the corresponding coordinates on a different genome build of the same species (or to the orthologous positions on another species' build). The difference in naming of the chromosomes themselves is due to different conventions used by UCSC versus NCBI. You should be able to remap the names (but check to see if they have a one-to-one relationship). You can confirm that your NCBI build corresponds to a particular UCSC build here:
            UCSC Releases

            When dealing with genome builds from different sources, particularly human, it is important to think about how the source (NCBI, UCSC, Ensembl) deals with the haplotype chromosomes and unassembled contigs (those pieces of the genome that still have not been assigned to a chromosome). For these sequences, figuring out the mapping of names is not always obvious and, worse, there isn't necessarily a one-to-one relationship. For example, NCBI may keep unassembled contigs from chr1 as separate entries whereas UCSC may place them in a 'chrUn' entry. Thankfully, the bulk of the human build (chromosomes 1-22, X, Y, and the mitochondrial genome) should be consistent and one-to-one. UCSC provides detailed descriptions of the idiosyncrasies of each build on their website under 'assembly details' for each assembly.
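Before relying on a renaming table, the one-to-one condition can be checked mechanically. A small sketch, assuming the table is a list of (NCBI name, UCSC name) pairs; the contig names `ctgA` and `ctgB` in the test are hypothetical, chosen to mimic several contigs collapsing into one `chrUn` entry:

```python
from collections import Counter


def check_one_to_one(pairs):
    """Return names that occur more than once on either side of a renaming table.

    `pairs` is a list of (ncbi_name, ucsc_name) tuples. The table is safe to use
    as a simple rename only if both returned lists are empty.
    """
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    dup_left = sorted(name for name, n in left.items() if n > 1)
    dup_right = sorted(name for name, n in right.items() if n > 1)
    return dup_left, dup_right
```

A non-empty right-hand list means several source sequences land on one target name, so a plain text replace would silently merge coordinates from different contigs.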



            • #7
              malachig,

              I just compared h_sapiens_37_asm.fa and hg19.fa; they are indeed identical except for the names. Each has 25 FASTA sequences, so they map one-to-one. So I guess the unmapped ones are at least consistent for this version of the human reference genome.
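One way to do this kind of comparison without trusting the names is to hash each sequence and match records by digest. A sketch, assuming plain (uncompressed) FASTA input; the function names are illustrative. Sequences are uppercased before hashing because UCSC FASTA files are typically soft-masked (repeats in lowercase), and the hash is updated line by line, so different line-wrapping widths do not matter:

```python
import hashlib


def fasta_digests(lines):
    """Map each FASTA record name to the MD5 of its uppercased sequence."""
    digests = {}
    name, h = None, None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if name is not None:
                digests[name] = h.hexdigest()
            name = line[1:].split()[0]   # first word of the header
            h = hashlib.md5()
        elif line and name is not None:
            h.update(line.upper().encode("ascii"))
    if name is not None:
        digests[name] = h.hexdigest()
    return digests


def match_by_sequence(fasta_a, fasta_b):
    """For each record in fasta_a, list the records in fasta_b with identical sequence."""
    by_digest_b = {}
    for name, digest in fasta_digests(fasta_b).items():
        by_digest_b.setdefault(digest, []).append(name)
    return {name: by_digest_b.get(digest, [])
            for name, digest in fasta_digests(fasta_a).items()}
```

Any record that maps to an empty list, or to more than one name, is exactly where a simple rename would go wrong.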

              Thanks for the update. I guess I will just run a replace on my text output, since it took me several days to run my program. (I only have access to very limited nodes.)



              • #8
                I downloaded the liftOver executable and the chain file. The instructions say:

                liftOver oldFile map.chain newFile unMapped

                If I want to translate one position, chr1:344-344, in BED format, how do I do this? And what are oldFile and unMapped?
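Note that BED coordinates are 0-based and half-open, while `chr1:344-344` is the 1-based, inclusive notation used by the UCSC browser, so that single base becomes `chr1 343 344` in a BED file. A small helper, assuming that 1-based reading of the region string (the function name is just for illustration):

```python
import re


def region_to_bed(region, name=None):
    """Convert a 1-based, inclusive 'chrom:start-end' string to a BED line.

    BED uses 0-based, half-open intervals, so only the start is shifted by one.
    Thousands separators (commas) are tolerated.
    """
    m = re.fullmatch(r"([^:]+):(\d+)-(\d+)", region.replace(",", ""))
    if m is None:
        raise ValueError(f"cannot parse region: {region!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if start < 1 or end < start:
        raise ValueError(f"invalid interval: {region!r}")
    fields = [chrom, str(start - 1), str(end)]
    if name is not None:
        fields.append(name)  # optional 4th BED column
    return "\t".join(fields)
```

Write the resulting line(s) to a file such as `positions.bed` and pass that as oldFile, e.g. `liftOver positions.bed map.chain newFile.bed unMapped`.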



                • #9
                  Originally posted by foxyg
                  I download the liftover and chain file, the instruction says

                  liftOver oldFile map.chain newFile unMapped

                  If I want to translate 1 position in bed format chr1:344-344, how do I do this, what is oldFile and unMapped?
                  oldFile is the file of coordinates you want to convert (typically BED).
                  unMapped is a file that is created when you run liftOver: it contains those features in oldFile that did not lift over to the new coordinates, and gives the reason why (e.g. partially deleted).
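To see why features failed, the unMapped file can be grouped by reason. A sketch, assuming each failed record is preceded by a comment line such as `#Deleted in new` or `#Partially deleted in new` (check your own unMapped file for the exact wording); the function name is illustrative:

```python
def parse_unmapped(lines):
    """Pair each record in a liftOver unMapped file with the '#' reason line before it.

    Assumes the layout: a '#<reason>' comment line followed by the failed BED record.
    Returns a list of (reason, bed_line) tuples; reason is None if no comment preceded it.
    """
    results = []
    reason = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            reason = line[1:].strip()
        else:
            results.append((reason, line))
            reason = None
    return results
```

For a quick summary, `collections.Counter(reason for reason, _ in parse_unmapped(open("unMapped")))` counts the failures per reason.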
