  • Map locations in GRCh38 to GRCh37/hg19 (yes, in reverse)

    I have downloaded several restriction enzyme cut sites (single-bp locations) that have been mapped to the newest human genome build, GRCh38. However, ALL of my previous analysis is on GRCh37/hg19.

    There are web pages that map FORWARD to GRCh38. I need to go BACKWARD. My google-fu has failed to find a way to do this.

    Any help is appreciated.

  • #2
    You will find the liftover file you need here: http://hgdownload.soe.ucsc.edu/golde...hg38/liftOver/


    • #3
      Wonderful! Thank you.


      • #4
        Ugh. It's not working.

        I downloaded the file hg38ToHg19.over.chain.gz and downloaded the latest build of the "liftOver" program.

        I ran it (more or less) as:

        liftOver grch38_location.bed hg38ToHg19.over.chain.gz hg19_location.bed err.log

        But the log file (which holds unmapped entries) says EVERY entry has been removed. Specifically, each entry has the message:

        #Deleted in new

        The only oddity about my input files is that every entry is one base pair. So, an example BED entry would be:

        chr1 12411 12411

        with tabs as delimiters

        Do I need to alter these files to add one to the second position (the end coordinate) of each entry? Any other ideas?

        Thanks for any help.

        ~EDIT~
        I saw a post about converting SNPs and this person's BED file had the equivalent of:
        chr1 12411 12412
        I'm writing a script to alter these files. So, if you know this is the answer, no need to reply. I will update with my results.
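
A minimal sketch of what such a script might look like, assuming tab-delimited BED input with at least three columns. The function name `widen_zero_width` is hypothetical, not something from the thread. It follows the approach described here of adding one to the end coordinate. Whether the start should instead be decremented depends on whether the original cut-site positions were 0-based or 1-based, which the thread does not say.

```python
def widen_zero_width(lines):
    """Yield BED lines, turning zero-width entries (start == end) into 1 bp intervals."""
    for line in lines:
        line = line.rstrip("\n")
        # Pass blank lines and header-style lines through unchanged.
        if not line or line.startswith(("#", "track", "browser")):
            yield line
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        if start == end:
            # Assumption: keep the start and add one to the end, as in the thread.
            fields[2] = str(start + 1)
        yield "\t".join(fields)


# Small in-memory example; a real run would iterate over an open file instead.
example = ["chr1\t12411\t12411\n", "chr1\t20000\t20001\n"]
for row in widen_zero_width(example):
    print(row)
```

Entries that already have a non-zero width are left untouched, so running it twice on the same file should be harmless.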
        Last edited by SrCardgage; 10-01-2014, 07:39 AM. Reason: moar infos


        • #5
          See: https://groups.google.com/a/soe.ucsc...me/X1AhPz8Ozkc That probably applies to the command-line liftOver as well.

          chr1 12411 12412
          Last edited by GenoMax; 10-01-2014, 08:44 AM.


          • #6
            Success! I did indeed need a "width" of one, not zero, for the input BED files.

            Results:
            a) 2,317,719 total entries, only 28,414 (1.2%) did not map
            b) 7,199,381 total entries, only 118,690 (1.6%) did not map

            I consider this to be very acceptable.
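
For anyone wanting to reproduce this kind of tally, here is a rough sketch that summarizes the unmapped file. It assumes the layout described earlier in the thread, where each unmapped entry is accompanied by a `#` message line such as `#Deleted in new`. The function name `summarize_unmapped` and its arguments are hypothetical.

```python
from collections import Counter


def summarize_unmapped(lines, total_entries):
    """Return (unmapped count, percent unmapped, Counter of '#' reason messages)."""
    reasons = Counter()
    unmapped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Message line, e.g. "#Deleted in new".
            reasons[line.lstrip("#").strip()] += 1
        else:
            # Anything else is counted as one unmapped BED entry.
            unmapped += 1
    percent = 100.0 * unmapped / total_entries
    return unmapped, percent, reasons


# Tiny in-memory example; a real run would pass an open file such as err.log.
log = ["#Deleted in new\n", "chr1\t12411\t12411\n"]
count, percent, reasons = summarize_unmapped(log, total_entries=4)
print(count, percent, dict(reasons))
```

The same ratio gives the percentages quoted above: 28,414 of 2,317,719 and 118,690 of 7,199,381 round to 1.2% and 1.6%.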

            Thank you, GenoMax for your quick responses!


            • #7
              I have found duplicate entries in the file that "liftOver" produces. I'm guessing some cut sites were close together and were therefore "joined".

              I think the best way to remove these is:
              sort -u liftOver_output.txt > unique_output.txt

              Don't use "uniq" on its own: it only collapses adjacent duplicate lines, so it misses duplicates in unsorted input. Learned a new unix lesson today!
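
If the original order of the entries matters, an alternative to sorting is to drop repeats while keeping the first occurrence of each line. The sketch below is one way to do that. The name `unique_lines` is hypothetical, and the approach assumes the set of distinct lines fits in memory.

```python
def unique_lines(lines):
    """Yield each distinct line once, in order of first appearance."""
    seen = set()
    for line in lines:
        key = line.rstrip("\n")
        if key not in seen:
            seen.add(key)
            yield key


# Small in-memory example; a real run would iterate over the liftOver output file.
rows = ["chr1\t100\t101\n", "chr1\t200\t201\n", "chr1\t100\t101\n"]
for row in unique_lines(rows):
    print(row)
```

Unlike `sort -u`, this does not reorder the file, at the cost of holding every distinct line in a set.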
