  • SGD liftOver chain files issue

    SGD (the Saccharomyces Genome Database) hosts a repository of liftOver chain files that allow converting coordinates from any version of the reference genome to the latest one.

    After trying, and failing, multiple times to convert between two assemblies, I decided to look at the chain files themselves and compared the one provided by SGD with the equivalent one provided by the UCSC Genome Browser (R61 to R64 from SGD, sacCer2 to sacCer3 from UCSC).

    The result is rather odd. This is the beginning of the file in each case:

    UCSC:

    Code:
    chain 21724089 chrI 230208 + 0 230208 chrI 230218 + 0 230218 16
    3834	1	0
    2091	0	1
    527	1	0
    10002	1	1
    10	1	0
    29	1	0
    SGD:

    Code:
    chain 21724089 chr01_2008_03_05 230208 + 0 230208 chr01_2011_02_03 230218 + 0 230218 1
    3834	1	0
    2091	0	1
    527	1	0
    10002	1	1
    10	1	0
    29	1	0

    Most of the file is actually the same, but the chromosome names are really strange: they have dates attached, which renders the file practically useless. (I managed to liftOver successfully using the UCSC chain file, but could not using the SGD one.)

    Code:
    chrI 230208   vs     chr01_2008_03_05 230208
    The strange pattern repeats for every chromosome in the SGD file.
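
To check the sequence names in a chain file without reading it by eye, a minimal Python sketch along these lines could help. It assumes the standard UCSC header layout (`chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`); `ChainHeader` and `parse_chain_headers` are names made up for this illustration, and the two sample headers are the ones quoted above.

```python
from typing import Iterable, List, NamedTuple


class ChainHeader(NamedTuple):
    """Subset of the fields found on a 'chain' header line."""
    score: int
    t_name: str   # sequence name in the source (old) assembly
    t_size: int
    q_name: str   # sequence name in the destination (new) assembly
    q_size: int
    chain_id: str


def parse_chain_headers(lines: Iterable[str]) -> List[ChainHeader]:
    """Collect the header of every chain, skipping alignment block lines."""
    headers = []
    for line in lines:
        fields = line.split()
        # A header has exactly 13 whitespace-separated fields and starts with 'chain'.
        if len(fields) == 13 and fields[0] == "chain":
            headers.append(
                ChainHeader(
                    score=int(fields[1]),
                    t_name=fields[2],
                    t_size=int(fields[3]),
                    q_name=fields[7],
                    q_size=int(fields[8]),
                    chain_id=fields[12],
                )
            )
    return headers


if __name__ == "__main__":
    ucsc_lines = [
        "chain 21724089 chrI 230208 + 0 230208 chrI 230218 + 0 230218 16",
        "3834\t1\t0",
    ]
    sgd_lines = [
        "chain 21724089 chr01_2008_03_05 230208 + 0 230208 chr01_2011_02_03 230218 + 0 230218 1",
        "3834\t1\t0",
    ]
    for label, lines in (("UCSC", ucsc_lines), ("SGD", sgd_lines)):
        for header in parse_chain_headers(lines):
            print(label, header.t_name, "->", header.q_name)
```

Comparing the printed names with the chromosome names in the file to be lifted over shows quickly whether the two naming schemes match.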



    Is there something I missed? Am I the only one having this issue? What do these dates mean? I am confused.

  • #2
    SGD liftOver problem solved?

    Hi Slacanch,
    Have you ever managed to solve this problem?
    best,
    Rainer


    • #3
      Hi,

      The SGD helpdesk said the date suffix in the chromosome names can simply be deleted.
      Also, at least the liftOver reimplementation in R/Bioconductor's rtracklayer can't handle the comment lines starting with ##.

      So the following command-line processing seems to do the trick, although I have not tested the actual mapping result yet:

      grep -v "##" V43_2004_07_26_V64_2011_02_03.over.chain | sed 's/\(chr..\)_[^ ]*/\1/g' > V43_2004_07_26_V64_2011_02_03_fixed.over.chain
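
For those who prefer to do the same clean-up without grep and sed, here is a rough Python equivalent of the command above. It is only a sketch: `clean_chain_lines` is a made-up name, the regular expression is copied from the sed call, and like the one-liner it has not been checked against an actual mapping result.

```python
import re
from typing import Iterable, Iterator

# Same pattern as the sed call: 'chr' plus two characters, then '_' and
# everything up to the next space (the date suffix).
DATE_SUFFIX = re.compile(r"(chr..)_[^ ]*")


def clean_chain_lines(lines: Iterable[str]) -> Iterator[str]:
    """Drop lines containing '##' and strip date suffixes from chromosome names."""
    for line in lines:
        line = line.rstrip("\n")
        if "##" in line:  # mirrors grep -v "##"
            continue
        yield DATE_SUFFIX.sub(r"\1", line)


if __name__ == "__main__":
    example = [
        "## an example comment line",
        "chain 21724089 chr01_2008_03_05 230208 + 0 230208 chr01_2011_02_03 230218 + 0 230218 1",
        "3834\t1\t0",
    ]
    for cleaned in clean_chain_lines(example):
        print(cleaned)
```

To process a real file, open it and pass the file object to `clean_chain_lines`, writing each returned line followed by a newline to the output file.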

      Note that in your case you would still need to convert the Roman chromosome numerals to two-digit Arabic numbers, e.g. chrI should become chr01. And watch out for the mitochondrial chromosome: chrM is not chr1000.
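
The renaming could be sketched in Python roughly as follows. This assumes the sixteen nuclear yeast chromosomes are named chrI to chrXVI on one side and chr01 to chr16 on the other; `rename_chromosome` and its `special` argument are hypothetical helpers, and the mitochondrial name is left untouched unless you supply the mapping yourself, because the right target name depends on the files you are working with.

```python
from typing import Dict, Optional

ROMAN = [
    "I", "II", "III", "IV", "V", "VI", "VII", "VIII",
    "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI",
]

# Explicit lookup table, so that chrM is never parsed as the numeral 1000.
ROMAN_TO_ARABIC: Dict[str, str] = {
    f"chr{numeral}": f"chr{number:02d}"
    for number, numeral in enumerate(ROMAN, start=1)
}


def rename_chromosome(name: str, special: Optional[Dict[str, str]] = None) -> str:
    """Map chrI..chrXVI to chr01..chr16; other names pass through unchanged.

    'special' is an optional, caller-supplied mapping for names such as the
    mitochondrial chromosome (an assumption for this sketch).
    """
    if special and name in special:
        return special[name]
    return ROMAN_TO_ARABIC.get(name, name)


if __name__ == "__main__":
    for name in ("chrI", "chrIV", "chrXVI", "chrM"):
        print(name, "->", rename_chromosome(name))
```

Using a fixed table instead of a general Roman numeral parser keeps the mitochondrial chromosome and any unexpected names out of the conversion.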

      Rainer
      Last edited by raim; 02-28-2018, 11:50 PM.
