SEQanswers

11-06-2013, 12:57 AM   #1
Slacanch
Member
 
Location: paris

Join Date: Jan 2013
Posts: 10
SGD liftOver chain files issue

SGD (Saccharomyces Genome Database) hosts a repository of liftOver chain files for converting coordinates from any version of the reference genome to the latest one.

After trying, and failing, multiple times to convert between two assemblies, I decided to look at the chain files themselves and compared the chain file provided by SGD with the equivalent file provided by the UCSC Genome Browser (R61 to R64 from SGD, sacCer2 to sacCer3 from UCSC).

The result is rather odd. This is the beginning of the file in the two cases:

UCSC:

Code:
chain 21724089 chrI 230208 + 0 230208 chrI 230218 + 0 230218 16
3834	1	0
2091	0	1
527	1	0
10002	1	1
10	1	0
29	1	0
SGD:

Code:
chain 21724089 chr01_2008_03_05 230208 + 0 230208 chr01_2011_02_03 230218 + 0 230218 1
3834	1	0
2091	0	1
527	1	0
10002	1	1
10	1	0
29	1	0

Most of the file is actually the same, but the chromosome names are really strange: they have dates attached! That renders the file practically useless (I managed to lift over successfully using the UCSC chain file, but could not with the SGD one).

Code:
chrI 230208   vs     chr01_2008_03_05 230208
The same pattern repeats for every chromosome in the SGD file.
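
To check every chromosome at once rather than reading the file by eye, the names can be pulled out of the chain header lines. This is only a sketch: it assumes the standard chain header layout (chain, score, tName, tSize, tStrand, tStart, tEnd, qName, ...), and chain_seq_names is a made-up helper name, not part of liftOver.

```shell
# List the distinct (tName, qName) pairs found in chain header lines.
# Assumed header layout:
#   chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
# chain_seq_names is a hypothetical helper; it reads the files given as
# arguments, or standard input when no file is given.
chain_seq_names() {
  awk '$1 == "chain" { print $3, $8 }' "$@" | sort -u
}
```

Running it as `chain_seq_names V43_2004_07_26_V64_2011_02_03.over.chain` (or on any other chain file) prints one line per distinct name pair, so date-suffixed names show up immediately.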



Is there something I missed? Am I the only one having this issue? What do these names mean? I am confused.
02-28-2018, 01:33 PM   #2
raim
Junior Member
 
Location: europe

Join Date: Jun 2016
Posts: 2
sgd liftover problem solved?

Hi Slacanch,
Have you ever managed to solve this problem?
best,
Rainer
02-28-2018, 10:39 PM   #3
raim
Junior Member
 
Location: europe

Join Date: Jun 2016
Posts: 2

Hi,

The SGD helpdesk said the date suffix in the chromosome names can simply be deleted.
Also, at least the liftOver reimplementation in R/Bioconductor's rtracklayer can't handle the comment lines starting with ##.

So the following command-line processing seems to do the trick, although I have not tested the actual mapping result yet:

Code:
grep -v "^##" V43_2004_07_26_V64_2011_02_03.over.chain | sed 's/\(chr..\)_[^ ]*/\1/g' > V43_2004_07_26_V64_2011_02_03_fixed.over.chain

Note that in your case you would still need to convert the Roman chromosome numerals
to two-digit Arabic numbers, e.g. chrI should become chr01. And watch out for the mitochondrial chromosome: chrM is not chr1000.
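
A minimal sketch of that renaming, assuming the data to be lifted is a tab-separated file (BED-like) with the chromosome name in the first column. roman_to_arabic is a hypothetical helper name. It only maps chrI to chrXVI and deliberately leaves chrM and anything else untouched, since the mitochondrial name the chain file expects has to be looked up in the chain file itself.

```shell
# Rename chrI..chrXVI in column 1 of tab-separated input to chr01..chr16.
# roman_to_arabic is a hypothetical helper; it reads stdin and writes stdout.
# Names not in the lookup table (e.g. chrM, track/browser lines) pass through
# unchanged, so the mitochondrial chromosome must be handled separately.
roman_to_arabic() {
  awk 'BEGIN {
         FS = OFS = "\t"
         # Build the lookup table: chrI -> chr01, ..., chrXVI -> chr16
         n = split("I II III IV V VI VII VIII IX X XI XII XIII XIV XV XVI", r, " ")
         for (i = 1; i <= n; i++) map["chr" r[i]] = sprintf("chr%02d", i)
       }
       ($1 in map) { $1 = map[$1] }
       { print }'
}
```

Usage would be along the lines of `roman_to_arabic < input.bed > input_renamed.bed`, where both file names are placeholders. Using an exact lookup on the whole name avoids the prefix problem a chain of sed substitutions would have (chrI matching inside chrII, chrIV, chrIX and so on).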

Rainer

Last edited by raim; 02-28-2018 at 10:50 PM.

Tags
chain files, liftover, sgd