SEQanswers

Old 03-27-2012, 10:32 AM   #1
jjw14
Member
 
Location: Missouri

Join Date: Apr 2010
Posts: 39
Question SeqMonk: Export features (e.g. CpG Islands) from Ensembl for import into SeqMonk?

Hello,

I am using SeqMonk to view reads mapped to a genome. I would like to import an extra annotation set (CpG Islands) to use in my analysis as is described in the SeqMonk help file under the heading "Importing Extra Annotation".

I would like to use the CpG Island annotation set that is displayed in the Ensembl Genome Browser, but I can't figure out how to download just the CpG Island data track as a GFF file from Ensembl.

Can anyone tell me how to download just the CpG Island data from the Ensembl Genome Browser (or Ensembl FTP) in GFF format so that I can use it in SeqMonk?

Thanks in advance,
jjw
Old 03-28-2012, 12:48 AM   #2
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

I don't think there is an easy way to do this from within Ensembl. You can use their BioMart interface to bulk download gene related information, but this doesn't work for other feature types. Their recommendation is to use their Perl API to pull down this kind of data, but if that's not something you're comfortable with then I guess that's not much help.

You can actually get at this kind of data much more easily from UCSC. Their table browser system allows you to export any of the annotation tracks into a simple text format which should be easy to import into SeqMonk.
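A Table Browser export usually needs a small conversion before it lines up with an Ensembl-style genome. The following is only a minimal sketch, assuming the track was exported in BED format (tab-separated chrom, start, end, name) and that the target genome names its chromosomes without the "chr" prefix. The function name `bed_to_gff` and the `CpG_island` feature label are made up for illustration and are not part of SeqMonk or UCSC. UCSC coordinates are 0-based and half-open while GFF is 1-based and inclusive, so the start is shifted by one.

```python
def bed_to_gff(bed_text, source="UCSC", feature="CpG_island", strip_chr=True):
    """Convert BED-formatted text to a list of GFF lines (illustrative helper)."""
    gff_lines = []
    for line in bed_text.splitlines():
        # Skip blank lines and the header/comment lines a BED export may carry.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        # Assumption: the target genome uses "1" rather than "chr1".
        if strip_chr and chrom.startswith("chr"):
            chrom = chrom[3:]
        # BED start is 0-based; GFF start is 1-based. The end stays the same.
        gff_lines.append("\t".join([
            chrom, source, feature,
            str(start + 1), str(end),
            ".", ".", ".",
            f"Name={name}",
        ]))
    return gff_lines


if __name__ == "__main__":
    example = "track name=cpgIslands\nchr1\t28735\t29810\tCpG_116\n"
    print("\n".join(bed_to_gff(example)))
```

This only makes sense when the exported track and the genome loaded in SeqMonk are the same assembly; the conversion does not remap coordinates between assemblies.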

As an aside, which genome are you using? CpG islands should be a standard track in the latest releases of genomes which contain this track.
Old 03-29-2012, 06:45 AM   #3
jjw14
Member
 
Location: Missouri

Join Date: Apr 2010
Posts: 39
Default

Simon,

Thank you for the information. Sorry for the delay in my response.

I agree with you that the UCSC Table Browser is a great resource, and I have used it before for exporting specific tracks, including CpG Islands.

The reason I wanted to get the CpG Island information from Ensembl was that I have imported a custom genome (pig; Sus scrofa 10.2) into SeqMonk by modifying the EMBL-formatted files as you had described in your help file "Creating a Custom Genome".

Currently, the UCSC Table Browser supports the Nov. 2009 assembly (SGSC Sscrofa 9.2/Sscrofa2), so I wasn't sure whether the CpG Islands exported from UCSC would be compatible with the current S. scrofa 10.2 genome.

I have to admit that the differences in nomenclature for genomes of the same species from NCBI, Ensembl, etc. are still confusing to me, even though I have tried on numerous occasions to determine compatibility. For this reason, I wanted to obtain all data that I am going to put into SeqMonk from the same place. Yes, it's ignorance on my part, but I don't want to risk generating erroneous results.

Thank you for your fast response and advice. If you have any more input, I would be glad to hear it.

jjw
Old 03-29-2012, 06:58 AM   #4
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

For this you'd have to go to the Ensembl API, though as pre.ensembl isn't in a release yet I'm not actually sure how you'd connect to that database to be able to run queries.

Hopefully the pig assembly will make it into a full Ensembl release soon, at which point we'll add it to our list of supported genomes and you'll have the CpG island tracks present.
Old 03-29-2012, 07:42 AM   #5
jjw14
Member
 
Location: Missouri

Join Date: Apr 2010
Posts: 39
Default

Thanks, Simon. If I figure out how, I will post the method I used here in case others want to obtain similar data.

jjw
Old 05-31-2012, 08:11 AM   #6
acongras
Junior Member
 
Location: Toulouse, France

Join Date: May 2012
Posts: 2
Default

Hello jjw,

I've just read this:

Quote:
Originally Posted by jjw14

I have imported a custom genome (pig; Sus scrofa 10.2) into SeqMonk by modifying the EMBL-formatted files as you had described in your help file "Creating a Custom Genome".

jjw
I am currently trying to do the same thing and could use some tips.
I have downloaded the EMBL files (from 0.dat to 7000.dat) into my Genome directory. SeqMonk seems to open them without much trouble, even though each file contains several scaffolds and AC lines.
SeqMonk assigns many of the scaffolds to their specific chromosomes, so the genome is almost recreated. But some other scaffolds are not assigned to any chromosome, and SeqMonk treats them as very small independent chromosomes.

My questions are: do you get the same result? If not, how did you modify the files to get a fully assembled genome?

Thanks for your help.
Old 05-31-2012, 08:32 AM   #7
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Which genome are you trying to use? I've just seen that the pig 10.2 assembly is now released into the main Ensembl, so I've just kicked off the processing scripts to add it to the supported genomes in SeqMonk. It should be there late tonight or early tomorrow.

In general you can use the EMBL files exported by Ensembl, but you only want to use the contigs which form part of the main chromosomes. There are a number of short scaffolds which aren't included in the main assembly (normally with names ending in _random), and it is these which will mess up the genome building in SeqMonk because it will treat each of these as a separate chromosome. In the API you can pull down slices only of type 'chromosome', but from the exported EMBL files you'll need to look at the names of the chromosomes and filter out those which aren't actually part of the main assembly.
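The name filtering could look something like the sketch below. It is only an illustration under stated assumptions: the main assembly is taken to be the numbered chromosomes plus X, Y and MT, anything ending in _random is rejected, and every other name is treated as an unplaced scaffold. The function `is_main_assembly` and the allow-list are hypothetical and would need adjusting to the names actually present in your files.

```python
def is_main_assembly(name, extra_allowed=("X", "Y", "MT")):
    """Return True if a sequence name looks like a main-assembly chromosome."""
    # Unplaced scaffolds normally have names ending in _random.
    if name.endswith("_random"):
        return False
    # Assumption: main chromosomes are plain numbers, or one of X, Y, MT.
    return name.isdigit() or name in extra_allowed


if __name__ == "__main__":
    # Hypothetical names, as they might be read from the exported files.
    names = ["1", "18", "X", "MT", "GL892871.1", "chr1_random"]
    keep = [n for n in names if is_main_assembly(n)]
    print(keep)
```

Only the files (or records) whose names pass the filter would then be copied into the custom genome folder.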
Old 06-01-2012, 12:40 AM   #8
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

The Sus scrofa 10.2 genome assembly should now be available as a supported genome.
Old 06-08-2012, 12:35 AM   #9
acongras
Junior Member
 
Location: Toulouse, France

Join Date: May 2012
Posts: 2
Default

Thanks for adding this genome, and for your quick answers.

Tags
annotation, ensembl, seqmonk
