SEQanswers

Old 11-30-2010, 01:42 AM   #1
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870
Post SeqMonk: Visualisation and analysis for large mapped data sets

[Sorry to open a new thread, but as there were a few previous SeqMonk threads I thought I'd open a new one and put everything in here from now on]

SeqMonk is a desktop application which is able to view and analyse large mapped data sets. It is a cross platform program which runs on normal PC hardware, and is designed with the needs of the bench scientist in mind.

It requires no computing infrastructure (back end databases etc), and virtually no configuration before you can start analysing your data.

I've just put out the latest release of the program (v0.13.0), which includes some significant additions to the program:
  • You can now import data from BAM files, and SAM import is much quicker than before
  • You can export quantitated data as BEDGraph files
  • A line graph tool to visualise quantitation changes between samples
  • A set of clustering tools to automatically or manually find sets of probes with correlated quantitation profiles
  • A new per-probe normalisation method

I've also created a series of tutorial videos on our YouTube channel which show how to get started with SeqMonk, and also go through some example analyses of ChIP-Seq or RNA-Seq data. More of these will be coming in the next few weeks.

You can get more information on SeqMonk from:

http://www.bioinformatics.bbsrc.ac.uk/projects/seqmonk/

I'll post future updates in this thread, but you can also be notified about any of our software packages by following our Twitter stream.
Old 12-02-2010, 06:33 AM   #2
pbseq
Member
 
Location: italy

Join Date: Feb 2010
Posts: 16
Default

Hi Simon
I'd be pleased to test SeqMonk, but I couldn't find rice (Oryza sativa), which I'm working on right now, among the downloadable genomes.
Is there any option to make rice and other plant genomes (apart from Arabidopsis) downloadable? Or is there a way to provide SeqMonk with more genomes than those currently available for download?
Old 12-02-2010, 07:30 AM   #3
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870
Default

Quote:
Originally Posted by pbseq View Post
Hi Simon
I'd be pleased to test SeqMonk, but I couldn't find rice (Oryza sativa), which I'm working on right now, among the downloadable genomes.
Is there any option to make rice and other plant genomes (apart from Arabidopsis) downloadable? Or is there a way to provide SeqMonk with more genomes than those currently available for download?
Our genome data is derived from Ensembl so we only easily support genomes which they have on their system. We recently expanded our coverage to the EnsemblGenomes site from the main site (to add in bacteria and plants). I'm just in the process of including TAIR10 for Arabidopsis, but will put Rice as the next genome in the queue after that. It should be on the servers early next week.

You can also make up your own custom genomes to use with the program. The annotations are read from EMBL format feature files with some minor modifications. Details of how to do this are in:

http://www.bioinformatics.bbsrc.ac.u...OM_GENOMES.txt
Old 12-03-2010, 04:03 AM   #4
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870
Default

Quote:
Originally Posted by pbseq View Post
I'd be pleased to test SeqMonk, but I couldn't find rice (Oryza sativa), which I'm working on right now, among the downloadable genomes.
I've just added the rice genome to our servers so you should be able to import it now.

If it doesn't show up in the genome list when you go to import it then use a browser to go to:

http://www.bioinformatics.bbsrc.ac.u...nome_index.txt

...and press shift+refresh in your browser to clear your cache, and then try again.
Old 12-03-2010, 04:28 AM   #5
pbseq
Member
 
Location: italy

Join Date: Feb 2010
Posts: 16
Default

Simon,
many thanks, I'll give a try soon
Old 12-04-2010, 08:24 PM   #6
seqcode
Junior Member
 
Location: USA

Join Date: Dec 2010
Posts: 4
Default

Hi, Simon:

I just downloaded SeqMonk and tried to run it on my PC, but got this error message:


C:\Documents and Settings>java -Xms128m -Xmx1500m -Dsun.java2d.opengl=false -classpath .;./sam-1.32.jar uk.ac.bbsrc.babraham.SeqMonk.SeqMonkApplication
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.

C:\Documents and Settings


Is there anything I should set up before running SeqMonk?

Thanks in advance!
Old 12-05-2010, 03:02 AM   #7
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870
Default

Quote:
Originally Posted by seqcode View Post
Hi, Simon:

I just downloaded SeqMonk and tried to run it on my PC, but got this error message:

Could not reserve enough space for object heap
The default configuration for 32-bit SeqMonk assumes that you have 2GB of RAM available. If you have less than this (which it appears you do), then you'll need to lower the amount of RAM SeqMonk can access. This will limit the amount of data you can analyse, but may be OK depending on the size of the dataset you're working with.

Details of how to change the memory setup from the default are in the README.txt file.
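
As a rough illustration of what lowering the memory limit means, the launch command quoted in the error report above can be rebuilt with a smaller maximum Java heap (the -Xmx value). This is only a sketch: the classpath and main class are copied from that quoted command, the function name and the 1024 MB figure are made up for the example, and README.txt remains the reference for the actual procedure.

```python
def seqmonk_command(max_heap_mb, min_heap_mb=128):
    """Build the java launch command with a chosen maximum heap size.

    The classpath and main class are copied from the command line quoted
    in this thread (Windows); other releases may differ.
    """
    if max_heap_mb < min_heap_mb:
        raise ValueError("maximum heap must be at least the initial heap")
    return [
        "java",
        f"-Xms{min_heap_mb}m",   # initial heap size
        f"-Xmx{max_heap_mb}m",   # maximum heap size: the value to lower
        "-Dsun.java2d.opengl=false",
        "-classpath", ".;./sam-1.32.jar",  # ';' is the Windows path separator
        "uk.ac.bbsrc.babraham.SeqMonk.SeqMonkApplication",
    ]


# Example: ask for 1 GB instead of the 1500 MB in the quoted command
print(" ".join(seqmonk_command(1024)))
```

The JVM has to reserve the whole -Xmx amount at start-up, which is why a value larger than the memory it can obtain produces the "Could not reserve enough space for object heap" error.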
Old 12-05-2010, 08:14 AM   #8
seqcode
Junior Member
 
Location: USA

Join Date: Dec 2010
Posts: 4
Default

Thanks a lot, Simon. It worked after I changed the allowed memory to 1400m. I only have 3 GB of memory on my computer, so I guess the memory available to this program (with several other applications running too) is just under the default requirement.

I am quite new to this field and am just familiarising myself with SeqMonk. I am planning to run some experiments to look at the global DNA methylation profile in a certain animal disease model using a MeDIP-seq approach. From what I understand so far, SeqMonk should be suited to analysing this data; am I right?

Thanks again.
Old 12-05-2010, 12:00 PM   #9
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870
Default

Quote:
Originally Posted by seqcode View Post
Thanks a lot, Simon. It worked after I changed the allowed memory to 1400m. I only have 3 GB of memory on my computer, so I guess the memory available to this program (with several other applications running too) is just under the default requirement.
For it to fail under those circumstances you must have run out of real and virtual memory. It might be worth checking how much virtual memory you have enabled (if any), as you should be able to run the default SeqMonk configuration with that hardware.

Quote:
Originally Posted by seqcode View Post
I am quite new to this field and am just familiarising myself with SeqMonk. I am planning to run some experiments to look at the global DNA methylation profile in a certain animal disease model using a MeDIP-seq approach. From what I understand so far, SeqMonk should be suited to analysing this data; am I right?
Yes, we probably do more MeDIP analysis than any other kind of experiment, so hopefully SeqMonk should provide a useful set of tools for analysing this kind of data.
Old 12-05-2010, 06:46 PM   #10
silin284
Member
 
Location: ny

Join Date: Jul 2009
Posts: 23
Default support for poorly annotated genomes

Hi Simon

I do not have an EMBL-style genome sequence. I only have the genome sequence file (FASTA) and a GTF file. Is there a tool to convert them into a format that SeqMonk can use?

Cheers
sz
Old 12-06-2010, 12:04 AM   #12
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870
Default

Quote:
Originally Posted by silin284 View Post
I do not have an EMBL-style genome sequence. I only have the genome sequence file (FASTA) and a GTF file. Is there a tool to convert them into a format that SeqMonk can use?
If you're working on a genome which is present in Ensembl (any of the subsections) then just let me know which one it is and I'll add it to the official repository.

If you want to process it yourself then you can adapt the BioPerl script we use for making the main repositories. The script is in the 'Scripts' directory at the top level of the SeqMonk installation. You can use the basic structure but just rip out the EnsemblAPI stuff. The basic idea is:
  • Create a sequence object representing a chromosome
  • Read in a list of features for that chromosome (from the GTF file in your case)
  • Write the object out as an EMBL file
  • Remove the sequence part to save on space (optional)
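
For anyone who would rather not use BioPerl, the steps above can be sketched in plain Python. This is not the script shipped in the 'Scripts' directory: the function names are made up, the ID line layout and the use of the GTF feature type as the EMBL feature key are assumptions, and the output should be checked against the custom genomes document before relying on it. The chromosome length has to be supplied separately, and no sequence is written (the optional last step).

```python
from collections import defaultdict


def read_gtf(lines):
    """Group GTF features by chromosome as (type, start, end, strand, name)."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, _source, ftype, start, end, _score, strand = cols[:7]
        attrs = cols[8] if len(cols) > 8 else ""
        name = ""
        for attr in attrs.split(";"):
            key, _, value = attr.strip().partition(" ")
            if key == "gene_id":
                name = value.strip('"')
        by_chrom[chrom].append((ftype, int(start), int(end), strand, name))
    return by_chrom


def embl_record(chrom, length, features):
    """Return the lines of a sequence-free, EMBL-style feature file."""
    lines = [
        # Simplified ID line: an assumption, not a verified SeqMonk layout
        f"ID   {chrom}; SV 1; linear; genomic DNA; STD; UNC; {length} BP.",
        "XX",
        "FH   Key             Location/Qualifiers",
        "FH",
    ]
    for ftype, start, end, strand, name in sorted(features, key=lambda f: f[1]):
        location = f"{start}..{end}"
        if strand == "-":
            location = f"complement({location})"
        lines.append(f"FT   {ftype:<16}{location}")
        if name:
            lines.append("FT" + " " * 19 + f'/gene="{name}"')
    lines.append("//")  # record terminator; the sequence block is omitted
    return lines


example = ['1\tdemo\tgene\t100\t200\t.\t-\t.\tgene_id "G1";\n']
for chrom, feats in read_gtf(example).items():
    print("\n".join(embl_record(chrom, 1000, feats)))
```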
Old 12-07-2010, 11:44 AM   #13
seqcode
Junior Member
 
Location: USA

Join Date: Dec 2010
Posts: 4
Default

Thanks again, Simon. Would you please point me to a few recent publications out of your recent MeDIP work so I can learn the methods in more detail?

Quote:
Originally Posted by simonandrews View Post
Yes, we probably do more MeDIP analysis than any other kind of experiment, so hopefully SeqMonk should provide a useful set of tools for analysing this kind of data.
Old 12-07-2010, 01:08 PM   #14
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870
Default

Quote:
Originally Posted by seqcode View Post
Thanks again, Simon. Would you please point me to a few recent publications out of your recent MeDIP work so I can learn the methods in more detail?
Maybe in a few weeks, assuming our response to reviewers goes well
Old 12-07-2010, 01:28 PM   #15
seqcode
Junior Member
 
Location: USA

Join Date: Dec 2010
Posts: 4
Default

Look forward to that. Best of luck!

Quote:
Originally Posted by simonandrews View Post
Maybe in a few weeks, assuming our response to reviewers goes well
Old 02-08-2011, 05:59 AM   #16
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870
Default

SeqMonk v0.14.0 has just been released. This adds a few new features and squishes some bugs.

New features include:
  • A new cumulative distribution plot which allows you to compare the whole distribution of quantitated values for several data stores or probe lists.
  • A new secondary quantitation method - the percentile normalisation quantitation allows you to take an existing set of quantitated values and normalise them to a particular point in their distribution. This would be useful in cases where the existing option to normalise to total read count does not produce an acceptable match between the distributions across your data stores.
  • The annotation readers now allow the import of multiple files in the same operation. Newly imported annotation tracks are now displayed immediately by default
  • When using the generic text import for annotation data you can now manually specify a feature type rather than having to have this in the file, or simply using the file name
  • A scale bar has been added to the genome view (following a suggestion on SEQanswers)
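
To make the percentile normalisation idea a little more concrete, here is a minimal sketch of one way such a normalisation could work. It is an illustration, not SeqMonk's actual code: it assumes a multiplicative correction, a nearest-rank percentile, and the mean of the per-store percentiles as the common target, and all names are made up for the example.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (0 < pct <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def percentile_normalise(stores, pct=75.0):
    """Scale each data store so its pct-th percentile matches a common target.

    stores maps a data store name to its list of quantitated values.
    """
    points = {name: percentile(vals, pct) for name, vals in stores.items()}
    if any(point == 0 for point in points.values()):
        raise ValueError("cannot scale a store whose chosen percentile is 0")
    target = sum(points.values()) / len(points)
    return {
        name: [v * target / points[name] for v in vals]
        for name, vals in stores.items()
    }


stores = {"sample_a": [1, 2, 3, 4], "sample_b": [2, 4, 6, 8]}
print(percentile_normalise(stores, pct=75))
```

Anchoring on a percentile rather than the total read count means a handful of extremely high values in one data store has less influence on the correction.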

Amongst the squished bugs were an SVG export corruption problem, crashes when encountering unexpected folders in the genome folder and a hang when normalising the line graph.

You can get SeqMonk from:

http://www.bioinformatics.bbsrc.ac.uk/projects/seqmonk/
Old 03-11-2011, 06:58 AM   #17
psabelli
Junior Member
 
Location: Arizona

Join Date: Feb 2011
Posts: 2
Default

SeqMonk seems an excellent and user-friendly program and I would like to use it. However, our reference sequence is made of a collection of full-length cDNAs and not a genome. That is the reference sequence against which the mapping of our Solexa tags has been done. I am aware that one can format any custom genome in a compatible way for SeqMonk, but is it possible to use SeqMonk with custom cDNA reference sequences instead? What alternative package for analyzing and visualizing the data can be recommended in this case? Thank you.
By the way, SeqMonk is not starting on my Windows XP machine. When I double-click the .bat file, the DOS window opens for a fraction of a second and then closes immediately. Nothing seems to be happening. I do have the latest SeqMonk (v0.14) and a Java environment installed. (SeqMonk is starting fine on my Mac though.) Any ideas about what could be wrong? Thanks again.
Old 03-11-2011, 08:15 AM   #18
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870
Default

Quote:
Originally Posted by psabelli View Post
SeqMonk seems an excellent and user-friendly program and I would like to use it. However, our reference sequence is made of a collection of full-length cDNAs and not a genome. That is the reference sequence against which the mapping of our Solexa tags has been done. I am aware that one can format any custom genome in a compatible way for SeqMonk, but is it possible to use SeqMonk with custom cDNA reference sequences instead? What alternative package for analyzing and visualizing the data can be recommended in this case? Thank you.
I'm actually in the process of working with just this kind of data! SeqMonk wasn't really designed with this in mind, but you can make a pseudo genome out of shorter contigs by concatenating them into groups of a few thousand. It's not ideal, but if you want to have a go then I'm happy to share the code I've written for my job.
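
The pseudo genome idea can be sketched as a coordinate mapping. This is not the code mentioned above, just a minimal illustration: the names, the group size of 2000 and the 100 bp spacer between contigs are all assumptions. Each contig is assigned to a pseudo-chromosome and a start offset, and read positions are shifted by that offset.

```python
def build_layout(contig_lengths, group_size=2000, spacer=100):
    """Assign each contig a pseudo-chromosome and a 1-based start offset.

    contig_lengths is an iterable of (name, length) pairs in the order
    the contigs should be joined. Returns (layout, chromosome_lengths).
    """
    layout = {}
    chrom_lengths = {}
    for index, (name, length) in enumerate(contig_lengths):
        chrom = f"pseudo{index // group_size + 1}"
        start = chrom_lengths.get(chrom, 0)
        if start:
            start += spacer  # gap so neighbouring contigs do not touch
        layout[name] = (chrom, start + 1)
        chrom_lengths[chrom] = start + length
    return layout, chrom_lengths


def translate(layout, contig, position):
    """Convert a 1-based position on a contig to pseudo-genome coordinates."""
    chrom, start = layout[contig]
    return chrom, start + position - 1


layout, lengths = build_layout([("c1", 500), ("c2", 300), ("c3", 200)], group_size=2)
print(translate(layout, "c2", 10), lengths)
```

The same layout can be used to write one annotation feature per contig, so that each cDNA still shows up as a named region in the pseudo-chromosome.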

Quote:
Originally Posted by psabelli View Post
By the way, SeqMonk is not starting on my Windows XP machine. When I double-click the .bat file, the DOS window opens for a fraction of a second and then closes immediately. Nothing seems to be happening. I do have the latest SeqMonk (v0.14) and a Java environment installed. (SeqMonk is starting fine on my Mac though.) Any ideas about what could be wrong?
If it doesn't start at all then it's normally one of two things:
  1. Java isn't installed properly, or the java binary isn't in your path. Open a command prompt and type 'java -version'. If you get an error saying this isn't a recognised command, then this is the problem.
  2. You don't have enough RAM in your machine to run the default configuration. SeqMonk ships with a configuration which assumes you have 2GB RAM. If you have less than that you can still run the program for smaller datasets but you'll need to change the memory settings.

If it's neither of these things then try starting SeqMonk from a command prompt (move to the SeqMonk directory and run the bat file directly from the command line). It will still fail to launch, but it should leave a useful error in the window. If you post that error here, I can see what's going wrong.
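
The first check can also be scripted if that is more convenient than typing at a prompt. The sketch below assumes Python is available on the machine; the function name is made up. It looks for a java executable on the PATH and, if it finds one, reports what 'java -version' prints.

```python
import shutil
import subprocess


def java_status(executable="java"):
    """Return the 'java -version' text, or None if java is not on the PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    # java -version traditionally writes to stderr rather than stdout
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    print(java_status() or "java was not found on the PATH")
```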
Old 03-11-2011, 09:26 AM   #19
ttnguyen
Member
 
Location: Ireland

Join Date: Mar 2010
Posts: 41
Default

Quote:
Originally Posted by simonandrews View Post
If you're working on a genome which is present in Ensembl (any of the subsections) then just let me know which one it is and I'll add it to the official repository.

If you want to process it yourself then you can adapt the BioPerl script we use for making the main repositories. The script is in the 'Scripts' directory at the top level of the SeqMonk installation. You can use the basic structure but just rip out the EnsemblAPI stuff. The basic idea is:
  • Create a sequence object representing a chromosome
  • Read in a list of features for that chromosome (from the GTF file in your case)
  • Write the object out as an EMBL file
  • Remove the sequence part to save on space (optional)
Hi Simon,
SeqMonk is awesome, but I don't code in Perl, so I cannot make a new genome as you described. Is there any chance you could add a new function to SeqMonk that lets users build a new genome from a GTF file? I am using the human references hg18 and hg19 downloaded from the UCSC Genome Browser.
Thanks,
Nguyen
Old 03-11-2011, 09:47 AM   #20
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870
Default

Quote:
Originally Posted by ttnguyen View Post
Hi Simon,
SeqMonk is awesome, but I don't code in Perl, so I cannot make a new genome as you described. Is there any chance you could add a new function to SeqMonk that lets users build a new genome from a GTF file? I am using the human references hg18 and hg19 downloaded from the UCSC Genome Browser.
Thanks,
Nguyen
Those genomes are already present in our repositories - we just use the Ensembl rather than UCSC nomenclature: hg18 = NCBI36 and hg19 = GRCh37.

I think I'm right in saying that we can't automatically build a genome file from GTF files since they don't contain the length of the chromosome, so we can't work out how much sequence is left after the last gene finishes. (I'm happy to be corrected if this isn't true).
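
To illustrate the point, a GTF file only gives a lower bound on each chromosome's length (the largest feature end), whereas a FASTA file of the same assembly, if one is to hand, gives the real length by counting bases. The sketch below uses made-up function names and is not part of SeqMonk.

```python
def fasta_lengths(lines):
    """Count the bases under each FASTA header (name = first word of header)."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def gtf_lower_bounds(lines):
    """Largest feature end per chromosome: a lower bound, not the true length."""
    bounds = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        chrom, end = cols[0], int(cols[4])
        bounds[chrom] = max(bounds.get(chrom, 0), end)
    return bounds


fasta = [">chr1 example", "ACGTACGT", "ACGT"]
gtf = ['chr1\tdemo\tgene\t2\t9\t.\t+\t.\tgene_id "g1";']
print(fasta_lengths(fasta), gtf_lower_bounds(gtf))
```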

Tags
analysis, desktop, seqmonk, visualization
