**simonandrews** · 09-09-2013, 12:54 AM

I've just pushed out a new release of SeqMonk (v0.25.0). This has been nearly ready to go for ages now and has loads of new stuff in it. You can see the full list of additions in the release notes, but some of the big changes are:

- Added a quantitation trend plot to look at how any quantitated data changes around a set of features
- Added a multi-sample chi-square for applications such as allele-specific expression
- Allowed multiple samples in the aligned probes plot and added custom sorting
- Added the ability to filter raw reads against features when re-importing
- Added a domainogram plot to look at quantitations over different window sizes
- Added ways to find sets of features from a list of names
- Added a nice report to completely describe how you came to a set of filtered probes
- Improved normalisation options

We've also done some profiling of the SeqMonk code so it should (hopefully) be noticeably quicker than the last version.

We've also had to make a change to the SeqMonk file format (to allow comments to be added to probe lists), so projects saved with this version cannot be opened in older versions. This version will open older projects just fine though.

Please have a play with the new version and report any problems in our Bugzilla, by email to me, or directly in this thread.

**simonandrews** · 09-09-2013, 06:05 AM

Originally posted by Aspadia

SeqMonk sounds really awesome but I can't manage to run it

I've downloaded it, but when I try to run it, it says it cannot find Java. When I type java -version in cmd it says 'java' is not recognized as an internal or external command, operable program or batch file.

Since this ended up being a fairly common problem I've written up a blog post which describes why this happens and how to fix it. I'm going to look at other ways we might be able to get SeqMonk to use the installed 32-bit version of Java which is normally there, but I'm somewhat reluctant to do this since SeqMonk really benefits from using the correct 64-bit version.

**crazyhottommy** · 09-09-2013, 08:49 AM

Hi Simon,

Sorry to bug you again. I am wondering what clustering algorithm is used for the aligned probe plot?
I wanted to reproduce the figure generated by SeqMonk myself using HOMER + R. I got the count matrix for a ChIP-seq dataset from HOMER, imported it into R, log2-transformed it and plotted it with heatmap.2. I can use either hierarchical or k-means clustering to cluster the data.

The thing is that the peak is more obvious in the figure generated by SeqMonk (where one can adjust the contrast by sliding the bar on the right) than in the one I generated in R. Could you suggest any tricks for plotting this kind of data?

Many Thanks!

**crazyhottommy** · 09-09-2013, 10:16 AM

Hi Simon,

I just installed the newest version of SeqMonk and it has many improved features! Thanks. I noticed that many plots (probe trend, box whisker, etc.) let you specify multiple probe lists. I am wondering how you can keep several probe sets at the same time? Each time I create a new probe list, the old one is wiped away.

Thank you again.

Tommy

**simonandrews** · 09-09-2013, 11:13 AM

Originally posted by crazyhottommy

Hi Simon,

Sorry to bug you again. I am wondering what clustering algorithm is used for the aligned probe plot?

The aligned probes plot is simply ordered by the number of reads in the probe so the highest coverage goes at the top. In the new version you are now able to view multiple plots at the same time and you can choose to order them either independently, or to pick one and then order the rest by the coverage in that reference dataset.

In terms of the strength of the effects shown, there's nothing too clever about what SeqMonk is doing. Its default scaling is linear, and you'll see quite different effects on a log scale. From my own experience it's worth playing around with the amount of context you put around your regions of interest, since keeping the regions too tight may not give you enough context to be able to judge the strength of the enrichment. Also, being able to play with the contrast manually to get it set just right for what you want to show can be a big plus.
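The ordering described above can be sketched in a few lines of Python. This is only an illustration, not SeqMonk's code: the function names, the toy count matrices and the pseudocount of 1 in the log transform are all assumptions made for the example.

```python
import math

def order_by_coverage(rows, reference=None):
    """Return row indices sorted so the highest total read count comes first.

    If a reference matrix is given, its row totals decide the order instead.
    """
    source = rows if reference is None else reference
    totals = [sum(row) for row in source]
    return sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)

def log2_scale(rows, pseudocount=1.0):
    """Log-transform counts; the pseudocount (an assumption) avoids log2(0)."""
    return [[math.log2(value + pseudocount) for value in row] for row in rows]

# Toy per-probe read counts across three positions (assumed values).
chip = [[1, 2, 1], [5, 40, 6], [0, 10, 2]]
control = [[9, 9, 9], [1, 1, 1], [4, 4, 4]]

independent = order_by_coverage(chip)                      # ordered by its own coverage
by_reference = order_by_coverage(chip, reference=control)  # ordered by the reference dataset
print(independent, by_reference)
```

Drawing the rows of `log2_scale(chip)` in the chosen order, rather than the raw counts, is one way to compare a linear view against a log-scaled one.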

**simonandrews** · 09-09-2013, 11:17 AM

Originally posted by crazyhottommy

Hi Simon,

I just installed the newest version of SeqMonk and it has many improved features! Thanks. I noticed that many plots (probe trend, box whisker, etc.) let you specify multiple probe lists. I am wondering how you can keep several probe sets at the same time? Each time I create a new probe list, the old one is wiped away.

The plots can show many probe lists, not many probe sets. That is to say that if you've filtered your full probe set several different ways then you can plot these subsets together, but they're all part of the same original probe set.

There isn't a way to have more than one probe set active at once. Lots of things about the way SeqMonk expects to be able to work don't scale to having more than one probe set so this isn't something we're likely to add.

Although you can't keep a previous probe set around if you choose to create a new one, you do have the option of turning any probe list into an annotation track. This won't preserve the quantitated values, but it will preserve the positions which can often be useful. You can do this by selecting File > Import Annotation > Active Probe List.

**crazyhottommy** · 09-09-2013, 06:19 PM

Originally posted by simonandrews

The aligned probes plot is simply ordered by the number of reads in the probe so the highest coverage goes at the top. In the new version you are now able to view multiple plots at the same time and you can choose to order them either independently, or to pick one and then order the rest by the coverage in that reference dataset.

In terms of the strength of the effects shown, there's nothing too clever about what SeqMonk is doing. Its default scaling is linear, and you'll see quite different effects on a log scale. From my own experience it's worth playing around with the amount of context you put around your regions of interest, since keeping the regions too tight may not give you enough context to be able to judge the strength of the enrichment. Also, being able to play with the contrast manually to get it set just right for what you want to show can be a big plus.

So if I want to compare ChIP-seq enrichment between two sets of probes, I need to apply the same contrast adjustment to both heatmaps. It is something like a Western blot (a wet lab technique), where you should use the same exposure time for your treatment and control. As for the context, I've seen people use -3kb to 3kb and others use -8kb to 8kb, so I'm not sure what the consensus is...

**crazyhottommy** · 09-09-2013, 06:20 PM

Originally posted by simonandrews

The plots can show many probe lists, not many probe sets. That is to say that if you've filtered your full probe set several different ways then you can plot these subsets together, but they're all part of the same original probe set.

There isn't a way to have more than one probe set active at once. Lots of things about the way SeqMonk expects to be able to work don't scale to having more than one probe set so this isn't something we're likely to add.

Although you can't keep a previous probe set around if you choose to create a new one, you do have the option of turning any probe list into an annotation track. This won't preserve the quantitated values, but it will preserve the positions which can often be useful. You can do this by selecting File > Import Annotation > Active Probe List.

Thanks for your clarification!

**simonandrews** · 09-09-2013, 11:24 PM

Originally posted by crazyhottommy

So if I want to compare ChIP-seq enrichment between two sets of probes, I need to apply the same contrast adjustment to both heatmaps. It is something like a Western blot (a wet lab technique), where you should use the same exposure time for your treatment and control. As for the context, I've seen people use -3kb to 3kb and others use -8kb to 8kb, so I'm not sure what the consensus is...

There shouldn't really be a consensus as the size you use will depend on the nature of the enrichment you're looking at and the insert size of your library among other factors.

When using multiple probe lists (not sets) in SeqMonk you now draw all of the plots in a single window and the slider adjusts all of them simultaneously so they're directly comparable. I'm never really sure how valuable it is to compare the strength of enrichment in these plots since this can be affected by technical artefacts, but it's a really good way to show differences in the patterning or extent (proportion of probes) of the enrichment.
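To make the point about a shared slider concrete, here is a minimal Python sketch in which one contrast cutoff is applied to every probe list before drawing. It is not how SeqMonk is implemented; `to_intensity`, the cutoff value and the toy counts are hypothetical.

```python
def to_intensity(rows, cutoff):
    """Map counts to 0..1 colour intensities, saturating at the cutoff."""
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    return [[min(value, cutoff) / cutoff for value in row] for row in rows]

# Toy counts for two probe lists (assumed values).
active = [[2, 8, 2], [1, 6, 1]]
inactive = [[1, 2, 1], [0, 1, 0]]

# One cutoff for both lists, like a single slider controlling every plot.
shared_cutoff = 4
active_img = to_intensity(active, shared_cutoff)
inactive_img = to_intensity(inactive, shared_cutoff)
```

Because both lists pass through the same cutoff, equal counts always map to equal colours, which is what makes the plots comparable in pattern and extent.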

**VincentC** · 09-10-2013, 02:22 AM

Hi everyone,

We performed bisulfite treatment on 2 conditions x 3 genomes followed by deep sequencing (paired-end, 2x100bp, Illumina HiSeq 2000). We used Bismark for read alignment and methylation calling.

I am now struggling to visualize my data with SeqMonk and make it agree with the methylKit data that has been generated by a collaborator. We pooled the 3 genomes for each condition, so we are comparing two data sets, namely A and B.

Here is the procedure I follow, according to the SeqMonk guide, videos and other resources:
- I generate probes using the contig probe generator: I select both datasets A and B, set min contig size = 1 and leave the remaining options as default.
- After that I quantify using the bisulfite pipeline over features: I select existing probes as features, and leave all other options as default.
- I then filter my data on values (individual probes), setting "must be between" to "0" and leaving the rest as default.

First, is this procedure correct, or should I proceed differently given my data sets? Also, what is the best way to statistically filter my data? Thanks a lot for the advice, I'm learning the hard way!!

**crazyhottommy** · 09-10-2013, 06:04 AM

Originally posted by simonandrews

There shouldn't really be a consensus as the size you use will depend on the nature of the enrichment you're looking at and the insert size of your library among other factors.

When using multiple probe lists (not sets) in SeqMonk you now draw all of the plots in a single window and the slider adjusts all of them simultaneously so they're directly comparable. I'm never really sure how valuable it is to compare the strength of enrichment in these plots since this can be affected by technical artefacts, but it's a really good way to show differences in the patterning or extent (proportion of probes) of the enrichment.

Do you mean that for two different probe lists, it is hard to compare the enrichment of certain marks?
Let's say I have two lists of promoter regions (one list contains the active promoters, the other contains the inactive promoters, based on the RNA-seq data).

One may expect H3K4me3 to be enriched at active promoters, but not at inactive promoters.

Do you mean the aligned probe plot can only be used to look at the pattern, and cannot be used to compare the signal strength (the colour strength in the plot)?

I agree that the aligned probe plot gives the most information about the data set. The probe trend plot is also very good, but it only gives an averaged view. I have seen many papers use (only) a box plot of tag intensity to compare treatment and control, which hides a lot of information. Ideally, one should show the trend plot and the aligned probe plot together. That way, readers can see whether the mark (TF or histone modification) is enriched and what proportion of the probes are enriched for it.

Thanks!

**Mokinhas** · 09-13-2013, 01:46 AM

Hi Simon,

I am really a fan of SeqMonk!! It is great!
However, I am quite new to this kind of bioinformatic analysis and I have a little question. I am analysing RNA-seq data and I followed the YouTube video (very useful for starters btw), but I don't understand what the differential expression in the report means. How can I get a normal fold change? Is that possible?

Thanks in advance.

**simonandrews** · 09-13-2013, 01:54 AM

Originally posted by Mokinhas

Hi Simon,

I am really a fan of SeqMonk!! It is great!
However, I am quite new to this kind of bioinformatic analysis and I have a little question. I am analysing RNA-seq data and I followed the YouTube video (very useful for starters btw), but I don't understand what the differential expression in the report means. How can I get a normal fold change? Is that possible?

Thanks in advance.

If you've taken the defaults for the RNA-Seq quantitation then the values recorded for each sample will be log2 RPM (reads per million reads of library). The reports will simply show the quantitated value rather than differences since the quantitation works the same if you have 1, 2 or 100 samples.
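As a rough sketch of what a log2 RPM value is (an illustration in Python, not SeqMonk's actual quantitation code; the function name and read counts are made up, and how SeqMonk treats probes with zero reads is not covered here):

```python
import math

def log2_rpm(read_count, total_reads):
    """log2 of reads per million reads of library."""
    if read_count <= 0 or total_reads <= 0:
        raise ValueError("this sketch only handles positive counts")
    rpm = read_count * 1_000_000 / total_reads
    return math.log2(rpm)

# 80 reads over a probe in a library of 10 million reads is 8 RPM.
value = log2_rpm(80, 10_000_000)
print(value)  # 3.0
```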

If you want to get a fold change from the quantitated values then it's a simple calculation from the log2RPM values. The fold change will just be 2 to the power of the difference in log2RPM, so if you had a value of 3 in one dataset and 5.5 in the other then the difference would be 2.5 and the fold change would be 2^2.5 = 5.7 fold.
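The same arithmetic as a short Python sketch (the function name is just for illustration):

```python
def fold_change(log2_rpm_a, log2_rpm_b):
    """Fold change of sample b relative to sample a, from log2 RPM values."""
    return 2 ** (log2_rpm_b - log2_rpm_a)

fc = fold_change(3, 5.5)  # difference of 2.5 in log2 RPM
print(round(fc, 1))  # 5.7
```

Swapping the two arguments gives the reciprocal, so a 5.7-fold increase one way is roughly a 0.18-fold change the other way.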

If you want to have the differences included in the report then you can do a value differences filter on your data. This will record the difference value against the list so it will show up in the report and you won't have to calculate it afterwards (it will be log2RPM difference though, not fold change).

**Mokinhas** · 09-17-2013, 11:15 PM

Thanks for your quick reply, Simon. I understand now.

**Neuromancer** · 09-26-2013, 06:10 AM

Hi Simon,

Just a short question about genome versions:
As far as I know, SeqMonk genomes are derived from Ensembl genome releases, right?
So is the current SeqMonk mouse genome (GRCm38) the same as the annotation and coordinates in Ensembl release 73 (i.e. GRCm38.p1 + new annotations by Ensembl)?

[This current Ensembl release has 38561 genes (Ensembl gene IDs), while SeqMonk's probe generator (v0.25.0) generates 32029 genes (feature probes over genes, nothing removed)...]

What's the status of the SeqMonk (mouse) genome then?
