
**crazyhottommy** · 07-02-2013, 07:22 AM

Originally posted by simonandrews:

If you give the two sets of results the same feature name ('peak' for example) then they'll be merged together in the same track and you can use the feature probe generator to make probes from the combined set. You can always separate them later by renaming one of the tracks using the controls in the Annotation Sets folder of the data view.

Wow, prompt answer. Thank you.

Another question is about the normalization of the ChIP-seq counts. SeqMonk provides two ways to normalize them:
1. log2 CPM
2. largest data set

My understanding is that normalizing to the largest data set is essentially the same as log2 CPM, but scaled up by some factor.

The problem with log2 CPM normalization is that you get some negative values.
Does it still make sense to use the log2 CPM results for downstream analysis such as scatter plots and hierarchical clustering? Thanks!

**simonandrews** · 07-02-2013, 07:33 AM

It's fine to use either value for subsequent analysis. The largest data store option is there so you don't get fractional (or negative if you're on a log scale) read counts, but there's nothing intrinsically wrong with these; they can just look a bit odd on any plot where 0 is supposed to mean something. The magnitude of differences will be exactly the same in either case.

When you're doing a hierarchical cluster you normally do per-probe normalisation anyway, in which case the plot you get will be exactly the same whichever of the earlier normalisation options you selected.
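
As a rough illustration of why the two options differ only by a constant on a log scale, here is a minimal Python sketch. The formulas are assumptions about what the two options do (counts per million reads versus counts scaled up to the largest data store), not SeqMonk's own code, and the read counts and totals are made up.

```python
import math

# Hypothetical raw read counts for one probe in three data stores,
# and the (made-up) total number of reads in each store.
raw_counts = [2, 120, 30]
totals = [5_000_000, 10_000_000, 2_500_000]


def log2_cpm(count, total):
    """log2 of the count expressed per million reads (assumed formula)."""
    return math.log2(count / total * 1_000_000)


def log2_largest(count, total, largest):
    """log2 of the count scaled up to the largest data store (assumed formula)."""
    return math.log2(count / total * largest)


largest = max(totals)
cpm = [log2_cpm(c, t) for c, t in zip(raw_counts, totals)]
big = [log2_largest(c, t, largest) for c, t in zip(raw_counts, totals)]

# Every value is shifted by the same constant, log2(largest / 1e6),
# so differences between samples are identical under both schemes.
offsets = [b - a for a, b in zip(cpm, big)]
```

The first sample comes out below zero under log2 CPM (fewer than one read per million) and above zero under the largest-store scaling, but the gap between any two samples is unchanged. A constant shift like this also cancels out as soon as a per-probe median is subtracted.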

**crazyhottommy** · 07-02-2013, 07:36 AM

What does per-probe normalization mean?

**simonandrews** · 07-02-2013, 08:17 AM

Per-probe normalisation simply adjusts your values by subtracting the median value for that probe across all of your samples from each of the individual probe values. In effect, it shows you whether the value you have for that probe is higher or lower in this sample than in the other samples you're looking at.

Taking some concrete numbers: if you have a probe which has values of 10, 12, 8 and 10 across 4 samples and you did per-probe normalisation, you'd find the median was 10, so the normalised values would be 0, 2, -2 and 0. This makes it a lot easier to compare to another data set which might have values of 20, 22, 18 and 20, for example.

**crazyhottommy** · 07-02-2013, 08:43 AM

I did not quite get it.

I think you mean to say:

10, 12, 8, 10 are values for 4 probes in the same sample (or in my case TF1)

20, 22, 18, 20 are values for the same 4 probes in my other sample (the other TF2)

Per-probe normalization is done to compare the values for TF1 and TF2 for the same probe.

I am a little bit confused...

**simonandrews** · 07-02-2013, 11:39 AM

Originally posted by crazyhottommy:

I did not quite get it.

I think you mean to say:

10, 12, 8, 10 are values for 4 probes in the same sample (or in my case TF1)

20, 22, 18, 20 are values for the same 4 probes in my other sample (the other TF2)

No, the other way around.

10, 12, 8, 10 are the values for 1 probe across 4 different samples.

20, 22, 18, 20 are the values for a different probe across the same samples.

Per-probe normalisation lets you compare the relative changes in each probe by removing the effect of its absolute values. In this case you'd be interested that sample 2 had the highest value and sample 3 the lowest, even though the absolute values of the two probes are quite different.
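
The two probes from the example can be worked through in a few lines of Python. This is only a sketch of the median subtraction described in this thread, with a hypothetical function name; it is not taken from SeqMonk.

```python
from statistics import median


def per_probe_normalise(values):
    """Subtract the probe's median across samples from each sample's value."""
    m = median(values)
    return [v - m for v in values]


# One probe measured in 4 samples, and a different probe in the same 4 samples.
probe_a = [10, 12, 8, 10]
probe_b = [20, 22, 18, 20]

norm_a = per_probe_normalise(probe_a)
norm_b = per_probe_normalise(probe_b)

# Both probes now show the same pattern across samples:
# sample 2 is highest and sample 3 is lowest.
same_pattern = norm_a == norm_b
```

After normalisation both probes read 0, 2, -2, 0, so they would sit together in a cluster even though their absolute values differ by a factor of two.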

**Aspadia** · 08-12-2013, 07:38 AM

Hi Simon,

SeqMonk sounds really awesome, but I can't manage to run it.

It is downloaded, and when I try to run it, it says it cannot find java. When I type java -version in cmd, it says 'java' is not recognized as an internal or external command, operable program or batch file.

In a previous reply you mentioned this:
Java isn't installed properly, or the java binary isn't in your path. Open a command prompt and type 'java -version'; if you get an error saying this isn't a recognised command then this is the problem.
So I guess the second part is my problem, but how can I solve this? I just installed the latest version of Java (Version 7 Update 25).

Hope you can help me, thanks in advance!

**mathew** · 08-12-2013, 08:11 AM

Seqmonk and Java

Simon,

The Java and SeqMonk problem is now creating a real headache. I am in the same boat. Perhaps at some point the SeqMonk and Java dependency needs to be thoroughly investigated. I agree it may be because of Java updates, but we want SeqMonk to work.

Thanks

**simonandrews** · 08-12-2013, 09:10 AM

If java is not found on your command line, the most likely cause we've seen is that you have installed the 32 bit version of java on a 64 bit machine. Oracle only show the 32 bit version of java on the front page of java.com; to get the 64 bit version you need to click on "See all java downloads" and then get the 64 bit version.

You can install 32 and 64 bit side by side so adding the 64 bit version won't mess up your browser, and you'll need the 64 bit version to be able to use more than 2GB memory.
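
If you want to script the check, a small Python sketch along these lines reports whether java is on the path and whether it looks like a 64 bit build. The function names are hypothetical, and it assumes the JVM prints "64-Bit" in its `java -version` report, which is what HotSpot and OpenJDK builds normally do; other JVMs may word it differently.

```python
import shutil
import subprocess


def describe_java(version_text):
    """Classify the text printed by `java -version`."""
    if not version_text.strip():
        return "no output"
    # Assumption: 64 bit HotSpot/OpenJDK builds include "64-Bit" in the VM line.
    return "64-bit" if "64-Bit" in version_text else "probably 32-bit"


def check_java():
    """Look for java on the PATH and report what kind of build it is."""
    java = shutil.which("java")
    if java is None:
        return "java not found on PATH"
    # `java -version` writes its report to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return describe_java(result.stderr + result.stdout)


if __name__ == "__main__":
    print(check_java())
```

A result of "java not found on PATH" corresponds to the "not recognized as an internal or external command" error described above.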

Hope this helps

**markf** · 08-26-2013, 07:12 AM

Data import

Dear Simon,

I have a question about seqmonk: Is it possible to create my own probesets & quantifications outside of seqmonk, and then load them? For example from GFF, and assign the GFF score as the quantification score for that probe.

I have my own set of peak calls (from a 4C experiment, with my own preprocessing, and using R3Cseq) - and I was hoping to use seqmonk for some sanity checking, visualization and report generation.

Thanks.
Mark

PS - keep up the good work - seqmonk is great!

**simonandrews** · 08-26-2013, 07:31 AM

Originally posted by markf:

Dear Simon,

I have a question about seqmonk: Is it possible to create my own probesets & quantifications outside of seqmonk, and then load them? For example from GFF, and assign the GFF score as the quantification score for that probe.

Yes, and no. It's pretty easy to import external probe positions, for example from a peak caller, and we do this all the time. The best way to do this (and soon the only way to do it) is to import the peak positions as an annotation track using whichever type of annotation importer makes sense (text, GFF etc), and then use the feature probe generator to place probes over the features in that track.

What you would then do is to import the data for that ChIP and then use the standard quantitation tools to assign quantitated values to the peaks you imported. What you can't do is to import pre-quantitated data - we have thought about this a lot since we've had a fair few requests for it, but it causes so many problems in other parts of the data model that we've decided against this. There are other programs around which deal solely with pre-quantitated data (IGV etc) and the point about SeqMonk is that it has access to the raw data so you can do things which those programs can't do.
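
To get externally called peaks into a form an annotation importer can read, something like the following Python sketch would write them out as GFF lines. The function name, the source label and the `ID=peak_N` attribute are made up for illustration; only the nine tab-separated GFF columns are standard. The score column is written for completeness, but as described above it would not be imported as a quantitation.

```python
def peaks_to_gff(peaks, source="my_peak_caller", feature_type="peak"):
    """Format (chrom, start, end, score) tuples as GFF text.

    GFF coordinates are 1-based and inclusive, so convert first if your
    peak caller reports 0-based, half-open (BED style) positions.
    """
    lines = []
    for i, (chrom, start, end, score) in enumerate(peaks, start=1):
        fields = [
            chrom,           # sequence name
            source,          # program that produced the feature
            feature_type,    # feature name, e.g. 'peak'
            str(start),
            str(end),
            str(score),
            ".",             # strand unknown
            ".",             # phase not applicable
            f"ID=peak_{i}",  # hypothetical attribute column
        ]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Two made-up peaks for illustration.
gff_text = peaks_to_gff([("chr1", 1000, 1500, 12.5), ("chr2", 200, 450, 3)])
```

The resulting text can be saved to a file and loaded as an annotation track, after which the feature probe generator can place probes over the 'peak' features.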

Originally posted by markf:

PS - keep up the good work - seqmonk is great!

Cool, thanks! There should be a new release coming out very soon now which has loads of new stuff which you'll hopefully find useful.

**Aspadia** · 09-03-2013, 05:56 AM

Relative quantitation

Hi Simon,

Thanks for your previous quick reply; I've got SeqMonk running now.

This brings up the next problem. I want to use the 'relative quantitation' option. I create probes in an IP and an input sample from ChIP using the running window probe generator, and would then like to use relative quantitation. However, after probe generation it does not show me the option for relative quantitation, nor does it appear when I go to Data --> quantitate existing probes....
Can you tell me how I can get to this relative quantitation?

Thanks a lot!

**simonandrews** · 09-03-2013, 07:22 AM

Relative quantitation is a normalisation method for an existing quantitation, so you'd start by doing a normal read count or base pair count quantitation. If you then go back to the quantitation options you'll see a new set of methods appear in red which can be used to alter the existing quantitation. You could then use the relative quantitation to select a reference and either subtract, divide or log divide the other samples by the values in this dataset.
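
The three operations can be pictured with a short Python sketch. The function name and mode strings are hypothetical, and treating "log divide" as the log2 of the ratio is an assumption rather than a statement about SeqMonk's exact formula. The values are made-up quantitations for three probes.

```python
import math


def relative_quantitation(values, reference, mode="subtract"):
    """Adjust per-probe values against a reference data store.

    `values` and `reference` hold one existing quantitation per probe,
    in the same probe order. A zero in the reference will raise an error
    in the 'divide' and 'log_divide' modes.
    """
    if mode == "subtract":
        return [v - r for v, r in zip(values, reference)]
    if mode == "divide":
        return [v / r for v, r in zip(values, reference)]
    if mode == "log_divide":
        # Assumption: log divide means log2(value / reference).
        return [math.log2(v / r) for v, r in zip(values, reference)]
    raise ValueError(f"unknown mode: {mode}")


ip_counts = [40, 8, 16]     # hypothetical IP quantitation
input_counts = [10, 8, 4]   # hypothetical input quantitation, used as reference

subtracted = relative_quantitation(ip_counts, input_counts, "subtract")
divided = relative_quantitation(ip_counts, input_counts, "divide")
log_divided = relative_quantitation(ip_counts, input_counts, "log_divide")
```

For an IP versus input comparison the input would be the reference, so the second probe, which has the same value in both, comes out at zero under subtraction and under log division.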

**mathew** · 09-08-2013, 09:18 PM

I am curious whether SeqMonk can give me coverage of specific genes/transcripts from DNA-seq data, as is given by coverageBed in bedtools http://bedtools.readthedocs.org/en/latest/
I want to calculate coverage per gene/chromosome.
Thanks

**simonandrews** · 09-08-2013, 10:48 PM

Yes - depending on what you want there are a couple of ways you can do this. For standard RNA-Seq type quantitation you can use the RNA-Seq pipeline. This can quantitate at the gene (combined exons) or transcript level and can correct for transcript length. It works on simple overlaps so doesn't do re-partitioning between ambiguous splice variants.

For other non-spliced features you can use the normal mechanism of placing probes using the feature probe generator and then counting reads using the read count quantitation (or whichever other quantitation you want). You can place probes in all sorts of other ways too if you want to count whole chromosomes or other regions.
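
Conceptually, a read count over features comes down to counting the reads that overlap each region. The Python sketch below shows that idea with made-up coordinates and a hypothetical function name; it is neither SeqMonk's nor bedtools' implementation, and it uses a simple nested loop that would be too slow for real data.

```python
def count_overlaps(features, reads):
    """Count reads overlapping each feature.

    `features` maps a name to (chrom, start, end); `reads` is a list of
    (chrom, start, end). Coordinates are treated as inclusive.
    """
    counts = {name: 0 for name in features}
    for r_chrom, r_start, r_end in reads:
        for name, (f_chrom, f_start, f_end) in features.items():
            # Two intervals overlap when each starts before the other ends.
            if r_chrom == f_chrom and r_start <= f_end and r_end >= f_start:
                counts[name] += 1
    return counts


# Hypothetical gene regions and read positions.
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 500, 600)}
reads = [
    ("chr1", 90, 120),
    ("chr1", 150, 180),
    ("chr1", 201, 250),
    ("chr1", 590, 640),
    ("chr2", 100, 150),
]

gene_counts = count_overlaps(genes, reads)
```

Using whole chromosomes as the regions gives per-chromosome counts in the same way.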
