**simonandrews** · 01-08-2013, 12:51 AM

Originally posted by mjp

Sorry for this small delay and for not being specific enough. I see that the title of my post was wrong.
I wanted a list of probes classified by the strand of the reads covering them, so the list would contain probes that have only forward reads, only reverse reads, no reads, or both types of reads covering them. Ideally I would like to have such a list for each of my stores independently. Is it possible to create that in a single SeqMonk pipeline, or does it have to be repeated for each store separately?

I don't think you can avoid doing some filtering per store, since as I understand it you want to end up with a separate set of lists for each store. You'll also need to do two quantitations. I reckon the quickest way to get these lists would be:

1) Quantitate with the difference quantitation, using the option to express forward reads as a percentage of all reads.

2) Use the values filter to select probes with a value of 0 or 100 for each store. This should be pretty quick since you can set the parameters and then just select each store in turn in the filter and re-run it without having to reopen the dialog.

3) Requantitate the data with a simple read count quantitation (no corrections or transformations).

4) Use a values filter to select probes with some reads in them for each store.

5) Use the combine probes filter to select the subset of the 0% results which actually have some data in them, since probes with no reads will also appear in the 0% list. Again you can do this in a single filter session by changing which lists you're using, so it shouldn't be too horrible.
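The five steps can be sketched outside SeqMonk as set operations on per-probe read counts for a single store. This is a minimal Python illustration, not SeqMonk code: the probe names, the `(forward, reverse)` count tuples and the assumption that empty probes score 0% are hypothetical stand-ins for the quantitations and filters described above.

```python
from typing import Dict, Set, Tuple

# Hypothetical input for one store: probe name -> (forward reads, reverse reads).
Counts = Dict[str, Tuple[int, int]]


def percent_forward(counts: Counts) -> Dict[str, float]:
    """Step 1: forward reads as a percentage of all reads.

    Probes with no reads are given 0.0 here (an assumption for this sketch).
    """
    result: Dict[str, float] = {}
    for probe, (fwd, rev) in counts.items():
        total = fwd + rev
        result[probe] = 100.0 * fwd / total if total else 0.0
    return result


def strand_lists(counts: Counts) -> Dict[str, Set[str]]:
    pct = percent_forward(counts)
    # Step 2: values filter for exactly 100 or exactly 0.
    at_100 = {p for p, v in pct.items() if v == 100.0}
    at_0 = {p for p, v in pct.items() if v == 0.0}
    # Steps 3-4: plain read count, keep probes with at least one read.
    has_reads = {p for p, (fwd, rev) in counts.items() if fwd + rev > 0}
    # Step 5: combine lists; a 0% value only means "reverse only" if there is data.
    reverse_only = at_0 & has_reads
    forward_only = at_100  # 100% already implies at least one read
    return {
        "forward_only": forward_only,
        "reverse_only": reverse_only,
        "both": has_reads - forward_only - reverse_only,
        "none": set(counts) - has_reads,
    }


store = {"probeA": (12, 0), "probeB": (0, 7), "probeC": (0, 0), "probeD": (5, 3)}
lists = strand_lists(store)
for name in sorted(lists):
    print(name, sorted(lists[name]))
```

Calling `strand_lists` once per store mirrors re-running the filters with each store selected in turn.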

Does this sound OK?

**mjp** · 01-08-2013, 02:48 AM

That does sound OK indeed. Thanks!

As an alternative, I could run a simple read count quantitation for all probes and create annotated probe report #1 (without annotating against anything) for all the stores, which would give me '0' for probes with no reads over them.

Then I could run the difference quantitation of forward reads as a percentage of all reads, which would give me '100' for the probes with only forward reads, across all stores. Probe report #2.

Same for the reverse. Probe report #3.

Having these three reports, it would be easy to parse them outside of SeqMonk:
If probe in #1 = 0 => no reads
If probe in #1 > 0 and probe in #2 = 100 => only forward
If probe in #1 > 0 and probe in #3 = 100 => only reverse
If probe in #1 > 0 and probes in #2 and #3 are different from 0 and 100 => both types of reads
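A minimal Python sketch of that parsing step. The exact SeqMonk report columns aren't described in this thread, so the layout below (tab-delimited, probe name in the first column, one value column per store) is a hypothetical simplification; with real exports you would match the columns to the actual headers.

```python
import csv
import io
from typing import Dict

# Hypothetical layout: first column is the probe name, then one value column per store.
Report = Dict[str, Dict[str, float]]


def read_report(text: str) -> Report:
    """Parse a tab-delimited report into probe -> store -> value."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    stores = next(rows)[1:]
    report: Report = {}
    for row in rows:
        if not row:
            continue  # skip blank lines
        report[row[0]] = {store: float(value) for store, value in zip(stores, row[1:])}
    return report


def classify_probe(count: float, pct_fwd: float, pct_rev: float) -> str:
    """Apply the four rules to one probe in one store."""
    if count == 0:
        return "no reads"
    if pct_fwd == 100:
        return "only forward"
    if pct_rev == 100:
        return "only reverse"
    return "both"


def classify_reports(counts: Report, fwd: Report, rev: Report) -> Dict[str, Dict[str, str]]:
    return {
        probe: {
            store: classify_probe(counts[probe][store], fwd[probe][store], rev[probe][store])
            for store in counts[probe]
        }
        for probe in counts
    }


report1 = read_report("Probe\tstore1\tstore2\np1\t10\t0\np2\t4\t6\n")
report2 = read_report("Probe\tstore1\tstore2\np1\t100\t0\np2\t50\t0\n")
report3 = read_report("Probe\tstore1\tstore2\np1\t0\t0\np2\t50\t100\n")
print(classify_reports(report1, report2, report3))
```

Checking the read count first matters: an empty probe also scores 0 in the percentage reports, so it must not fall through to the strand rules.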

I think this is the fastest way to get what I need for all stores.

One way or another, your input about difference quantitation was invaluable.

Thanks again.

**shadow19c** · 01-11-2013, 02:42 AM

Hello,
I'd like to know how I can visualise the bedGraph file from Bismark after methylation calling.

Thanks

**simonandrews** · 01-11-2013, 02:55 AM

Originally posted by shadow19c

Hello,
I'd like to know how I can visualise the bedGraph file from Bismark after methylation calling.

SeqMonk is designed to do the quantitation of your data within the program rather than taking in externally quantitated files. Rather than trying to load the bedGraph file from Bismark, you'd instead import the raw data from the methylation extractor and then quantitate it however you want inside SeqMonk to visualise the methylation levels.
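To illustrate what quantitating the raw calls means, here is a minimal Python sketch of one simple measure, percentage methylation over a probe, computed from methylation extractor calls. It assumes the five-column, tab-separated per-call layout (read ID, `+` for methylated or `-` for unmethylated, chromosome, position, call letter); the probe coordinates are hypothetical, and SeqMonk's own bisulphite quantitation may differ in its details.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# chromosome -> list of (position, is_methylated)
Calls = Dict[str, List[Tuple[int, bool]]]


def load_calls(lines: Iterable[str]) -> Calls:
    """Read per-cytosine calls: read ID, +/- state, chromosome, position, call letter."""
    calls: Calls = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5 or fields[1] not in ("+", "-"):
            continue  # skip header or malformed lines
        _read_id, state, chrom, pos, _call = fields
        calls[chrom].append((int(pos), state == "+"))  # '+' = methylated
    return calls


def percent_methylation(calls: Calls, chrom: str, start: int, end: int) -> Optional[float]:
    """Percentage of methylated calls falling in [start, end]; None if there are no calls."""
    states = [meth for pos, meth in calls.get(chrom, []) if start <= pos <= end]
    if not states:
        return None
    return 100.0 * sum(states) / len(states)


example = [
    "r1\t+\tchr1\t100\tZ",
    "r2\t-\tchr1\t105\tz",
    "r3\t+\tchr1\t110\tZ",
    "r4\t+\tchr1\t500\tZ",
]
calls = load_calls(example)
print(percent_methylation(calls, "chr1", 90, 120))
print(percent_methylation(calls, "chr1", 200, 300))
```

Returning `None` rather than 0 for a probe with no calls keeps "unmeasured" distinct from "unmethylated".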

I put up a tutorial video covering some of the basics of working with bisulphite data on our YouTube channel, which should give you an idea of how to get started with this.

**glados** · 01-22-2013, 04:34 AM

Dear Simon.

I sent you a private message a few weeks ago. Perhaps you can take a look at it? It was a question regarding installing a custom genome.

**mjp** · 01-27-2013, 03:50 AM

DNase I footprints with SeqMonk

Has anybody been successful in finding DNase I footprints using SeqMonk?

Can we use SeqMonk to do this sort of analysis?

**simonandrews** · 01-27-2013, 03:58 AM

Not as far as I'm aware; we've never tried. Since, as I understand it, it's just enrichment data, you should be able to use the same sort of methods as for ChIP-Seq. If there's something more specific that applies to this kind of data and you can give me a pointer, I can look at adding it.

I'd be interested to hear how you get on.

**mjp** · 01-27-2013, 05:05 AM

Rather than quoting too many fragments from this short paper, I'll let you read it if you're interested.

The biggest difference between ChIP and DNase, in my opinion, is that while ChIP looks at the underlying peaks of usually well-known proteins, DNase looks at the enrichment of much wider regions. There are also other differences between these two methodologies (see the paper).
Overall, in this kind of experiment people are actually looking for regions depleted of sequencing tags, which were protected by bound proteins.

I guess that could be done in SeqMonk by creating continuous probes across the genome and then looking at the depleted regions instead of the enriched ones. That said, I would probably have trouble setting a cut-off between enrichment and depletion.
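A minimal Python sketch of that idea: tile a chromosome into continuous probes, count reads per probe and flag probes below a cut-off. The cut-off here (a fraction of the median probe count) is purely a hypothetical choice, and it shows the difficulty with this approach: a single genome-wide threshold ignores how enriched the surrounding region is.

```python
from bisect import bisect_left
from statistics import median
from typing import List, Tuple


def tile_probes(chrom_length: int, probe_size: int, step: int) -> List[Tuple[int, int]]:
    """Continuous half-open probes [start, end) along one chromosome."""
    return [(s, min(s + probe_size, chrom_length)) for s in range(0, chrom_length, step)]


def count_reads(read_starts: List[int], probes: List[Tuple[int, int]]) -> List[int]:
    """Count reads whose start position falls in each probe."""
    positions = sorted(read_starts)
    return [bisect_left(positions, end) - bisect_left(positions, start) for start, end in probes]


def depleted(counts: List[int], fraction: float) -> List[int]:
    """Indices of probes below `fraction` of the median count (an arbitrary cut-off)."""
    cutoff = fraction * median(counts)
    return [i for i, c in enumerate(counts) if c < cutoff]


probes = tile_probes(chrom_length=100, probe_size=20, step=20)
reads = list(range(0, 5)) + list(range(20, 25)) + [45] + list(range(60, 65)) + list(range(80, 85))
counts = count_reads(reads, probes)
print(counts)
print(depleted(counts, fraction=0.5))
```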

At the moment I'm trying to evaluate the options from the paper I mentioned, but it would be nice to do this in SeqMonk as well, as I have already done some simple analysis in it. If I succeed in doing that with SeqMonk, I'll let you know.

**mjp** · 02-06-2013, 04:39 AM

I managed to get results in SeqMonk similar to those from other software, in this case F-seq, which is fairly commonly used for the analysis of DNase data.
In SeqMonk I used the contig probe generator to do this. SeqMonk actually returned more reasonable enriched regions than F-seq, but many of them overlapped.

Furthermore I managed to relate my data in SeqMonk to the protein binding sites (PBS) I'm interested in.

What I would like to achieve now is a systematic way of identifying the depleted regions within the enriched regions. These would correspond to the protected sites where my protein was bound. I have attached an example of such a region.

What you see there are the DNase hypersensitive sites (DNase_HS), the underlying protein binding site (PBS) and, at the very bottom, the single base-pair probes created within DNase_HS. The dip within the probes would ideally correspond to the depleted region (there might be another one to the right of the first one, between 400 and 500 bp in the displayed region).
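Single base-pair probes amount to a per-base read depth across the region. As a rough illustration of what those probes measure, here is a minimal Python sketch using a difference array; the read coordinates are hypothetical, and it counts a read at every base it overlaps, which may not match the quantitation options actually used in SeqMonk.

```python
from typing import List, Tuple


def per_base_depth(reads: List[Tuple[int, int]], region_start: int, region_end: int) -> List[int]:
    """Read depth at every base of [region_start, region_end] (1-based, inclusive).

    Each read is a (start, end) pair, also 1-based and inclusive.
    """
    length = region_end - region_start + 1
    diff = [0] * (length + 1)  # difference array: +1 where a read starts, -1 after it ends
    for start, end in reads:
        s, e = max(start, region_start), min(end, region_end)
        if s > e:
            continue  # read lies outside the region
        diff[s - region_start] += 1
        diff[e - region_start + 1] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


print(per_base_depth([(1, 5), (3, 8), (20, 30)], region_start=1, region_end=10))
```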

I have tried using Z-score requantitation to see how different the probes are from the mean, but so far that hasn't yielded anything informative.

At the moment I can't think of anything I could use in SeqMonk to annotate probes that have significantly lower values than surrounding probes within a window (which would be what I'm essentially looking for).

Is there any systematic way I could identify such short stretches of depletion in SeqMonk?

Thanks in advance for any input.

Attached file: example_of_footprint.png

**sschmidt** · 02-06-2013, 01:26 PM

I have RNA-seq libraries made from cell lines that express a transgene, and I was able to quantitate all existing probes using the RPKM pipeline. Is it possible to design a probe for the transgene as well? If so, how, and can that be done using the RPKM pipeline?

**simonandrews** · 02-07-2013, 12:44 AM

Originally posted by sschmidt

I have RNA-seq libraries made from cell lines that express a transgene, and I was able to quantitate all existing probes using the RPKM pipeline. Is it possible to design a probe for the transgene as well? If so, how, and can that be done using the RPKM pipeline?

I'm assuming that you're talking about a novel gene inserted into the main genome which you're also measuring in your data. If that's right, then you'll need to find some way to represent the transgene in your genome. You could do this by modifying the existing genome sequence and inserting the novel sequence at the correct position, or you could take a shortcut and make a short extra fake chromosome which contains just the transgene sequence. You'd need to do this for the mapping stage as well as for the downstream analysis, since otherwise reads from the transgene won't be mapped. The process for adding a new fake chromosome is the same as for making a custom genome, except that you'd just add the new dat file to the chromosomes in an existing assembly rather than making a new one.
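For the shortcut route, the mapping side just needs the transgene as an extra sequence in the reference. A minimal Python sketch, assuming a plain FASTA reference file; the file and record names are hypothetical, the aligner index would need rebuilding afterwards, and this does not produce the SeqMonk dat file, which still has to be added to the assembly separately.

```python
import os


def fasta_record(name: str, sequence: str, width: int = 60) -> str:
    """Format one FASTA record, wrapping the sequence at `width` characters."""
    seq = "".join(sequence.split()).upper()  # drop whitespace, normalise case
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def append_fake_chromosome(fasta_path: str, name: str, sequence: str) -> None:
    """Append the transgene as an extra record at the end of a reference FASTA."""
    needs_newline = False
    if os.path.exists(fasta_path) and os.path.getsize(fasta_path) > 0:
        with open(fasta_path, "rb") as handle:
            handle.seek(-1, os.SEEK_END)
            needs_newline = handle.read(1) != b"\n"
    with open(fasta_path, "a") as handle:
        if needs_newline:
            handle.write("\n")  # don't glue the new header onto the last sequence line
        handle.write(fasta_record(name, sequence))
```

For example, `append_fake_chromosome("genome.fa", "transgene", transgene_seq)` would add a record named `transgene`; both names are placeholders.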

Once you've done that then the transgene should show up the same as any other gene and it would just be a case of picking the value for that gene out of the full set of quantitated data.
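For reference, the standard RPKM calculation behind such a value is sketched below in Python. This is the textbook formula (reads × 10^9 / (feature length in bp × total mapped reads)); exactly which feature length and read total SeqMonk's pipeline uses is not covered in this thread, and the numbers are hypothetical.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical numbers: 500 reads on a 2 kb transgene in a 10 million read library.
print(rpkm(read_count=500, feature_length_bp=2000, total_mapped_reads=10_000_000))
```

Equivalently, 500 reads / 2 kb / 10 million reads gives the same value, which is a quick sanity check on the units.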

**simonandrews** · 02-07-2013, 12:53 AM

Originally posted by mjp

What I would like to achieve now is a systematic way of identifying the depleted regions within the enriched regions. These would correspond to the protected sites where my protein was bound. I have attached an example of such a region.

What you see there are the DNase hypersensitive sites (DNase_HS), the underlying protein binding site (PBS) and, at the very bottom, the single base-pair probes created within DNase_HS. The dip within the probes would ideally correspond to the depleted region (there might be another one to the right of the first one, between 400 and 500 bp in the displayed region).

I have tried using Z-score requantitation to see how different the probes are from the mean, but so far that hasn't yielded anything informative.

At the moment I can't think of anything I could use in SeqMonk to annotate probes that have significantly lower values than surrounding probes within a window (which would be what I'm essentially looking for).

Is there any systematic way I could identify such short stretches of depletion in SeqMonk?

Thanks in advance for any input.

Looking at your result, I think what you'd need is a new quantitation normalisation method which does a local subtraction of a smoothed value running through the data. This would remove the large-scale effect of the enrichment and leave you with a measure of the local difference from the general enrichment level of the area. Once you had this, you could use the windowed replicate filter to find regions whose values are consistently different from 0 over whatever window size you chose. This would find sets of adjacent depleted probes which would hopefully correspond to your binding sites.
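To make the idea concrete, here is a minimal Python sketch of a smoothing subtraction followed by a run search. It is not SeqMonk's implementation: the window is counted in probes, and the run search is a simplified single-store stand-in for the windowed replicate filter, using a fixed threshold and minimum run length (both hypothetical) instead of a test across replicates.

```python
from typing import List, Tuple


def smoothing_subtraction(values: List[float], half_window: int) -> List[float]:
    """Subtract from each probe the mean of the probes within +/- half_window of it."""
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)  # prefix sums give each window mean in O(1)
    n = len(values)
    result = []
    for i, v in enumerate(values):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        result.append(v - (prefix[hi] - prefix[lo]) / (hi - lo))
    return result


def depleted_runs(diffs: List[float], threshold: float, min_run: int) -> List[Tuple[int, int]]:
    """(first, last) indices of runs of >= min_run probes with diff below -threshold."""
    runs, start = [], None
    for i, d in enumerate(diffs + [0.0]):  # sentinel closes a run at the end (threshold >= 0)
        if d < -threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return runs


# Hypothetical per-probe values: an enriched region with a short dip in the middle.
values = [10, 10, 10, 10, 2, 2, 2, 10, 10, 10, 10]
diffs = smoothing_subtraction(values, half_window=3)
print(depleted_runs(diffs, threshold=2.0, min_run=3))
```

Subtracting the local mean rather than a global one is what lets the dip stand out even though every probe in it is still well above the genome background.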

Setting this up would need the addition of a new quantitation method, but it would basically be an adaptation of the existing smoothing quantitation, so it would be really easy to add. If you want to contact me off list ([email protected]) I can give you a development snapshot with this in it to test, and if you could let me have some example data of this type it would be really helpful in making sure it's working properly.

**simonandrews** · 02-08-2013, 02:54 AM

I've just added a new smoothing subtraction quantitation which should do what you need to allow a better systematic analysis of the DNase data. Drop me an email and I can give you a test release containing this code so you can try it out before I put it into an official release.

**simonandrews** · 02-11-2013, 09:21 AM

I've just released SeqMonk v0.24.0. I've included the smoothing subtraction quantitation I described above, which should help with DNase-type analyses, but there's also lots of other new stuff, listed below:

Added the ability to export all probe reports in GFF format
Added a pipeline to detect antisense transcription from directional RNA-Seq libraries.
Added a system which can provide immediate feedback to submitted crash reports if they're ones we've seen before and for which we can offer useful feedback.
Added a chi-square based contingency test filter which is useful for bisulphite sequencing libraries (and possibly others too).
Added an ID field to reports for cases where the name of a feature isn't useful or unique
Added a probe length quantitation option
Added a probe name filter which allows you to specify a large list of names and selects probes which match any of them
Added an option to merge all transcripts in the RNA-Seq pipeline to create a single gene level measure of transcription
Changed the active store parser to a visible stores parser to allow easy re-import of multiple datasets in a single operation
Added an option to generate raw counts to the RNA-Seq quantitation pipeline to allow for easy interfacing with tools such as DESeq which require this
Added a smoothing subtraction quantitation method which can be used to detect sudden local changes in quantitation
Added the ability to select the order of highlighted probe lists in the scatterplot
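As background for the contingency test filter, here is a minimal Python sketch of a generic chi-square test comparing methylated and unmethylated call counts between two stores. How SeqMonk's filter builds its tables or corrects for multiple testing isn't stated in these notes, so treat this as an illustration of the statistic only; the counts are hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) on the 2x2 table

                   methylated  unmethylated
        store 1        a            b
        store 2        c            d

    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


print(chi_square_2x2(80, 20, 40, 60))
print(chi_square_2x2(50, 50, 50, 50))
```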

Some changes have also been made to address problems in previous versions:

We fixed a bug which would produce incorrect p-values following multiple testing correction, but only affected p-values which were initially very high (p>~0.3)
We fixed an unnecessary level of multiple testing correction in the intensity difference filter which meant that some candidates which could have been reported were not. Typically we see around a 10% increase in the number of candidates in the new correction method over the previous version.
We changed the behaviour of the BAM import filter for paired end data which were mapped with a spliced read mapper. We now show the second read of the pair with the same direction as the first read to indicate the direction of the fragment and preserve the direction in strand specific libraries.
The "load probes from file" probe generator has been removed. It was never very well supported and its functionality is better performed by importing the data into an annotation track and using the feature probe generator.
A couple of timing bugs were fixed which prevented the import of extra annotation on some linux installations.
In HiC analysis we have removed some optimisations in the testing which were leading to unrealistically low p-values for some interactions. We now test against the full set of possible interactions, only making an exception to correct for only cis interactions when all trans interactions have been specifically excluded.

**simonandrews** · 02-22-2013, 08:50 AM

I've just released v0.24.1 to fix a few bugs which were discovered in the last release. The new release is up on our site now.

The RNA-Seq quantitation incorrectly corrected for total read count if you used unmerged transcripts (it corrected against total base count instead of total read count). Counts from versions before 0.24.0 are not affected.
The MACS probe generator didn't actually read the user options but always used the defaults. This has always been broken but we didn't notice!
A crash was fixed in the visible store parser which was introduced in the last release.
A crash was fixed if you ran the intensity difference filter with a very small number of probes
An occasional drawing bug on the axis labels of the probe trend plot was fixed. I think this one wins the record for the longest-standing bug in the code. It's been there since the trend plot was first introduced in v0.6 and we've finally found and fixed it.
