#141 |
Member
Location: USA Join Date: Mar 2011
Posts: 25
|
![]()
Has anybody been successful in finding DNase I footprints using SeqMonk?
Can we use SeqMonk to do this sort of analysis? |
#142 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
![]()
We've never tried as far as I'm aware. Since it's (as I understand it) just enrichment data you should be able to use the same sort of methods as for ChIP-Seq. If there's something more specific which applies to this kind of data then if you can provide me with a pointer I can look at adding it.
I'd be interested to hear how you get on. |
#143 |
Member
Location: USA Join Date: Mar 2011
Posts: 25
|
![]()
Rather than quote too many fragments from this short paper, I'll let you read it if you are interested.
The biggest difference between ChIP and DNase, in my opinion, is that while ChIP looks at the underlying peaks of usually well-known proteins, DNase looks at the enrichment of much wider regions. There are also other differences between the two methodologies (see the paper). Overall, with this kind of experiment people are actually looking for the regions depleted of sequencing tags, which were protected by bound proteins. I guess that could be done in SeqMonk by creating continuous probes across the genome and then looking at the depleted regions instead of the enriched ones. Having said that, I would probably have trouble setting a cut-off level between enrichment and depletion. At the moment I'm evaluating the options from the paper I mentioned, but it would be nice to have SeqMonk do that as well, as I have already done some simple analysis in it. If I succeed in doing that with SeqMonk I'll let you know.
#144 |
Member
Location: USA Join Date: Mar 2011
Posts: 25
|
![]()
So I managed to get results with SeqMonk similar to those from other software. The one I have in mind here is F-seq, which is fairly commonly used for analysis of DNase data.
In SeqMonk I used the contig probe generator to do this. Actually, SeqMonk returned more reasonable enriched regions than F-seq, but many of them overlapped. Furthermore, I managed to relate my data in SeqMonk to the protein binding sites (PBS) I'm interested in. What I would like to achieve now is a systematic way of identifying the depleted regions within the enriched regions. These would correspond to the protected sites where my protein was bound. I have attached an example of such a region. What you see there is the DNase hypersensitive sites (DNase_HS), the underlying protein binding site (PBS) and, at the very bottom, the single-base-pair probes created within DNase_HS. The dip within the probes would ideally correspond to the depleted region (there might be another one to the right of the first, between 400 and 500 bp in the displayed region). I have tried Z-score re-quantitation to see how different the probes are from the mean, but so far that hasn't yielded anything informative. At the moment I can't think of anything in SeqMonk I could use to annotate probes that have significantly lower values than the surrounding probes within a window (which is essentially what I'm looking for). Is there any systematic way I could identify such short stretches of depletion in SeqMonk? Thanks in advance for any input.
#145 |
Junior Member
Location: USA Join Date: Nov 2012
Posts: 2
|
![]()
I have RNA-seq libraries made from cell lines that express a transgene, and was able to quantitate using the RPKM pipeline for all existing probes. Is it possible to design a probe for the transgene as well, if so how, and can that be done using the RPKM pipeline?
|
#146 | |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
![]() Quote:
Once you've done that then the transgene should show up the same as any other gene and it would just be a case of picking the value for that gene out of the full set of quantitated data. |
|
#147 | |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
![]() Quote:
Setting this up would need the addition of a new quantitation method, but it would basically be an adaption of the existing smoothing quantitation so it would be really easy to add. If you want to contact me off list (simon.andrews@babraham.ac.uk) I can give you a development snapshot with this in to test, and if you could let me have some example data of this type it would be really helpful in making sure it's working properly. |
|
#148 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
![]()
I've just added a new smoothing subtraction quantitation which should do what you need to allow a better systematic analysis of the DNase data. Drop me an email and I can give you a test release containing this code so you can try it out before I put it into an official release.
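To make the idea concrete, here is a minimal Python sketch of a smoothing subtraction on per-base probe values. It is not SeqMonk's code: the function names, the centred-window mean, and the threshold are all assumptions chosen for illustration. Each probe has the mean of its surrounding window subtracted from it, so a probe that sits well below its neighbours ends up with a strongly negative value.

```python
from statistics import mean


def smoothing_subtraction(values, window=5):
    """Subtract the mean of a centred window from each probe value.

    Hypothetical helper: the window is truncated at the ends of the list.
    """
    half = window // 2
    result = []
    for i, value in enumerate(values):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        result.append(value - mean(values[lo:hi]))
    return result


def local_dips(values, window=5, threshold=-3.0):
    """Return indices of probes at least |threshold| below their local mean."""
    diffs = smoothing_subtraction(values, window)
    return [i for i, diff in enumerate(diffs) if diff <= threshold]


# Toy per-base probe values across a hypersensitive site with one protected dip
probes = [10, 10, 10, 10, 2, 2, 10, 10, 10, 10]
print(local_dips(probes, window=5, threshold=-3.0))
```

The window has to be wider than the footprint being looked for, otherwise the dip lowers its own local mean and disappears; the cut-off still has to be chosen by eye or from the distribution of subtracted values.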
|
#149 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
![]()
I've just released SeqMonk v0.24.0. I've included the smoothing subtraction quantitation I described above, which should help for DNase-type analyses, but there's also lots of other new stuff listed below:
Some changes have also been made to address problems in previous versions:
|
#150 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
![]()
I've just released v0.24.1 to fix a few bugs which were discovered in the last release. The new release is up on our site now.
|
#151 |
Member
Location: Little Rock AR Join Date: Jul 2010
Posts: 12
|
![]()
Is there a way to calculate the average read coverage per exon for RNA-seq datasets using SeqMonk? I imagine it would be similar to the Coverage Depth quantitation, except that that gives min or max coverage. We are trying to find genes that have a meaningful RPKM value. Based on this paper (McIntyre et al., BMC Genomics, 2011, 12:293), an average of 5 reads per base (for each exon) was needed to have reliable RPKMs. I didn't know if there was a way to compute this for all exons and then filter out the ones with insufficient quantitation.
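For reference, the quantity being asked about can be sketched outside SeqMonk in a few lines of Python. This is only an illustration of the arithmetic, assuming 1-based inclusive coordinates and hypothetical function names: the mean coverage of an exon is the number of read bases overlapping it divided by the exon length, and exons below the chosen cut-off (5 reads per base in the paper cited above) are dropped.

```python
def mean_coverage(exon_start, exon_end, reads):
    """Average reads per base over an exon (1-based, inclusive coordinates)."""
    exon_length = exon_end - exon_start + 1
    covered_bases = 0
    for read_start, read_end in reads:
        overlap = min(exon_end, read_end) - max(exon_start, read_start) + 1
        if overlap > 0:
            covered_bases += overlap
    return covered_bases / exon_length


def well_covered_exons(exons, reads, min_mean_coverage=5.0):
    """Names of exons whose mean coverage reaches the cut-off."""
    return [
        name
        for name, (start, end) in exons.items()
        if mean_coverage(start, end, reads) >= min_mean_coverage
    ]


# Toy data: eleven 50 bp reads tiled over exonA, a single read on exonB
exons = {"exonA": (101, 200), "exonB": (301, 400)}
reads = [(101 + 5 * i, 150 + 5 * i) for i in range(11)] + [(301, 350)]

print(mean_coverage(101, 200, reads))
print(well_covered_exons(exons, reads))
```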
|
#152 | |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
![]() Quote:
It would also be pretty easy to extend the coverage depth quantitation to allow mean or median value as a valid measure so I'll add that to the next release, but I think you can do what you want with the existing tools. |
|
#153 |
Member
Location: australia Join Date: Jan 2011
Posts: 81
|
![]()
I was wondering whether it is possible to analyze exon-level RNA-seq data with three replicates over a time course in SeqMonk.
Thanks |
#154 | |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
![]() Quote:
For the initial quantitation you would just put probes over all exons, though probably after filtering your set of transcripts to remove odd biotypes, which reduces your set of exons somewhat. We normally just use protein-coding mRNAs for our analysis now, which about halves the number of transcripts we have to consider from the full Ensembl set. Once you have the probes you'd use a read count quantitation, correcting for total count and log transforming. You would do the quantitation normalisation in the same way as for whole transcript or gene quantitation.
For the differential analysis I'd suggest doing an intensity difference filter using replicate sets of all of your replicate groups. I'd then do a replicate set stats analysis using the same replicate groups and focus on exons which were found to be significant by both of these methods. Once you have the full set of changing exons you could use the clustering tools to find sets of exons which behave similarly, to make the interpretation of your analysis simpler.
The problem you're always fighting against with this type of analysis is the number of tests you are performing. If you're looking at all exons in the genome you will have a huge number of probes and potentially low observation levels, which will make it difficult to achieve a sufficient level of significance to survive multiple testing correction. You can't pre-filter on difference before doing the intensity difference filter, since this will break the statistical background model, but if there's any other way to focus on exons which are more likely to be of interest then you could think about that.
The other option you have, if you want to look at alternative splicing, is to analyse the splice events rather than the exon expression. When you import your data you would choose to import spliced reads and then import introns rather than exons. You could then use the read position probe generator to put a probe over each different splice event, and the exact overlap quantitation method to count the number of occurrences of each splice event in your different samples. Analysing these data will allow you to focus more explicitly on changes in splicing, as opposed to the more mixed signals you might get from looking at read counts in overlapping exons.
On my to-do list is another tutorial which goes over the options in the program for looking at alternative splicing, since this is something we're increasingly doing and which is getting more feasible with longer read lengths and higher coverage depths.
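The multiple testing point can be illustrated with a small Python sketch. The post does not say which correction is applied, so the Benjamini-Hochberg procedure is used here purely as an assumed, commonly used example; the function name is hypothetical. Each p-value is scaled by the number of tests divided by its rank, so the same raw p-value becomes less significant as the number of probes grows.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# The same best raw p-value, tested alongside 9 or 9,999 uninteresting probes
few_tests = [0.001] + [0.5] * 9
many_tests = [0.001] + [0.5] * 9999

print(benjamini_hochberg(few_tests)[0])
print(benjamini_hochberg(many_tests)[0])
```

With ten probes the best hit survives a 5% cut-off after correction; with ten thousand it does not, which is why restricting the probe set on grounds other than the measured difference can help.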
|
#155 |
Junior Member
Location: U.S. Join Date: Mar 2013
Posts: 3
|
![]()
Hi Simon Andrews,
I have really enjoyed using SeqMonk to analyze my mRNA-seq data. It is certainly a great program. Recently, I noticed that a new version of SeqMonk had been released (v0.24.1) and simply reran the analysis of my mRNA-seq data using the new version. I used 'Quantitation Pipelines - RNA Seq quantitation pipelines' and selected the following options: mRNA / Non-strand specific / Merge transcript isoforms / Generate Raw Counts. Although most probes gave the same counts, surprisingly some probes gave different counts compared to the results from the old version (v0.24.0). The changes in counts seem to be consistent between samples, but they slightly change the p-values when I do Intensity Difference filtering or Replicate Set filtering, giving a slightly different set of probes. FYI, I used exactly the same options. I assumed that raw counts should remain the same between v0.24.0 and v0.24.1 because, based on your note, v0.24.0 only corrected incorrectly for total read count when unmerged transcripts were used. That note is the reason I reran my analysis with v0.24.1, just in case. I am wondering what's going on. Do you have any idea why I got different read counts for some probes (not all)? Thanks in advance for your help!
#156 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
![]()
The code which was changed in 0.24.1 only affected the global correction applied, so if you've been using the 'generate raw counts' option then this wouldn't have been applied so you shouldn't have got anything different between the two versions. I've just been back and checked the logs for that code and apart from deleting a commented line, the only change was in that correction code which wouldn't be executed if you were generating raw counts.
If you're seeing consistent differences for only a small number of probes is it possible that your annotation has changed between the analyses? Could you have been using a filtered set of transcripts for your analysis before, or could your annotation for this assembly have been updated? I should also point out that in your mail it sounded like you were using raw counts for the statistical testing within SeqMonk (you may not be, it wasn't entirely clear). If you're using the tests inside the program you really want to use normalised log transformed counts for the analyses. Raw counts are only intended to be used for external analyses (DESeq for example) which require them. |
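As a rough illustration of the difference between the two kinds of value, the Python sketch below turns raw counts into log2 counts per million reads. It is not SeqMonk's formula: the scaling to one million, the pseudocount of 1 for zero counts, and the function name are all assumptions made for the example.

```python
import math


def log2_counts_per_million(raw_counts, pseudocount=1.0):
    """Correct raw counts for total read count and log2 transform them.

    Hypothetical helper: the pseudocount avoids taking log2 of zero.
    """
    total = sum(raw_counts.values())
    return {
        probe: math.log2((count + pseudocount) * 1_000_000 / total)
        for probe, count in raw_counts.items()
    }


# Toy sample with exactly one million reads in total
raw = {"geneA": 3, "geneB": 0, "everything_else": 999_997}
normalised = log2_counts_per_million(raw)

print(normalised["geneA"])
print(normalised["geneB"])
```

Raw counts like those in `raw` are what an external count-based tool expects, while values like those in `normalised` are comparable between samples of different sequencing depth.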
#157 |
Junior Member
Location: U.S. Join Date: Mar 2013
Posts: 3
|
![]()
Thanks for the reply!
I have been looking into what differs between my previous analysis and the current one besides the program version. In fact, there was a difference other than the version. In the previous analysis, when I imported my mapped files, I selected the option 'split spliced reads'. Then, when I ran the pipeline, I selected 'merge transcript isoform'. In the current one, I didn't check 'split spliced reads' when I imported my data. Although I chose the same pipeline option (merge transcript isoform), the results differed, as I mentioned, depending on the 'split spliced reads' option in the data import step. Since I did a pretty much regular mRNA-seq mapping, not a spliced mapping, I guess I shouldn't check 'split spliced reads'. Is that correct? In addition, you are right: I was not clear before, but I actually used normalized log transformed counts for the statistical analyses in SeqMonk. The main reason I also want raw counts is for external analyses, as you pointed out. Thank you again for your wonderful help!
#158 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
![]()
If you've not done a spliced mapping then the 'split spliced reads' option won't do an awful lot to your data. The one change it will force is to split paired end reads into individual reads rather than generating a single read which covers the pair. It will fix the strand attributes to be correct for directional paired end libraries.
For any RNA-Seq data we'd strongly recommend using the split reads option, even if you didn't use a spliced mapper since it will do the right thing with your data. In particular, the RNA-Seq quantitation needs to use individually mapped reads rather than joined read pairs. The reason for this is that although you want to count reads, if you've done a spliced import some reads will have been split up, and would therefore count double if simple read counts were used. The program therefore counts bases of overlap and then works out the read length for the sample and then uses this to convert the base counts back to read counts. If you've used joined read pairs then you'll get huge (and incorrect) base counts, and you'll get a predicted read length which is also huge (and incorrect). |
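The base-counting logic described above can be sketched in Python as follows. This is an illustration only, with hypothetical function names; in particular the read length is simply passed in here, whereas the program works it out from the sample. A spliced read imported as two pieces contributes two entries, so counting pieces would count it twice, while summing overlapping bases and dividing by the read length counts it once.

```python
def overlap_bases(exons, read_pieces):
    """Total bases of overlap between read pieces and a probe's exons.

    Coordinates are 1-based and inclusive.
    """
    total = 0
    for exon_start, exon_end in exons:
        for piece_start, piece_end in read_pieces:
            overlap = min(exon_end, piece_end) - max(exon_start, piece_start) + 1
            if overlap > 0:
                total += overlap
    return total


def bases_to_read_count(base_count, read_length):
    """Convert a base count back to an equivalent number of reads."""
    return base_count / read_length


# A two-exon gene, one unspliced 50 bp read, and one 50 bp spliced read
# that was split on import into a 20 bp piece and a 30 bp piece
exons = [(100, 200), (500, 600)]
pieces = [(101, 150), (181, 200), (501, 530)]

bases = overlap_bases(exons, pieces)
print(len(pieces))                     # naive count of imported pieces
print(bases_to_read_count(bases, 50))  # count derived from overlapping bases
```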
#159 |
Junior Member
Location: U.S. Join Date: Mar 2013
Posts: 3
|
![]()
Quote (simonandrews): For any RNA-Seq data we'd strongly recommend using the split reads option, even if you didn't use a spliced mapper since it will do the right thing with your data.
Thanks! I got it. So I should use the 'split spliced reads' option in the data import step almost all the time for RNA-seq data. Then I guess my first analysis (using the split spliced reads option) is the correct one. I may have been confused because I read 'Spliced import. If you have done a spliced mapping you will be offered the option to split spliced reads into their component parts' in the SeqMonk manual. I took that to mean I should choose that option only if I had done a spliced mapping. But I did find that in your YouTube tutorial you selected the 'split spliced reads' option for RNA-seq analysis when you imported your data! I really appreciate your clarification. Thanks!
#160 |
Member
Location: New York Join Date: Jan 2009
Posts: 23
|
![]()
I've recently installed SeqMonk - it really looks great, and should be very useful for some of the analyses I'm interested in running. Many thanks to the developers!
However, I've run into a rather strange bug. Looking at the menu bar, the "File", "Edit" and "View" pulldowns all work just fine. However, all other menus, such as "Data", "Plots", "Filtering" and so on, can be pulled down, but the options are not accessible: they are all completely greyed out. Nothing seems to change this. I can import large BAM files just fine, visualize reads just fine, and so forth; I just cannot access menu items. I've run into the same problem when trying the sample data set from the SeqMonk website. Searching the web, I haven't encountered any reports of similar behavior. Can anyone comment on this? I feel like I must be missing something obvious. I'm running SeqMonk on Mac OS X 10.8.2, 32GB RAM (10GB available to SeqMonk), with the more recent version of Java. Thanks for the help.
Tags |
analysis, desktop, seqmonk, visualization |