#121 | |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
Quote:
If you import as single end then both ends of the pair will be shown as separate reads, but there will be no connection between them in the internal data model, so you can't switch between the two views within the same set of imported data. One of the main trade-offs SeqMonk makes in order to handle large datasets quickly is that it doesn't maintain links between alignment segments, either for paired reads or for splice segments in spliced reads. For our internal quantitation of spliced data we use the relative length of each aligned segment to infer how many reads we should count when we're summing up the contribution of different spliced segments.
In your case, as long as you're correcting for total read counts, it shouldn't matter too much that you have a mix of single and paired end data. In terms of counts the paired data will be somewhat similar to simply doubling a single end sample, and the global correction will normalise this away. If you want to correct for this more explicitly then you could apply a manual correction to halve the counts for your paired end data (this is one of the quantitation options). |
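To see why a total-count correction absorbs the roughly doubled counts from paired-end data, here is a small worked sketch. The probe names and counts are invented for illustration, and the per-million scaling is only one plausible form of "correcting for total read counts", not necessarily the exact calculation SeqMonk performs.

```python
def per_million(counts):
    """Scale raw per-probe counts to reads per million of the store total."""
    total = sum(counts.values())
    return {probe: 1_000_000 * n / total for probe, n in counts.items()}


# Hypothetical raw counts for one library imported as single end.
single_end = {"probeA": 30, "probeB": 50, "probeC": 20}

# Idealised paired-end import: every fragment contributes two reads.
paired_end = {probe: 2 * n for probe, n in single_end.items()}

normalised_single = per_million(single_end)
normalised_paired = per_million(paired_end)

# Manual alternative: halve the paired-end counts before anything else.
halved_paired = {probe: n / 2 for probe, n in paired_end.items()}
```

In this idealised case the two normalised dictionaries are identical, which is the sense in which the global correction removes the doubling. Real paired-end data will only be "somewhat similar" to a doubling, so the match will be approximate.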
|
#122 |
Member
Location: Goettingen, Germany Join Date: Aug 2011
Posts: 28
|
Hey Simon,
after upgrading to the new version, SeqMonk no longer recognizes the chromosome names in the standard bowtie2 SAM output file (I'm still using the same bowtie2-provided index for mm10). The error message says it can't make a name of chrM... However, with a sorted BAM file, it reads all the lines until it reaches chrM and then quits with the same error, leaving no reads to look at...
Is that because of the new version, or did I do something crazy here? If not, can I somehow skip importing unreadable chromosome names (as I'm not interested in chrM anyway) and still import all the others?
Thanks a lot for your help so far and (hopefully) in the future
r
edit: rolling back to 0.22 solves the problem. The same SAM, BAM and sorted BAM files are read without complaints! Well, complaints about chrM do come at the end, but the program just reports them and doesn't break partway through... |
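One way to leave chrM out, independent of any SeqMonk setting, would be to filter the SAM text before importing it. The sketch below is an illustration under stated assumptions, not something from the thread: the function name `drop_chromosomes` and the sample records are hypothetical, and it only relies on the SAM layout (header lines start with `@`, reference name in column 3, mate reference in column 7).

```python
def drop_chromosomes(sam_lines, excluded=("chrM",)):
    """Yield SAM lines, skipping headers and alignments for excluded references."""
    excluded = set(excluded)
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            # Header: drop only @SQ entries that declare an excluded reference.
            names = [f[3:] for f in fields[1:] if f.startswith("SN:")]
            if fields[0] == "@SQ" and any(name in excluded for name in names):
                continue
            yield line
            continue
        # Alignment: column 3 is RNAME, column 7 is RNEXT (the mate's reference).
        if len(fields) >= 7 and (fields[2] in excluded or fields[6] in excluded):
            continue
        yield line


# Made-up records; the reference lengths are placeholders, not real mm10 values.
example = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:chrM\tLN:100\n",
    "read1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t0\tchrM\t5\t42\t4M\t*\t0\t0\tACGT\tIIII\n",
]
kept = list(drop_chromosomes(example))
```

This works on SAM text only, so a BAM file would need converting to SAM first. Reads whose mate maps to an excluded reference are dropped as well, so that no kept record refers to a sequence missing from the header.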
#123 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
This is a bug in the latest version caused by the way we handle chromosome name matches internally. I've just put a fix into the development version and am writing the release notes for a new release right now.
There should be an update out in a couple of hours which will fix this. Sorry for any trouble this has caused. |
#124 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
There is a new release of seqmonk (v0.23.1) which should fix the import problems found in v0.23.0. It's up on the project page now and you should be able to get it.
Please let me know if any problems persist with this new version. |
#125 | |
Member
Location: Goettingen, Germany Join Date: Aug 2011
Posts: 28
|
Quote:
Thanks for that REALLY fast update! It now works as the previous version did. chrM still could not be extracted, however, but the workaround for that can be found here... |
|
#126 |
Member
Location: australia Join Date: Jan 2011
Posts: 81
|
I have a question about using the scatter plot in SeqMonk. When you scatter plot two read counts/expression values there is a cloud of red dots in the center. What does that mean? Alternatively, can someone point me to what the various colors in such a scatter plot mean?
|
#127 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
The colours in the scatterplot represent the density of points which are overlaid at that point in the plot. There are normally far too many points to show each one, so we use the colours to show where in the plot high densities of points are found. The colour scheme is the standard cold to hot scheme used elsewhere in the program.
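The idea of colouring by overlay density can be sketched as counting how many points land on each pixel and then mapping the count onto a cold to hot scale. Everything here is illustrative: the function names, the five-level scale and the linear mapping are assumptions, not SeqMonk's actual drawing code.

```python
from collections import Counter


def density_grid(points, x_range, y_range, width, height):
    """Count how many (x, y) points fall on each pixel of a width x height plot."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        # Map each value onto a pixel index, clamping the maximum onto the last pixel.
        px = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        py = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        counts[(px, py)] += 1
    return counts


def heat_level(count, max_count, levels=5):
    """Map an overlay count onto 0 (coldest) to levels - 1 (hottest)."""
    return min(int(count / max_count * levels), levels - 1)


# Made-up quantitated values for five probes in two data stores.
points = [(1.2, 1.3), (1.25, 1.35), (1.4, 1.1), (5.5, 2.5), (9.5, 9.5)]
grid = density_grid(points, (0, 10), (0, 10), width=10, height=10)
hottest = max(grid.values())
levels = {pixel: heat_level(n, hottest) for pixel, n in grid.items()}
```

Three of the five probes share pixel (1, 1), so that pixel gets the hottest level, while the two isolated probes get a cold one. The colour therefore says how crowded a spot is, not how well the two stores correlate there.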
|
#128 |
Member
Location: australia Join Date: Jan 2011
Posts: 81
|
Thanks Simon for the quick reply. So from the attached image, the red area represents a perfect correlation (or close to 1), and as we move away from the line it decreases.
Is that reasonable, or have I misunderstood something? Thanks
Last edited by mathew; 12-18-2012 at 06:59 PM. |
#129 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
The red area simply shows regions of the plot where many probes are packed on top of each other. There are never enough pixels on the screen to show every probe independently so the colours simply relate to how many probes are overlaid at a particular position. In your plot it shows that the largest number of probes are in a region of the plot showing very little change between the two conditions you've plotted out.
|
#130 |
Member
Location: USA Join Date: Mar 2011
Posts: 25
|
Q1: What is the shortest way to obtain a list of probes that have 'FWD only', 'RVR only' and 'NONE' reads covering them, across multiple data stores independently? The idea is to avoid going through the same steps for each dataset.
Currently I'm following this workflow, but I can't get it to work on multiple data stores:
1. Defining my probes.
2. Quantitating FWD reads only.
3. Filtering on value -> Individual probes -> value above 1, for exactly 1 of the 1 selected.
4. Quantitating RVR reads only.
5. Filtering on value -> Individual probes -> value above 1, for exactly 1 of the 1 selected.
6. Filtering by combining existing lists:
6.1. 'RVR value above 1' BUTNOT 'FWD value above 1'
6.2. 'FWD value above 1' BUTNOT 'RVR value above 1'
Q2: By following this sequence of steps, will 6.2 produce probes with some value for FWD only and not RVR (given that the last quantitation was made for RVR reads only)?
I believe this should work for a single data store, provided the answer to Q2 is yes. When I tried (in step 3) to go through multiple datasets I was a little confused about which option to choose. Would you be able to give me a hint here? Thanks in advance. |
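The BUTNOT combinations in step 6 behave like set differences. A minimal sketch of that logic for one data store follows, assuming each saved probe list keeps the probes that passed its filter when it was created. The probe names and the helper `but_not` are hypothetical.

```python
def but_not(list_a, list_b):
    """Probes that are in list_a but not in list_b (a set difference)."""
    return set(list_a) - set(list_b)


# Hypothetical results of steps 3 and 5 for a single data store.
fwd_above_1 = {"probe1", "probe2", "probe3"}
rev_above_1 = {"probe2", "probe3", "probe4"}
all_probes = {"probe1", "probe2", "probe3", "probe4", "probe5"}

fwd_only = but_not(fwd_above_1, rev_above_1)  # step 6.2
rev_only = but_not(rev_above_1, fwd_above_1)  # step 6.1

# Probes in neither list did not pass the 'above 1' filter in either direction.
neither = but_not(all_probes, fwd_above_1 | rev_above_1)
```

Note that `neither` is not quite the same as "no reads": with a threshold of "above 1", a probe with exactly one read in some direction would also end up there.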
#131 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
I'm not exactly clear what you want to know. Do you want a quick way to determine whether a given probe is forward or reverse only in all of a set of stores, or are you looking for a quick way to make separate lists for each of several stores?
To do the analysis across several stores you could basically repeat the process you outlined but select all of the stores, and make your values filters use 'at least 1' rather than 'exactly 1' to pull out probes which had a read in that direction in any of your stores. You could also put all of your data into a single data group and then treat it as a single dataset.
You might also want to try using a difference quantitation, where you could quantitate forward reads as a percentage of all reads and then filter for either 100% forward or 0% forward, which might be easier than going through lots of values filters. The only hiccup with this would be that empty features would also show up with 0%, so you'd need to have done a read count quantitation first and created lists, for each of your datasets, of probes containing no reads, and then use the combine filter to subtract these from the reverse only set (0%) to get the true reverse only count. |
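The difference between 'exactly 1' and 'at least 1' when several stores are selected can be sketched as follows. The function `values_filter`, its parameters and the counts are illustrative stand-ins for the values filter dialog, not SeqMonk's own code.

```python
def values_filter(counts_by_store, threshold, how_many, mode):
    """Select probes whose count exceeds threshold in `how_many` stores.

    mode is "exactly" or "at least"; both labels are illustrative.
    """
    probes = set().union(*(store.keys() for store in counts_by_store.values()))
    selected = set()
    for probe in probes:
        passing = sum(
            1 for store in counts_by_store.values() if store.get(probe, 0) > threshold
        )
        if mode == "exactly" and passing == how_many:
            selected.add(probe)
        elif mode == "at least" and passing >= how_many:
            selected.add(probe)
    return selected


# Hypothetical forward-read counts for three probes in three stores.
forward_counts = {
    "storeA": {"p1": 5, "p2": 0, "p3": 4},
    "storeB": {"p1": 3, "p2": 0, "p3": 0},
    "storeC": {"p1": 2, "p2": 0, "p3": 0},
}

exactly_one = values_filter(forward_counts, threshold=1, how_many=1, mode="exactly")
at_least_one = values_filter(forward_counts, threshold=1, how_many=1, mode="at least")
```

With all three stores selected, 'exactly 1' misses p1 because it passes in every store, whereas 'at least 1' picks up any probe with forward reads in any store.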
#132 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
I have acted on one of my New Year's resolutions and have finally got round to producing some more tutorial videos showing the use of SeqMonk to analyse a number of different datasets covering RNA-Seq (both simple and complex experimental setups), ChIP-Seq, BS-Seq and Hi-C data.
All of the videos can be found at our YouTube channel at: https://www.youtube.com/user/babrahambioinf If you have any suggestions for other tutorials which might be useful then please let me know and I'll have a go at putting them together. |
#133 |
Senior Member
Location: Pittsburgh Join Date: Feb 2010
Posts: 151
|
Hi Simon,
Great job! One suggestion: it would be great if you could add a demo showing how to plot density graphs across TSS. The discussion at http://seqanswers.com/forums/showthr...ht=tss+density shows the kind of plot I mean. I know something close to this can be plotted in SeqMonk; however, being able to select specific probes and plot the graphs as suggested there would be very helpful. |
#134 | |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
Quote:
The only thing which looks different in the post you linked is that in seqmonk the probes are ordered in the plot by the number of reads covering each probe, whereas the plot you linked to looks like they did something else to order the probes to get some of the patterns you saw. In some older versions we used to cluster the probes but this often produced a messier result than you'd hope. I'm happy to hear suggestions for other ways we could order these plots if there's anything we could do better. |
|
#135 | |||
Member
Location: USA Join Date: Mar 2011
Posts: 25
|
Quote:
I wanted to have a list of probes that have just strand specific reads covering them. So the list would contain probes that have only fwd, only rvrs, none, and both types of reads covering them. Ideally I would like to have such a list for each of my stores independently. Is it possible to create that in a single SeqMonk pipeline, or does it have to be repeated for each store separately?
Quote:
What I see is that the first visible cluster of probes for the bottom dataset (on the screenshot) is not spread over the entire read. Instead it covers the section of the read that met similar criteria for the top dataset. I was thinking that this would produce probes with value 1 for the top dataset, as is currently seen. However, for the bottom dataset I would have expected probes of value 1 over the entire read, wider than that of the top dataset.
Quote:
I hope I didn't make it more complicated. |
|||
#136 | |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
Quote:
1) Quantitate using the difference quantitation, with the option to quantitate forward reads as a percentage of all reads.
2) Use the values filter to select probes with a value of 0 or 100 for each store. This should be pretty quick since you can set the parameters and then just select each store in turn in the filter and re-run it without having to reopen the dialog.
3) Requantitate the data with a simple read count quantitation (no corrections or transformations).
4) Use a values filter to select probes with some reads in them for each store.
5) Use the combine probes filter to select the subset of the 0% results which actually have some data in them. Again you can do this in a single filter session by changing which lists you're using, so it shouldn't be too horrible.
Does this sound OK? |
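The five steps can be mimicked for one store with plain sets, which makes it easier to see why the 0% list has to be intersected with the "has reads" list. This is a sketch under assumptions: the counts are invented, and `forward_percentage` reports 0 for an empty probe, matching the earlier remark that empty features show up with 0%.

```python
def forward_percentage(fwd, rev):
    """Forward reads as a percentage of all reads; 0 when the probe is empty."""
    total = fwd + rev
    return 100.0 * fwd / total if total else 0.0


# Hypothetical (forward, reverse) read counts per probe for one store.
store = {"p1": (8, 0), "p2": (0, 6), "p3": (3, 3), "p4": (0, 0)}

# Steps 1-2: difference quantitation, then values filters for 100 and 0.
percent = {p: forward_percentage(f, r) for p, (f, r) in store.items()}
forward_only = {p for p, v in percent.items() if v == 100.0}
zero_percent = {p for p, v in percent.items() if v == 0.0}

# Steps 3-4: plain read count, then a values filter for probes with reads.
has_reads = {p for p, (f, r) in store.items() if f + r > 0}

# Step 5: combine the lists so that empty probes leave the 0% set.
reverse_only = zero_percent & has_reads
```

Here the 0% list contains both p2 (reverse only) and p4 (empty), and only the combination in step 5 separates them.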
|
#137 |
Member
Location: USA Join Date: Mar 2011
Posts: 25
|
That does sound OK indeed. Thanks!
As another alternative I could do a simple read count for all probes and create an annotated probe report #1 (not annotating with anything) for all the stores, which would give me a list with '0's for probes not having any reads over them. Then do the difference quantitation of forward as a percentage of all, which would give me a list with '100' for the probes with only forward reads, across all stores: probe report #2. The same for reverse: probe report #3. Having these three reports, it would be easy to parse them outside of SeqMonk:
If probe in #1 = 0 => no reads.
If probe in #1 > 0 and probe in #2 = 100 => only forward.
If probe in #1 > 0 and probe in #3 = 100 => only reverse.
If probe in #1 > 0 and probe in #2 and #3 is different from 0 and 100 => both reads.
I think this way I will get what I need the fastest for all stores. One way or another, your input about difference quantitation was invaluable. Thanks again.
Last edited by mjp; 01-08-2013 at 02:52 AM. Reason: added more details |
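Parsing the three reports outside SeqMonk might look something like the sketch below. The report layout is an assumption made for illustration (tab-delimited text with a `Probe` column and one value column per store); a real annotated probe report has more columns, so the column handling would need adapting.

```python
import csv
import io


def read_report(text):
    """Parse a tab-delimited report into {probe: {store: value}}."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["Probe"]: {k: float(v) for k, v in row.items() if k != "Probe"}
        for row in rows
    }


def classify(count, pct_fwd, pct_rev):
    """Apply the four rules from the post to one probe in one store."""
    if count == 0:
        return "none"
    if pct_fwd == 100:
        return "forward only"
    if pct_rev == 100:
        return "reverse only"
    return "both"


# Made-up stand-ins for reports #1 (read count), #2 (% forward), #3 (% reverse).
counts = read_report("Probe\tstoreA\tstoreB\np1\t8\t0\np2\t6\t4\n")
pct_fwd = read_report("Probe\tstoreA\tstoreB\np1\t100\t0\np2\t0\t50\n")
pct_rev = read_report("Probe\tstoreA\tstoreB\np1\t0\t0\np2\t100\t50\n")

calls = {
    (probe, store): classify(
        counts[probe][store], pct_fwd[probe][store], pct_rev[probe][store]
    )
    for probe in counts
    for store in counts[probe]
}
```

Checking the read count first is what keeps empty probes, which show 0 in both percentage reports, from being mistaken for strand-specific ones.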
#138 |
Member
Location: france Join Date: Oct 2012
Posts: 27
|
Hello,
How can I visualise the bedGraph file from Bismark after the methylation call? Thanks |
#139 | |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
Quote:
I put up a tutorial video covering some of the basics for working with bisulphite data on our youtube channel which should give you an idea how to get started with this. |
|
#140 |
Member
Location: Aperture Science Join Date: Mar 2012
Posts: 59
|
Dear Simon.
I sent you a private message a few weeks ago. Perhaps you can take a look at it? It was a question regarding installing a custom genome. |