SEQanswers

11-26-2012, 06:45 AM #121
simonandrews
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Quote:
Originally Posted by Neuromancer
Hey all,

How does SeqMonk count paired-end reads? Is each pair counted only once (or once per gene?), or is each read counted individually? In either case, is there any way to switch between these two modes?

How paired-end reads show up in the program is determined when you first import the data. If you choose to import your data as paired-end, then each pair will appear as a single location which spans the region inferred to be covered by the pair. The direction of this single region will be taken from the first read in the pair.
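
As a rough sketch of what "a single location spanning the pair" means: the two mates collapse into one region running from the leftmost start to the rightmost end, keeping the strand of the first read. The `Read` type and `merge_pair` function are hypothetical illustrations, not SeqMonk's internal code, and the sketch assumes both mates map to the same chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical aligned read: 1-based inclusive coordinates plus strand."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def merge_pair(first: Read, second: Read) -> Read:
    """Collapse two mates into one region covering the inferred fragment.

    The strand is taken from the first read of the pair.
    """
    if first.chrom != second.chrom:
        raise ValueError("mates map to different chromosomes")
    return Read(
        chrom=first.chrom,
        start=min(first.start, second.start),
        end=max(first.end, second.end),
        strand=first.strand,
    )


pair = merge_pair(Read("chr1", 1000, 1050, "+"), Read("chr1", 1200, 1250, "-"))
print(pair)  # Read(chrom='chr1', start=1000, end=1250, strand='+')
```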

If you import as single-end, then both ends of the pair will be shown as separate reads, but there will be no connection between them in the internal data model, so you can't switch between the two views within the same set of imported data.

One of the main trade-offs which SeqMonk makes to handle large datasets quickly is that it doesn't maintain links between alignment segments, either for paired reads or for the segments of spliced reads. For our internal quantitation of spliced data, we use the relative length of each aligned segment to infer how many reads we should count when summing up the contribution of different spliced segments.
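
One way to read the length-based weighting described above: each segment of a spliced read gets a fraction of the read proportional to its aligned length, so the whole read still adds up to one count. This is a sketch under that assumption (the exact rule SeqMonk uses isn't spelled out here), and both function names are hypothetical.

```python
def spliced_read_weights(segments):
    """Give each aligned segment of one spliced read a fractional weight.

    segments: list of (start, end) tuples, 1-based inclusive.
    Weights are proportional to segment length and sum to 1, so the read
    contributes one count in total however many segments it has.
    """
    lengths = [end - start + 1 for start, end in segments]
    total = sum(lengths)
    return [length / total for length in lengths]


def count_in_probe(segments, probe_start, probe_end):
    """Sum the weights of the segments that overlap a probe."""
    weights = spliced_read_weights(segments)
    return sum(
        weight
        for (start, end), weight in zip(segments, weights)
        if start <= probe_end and end >= probe_start
    )


# A 100 bp read split 75/25 across an intron.
read = [(1000, 1074), (2000, 2024)]
print(spliced_read_weights(read))        # [0.75, 0.25]
print(count_in_probe(read, 1990, 2100))  # 0.25
```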

In your case, as long as you're correcting for total read counts, it shouldn't matter too much that you have a mix of single and paired-end data. In terms of counts, the paired data will be somewhat similar to simply doubling a single-end sample, and the global correction will normalise this away. If you want to correct for this more explicitly, you could apply a manual correction to halve the counts for your paired-end data (this is one of the quantitation options).
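
A toy illustration of why the global correction hides the doubling, using made-up counts and an exact doubling for simplicity. It assumes the correction scales each store to counts per million total reads; `per_million` is a hypothetical helper, not the name of a SeqMonk option.

```python
def per_million(counts):
    """Scale raw probe counts to counts per million reads in the store."""
    total = sum(counts.values())
    return {probe: value * 1_000_000 / total for probe, value in counts.items()}


# Made-up counts for one single-end store.
single_end = {"geneA": 30, "geneB": 70}
# A paired-end store imported as single end gives roughly twice the counts.
paired_as_single = {probe: 2 * value for probe, value in single_end.items()}

print(per_million(single_end))        # {'geneA': 300000.0, 'geneB': 700000.0}
print(per_million(paired_as_single))  # same values: the doubling cancels out

# The more explicit alternative: halve the paired-end counts up front.
halved = {probe: value / 2 for probe, value in paired_as_single.items()}
print(halved)                         # {'geneA': 30.0, 'geneB': 70.0}
```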

12-11-2012, 01:37 AM #122
Neuromancer
Member
Location: Goettingen, Germany

Join Date: Aug 2011
Posts: 28
new seqmonk version 0.23.0

Hey Simon,

After upgrading to the new version, SeqMonk no longer recognizes the chromosome names in the standard bowtie2 SAM output file (I'm still using the same bowtie2-provided index for mm10). The error message says it can't make a name of chrM... With a sorted BAM file, however, it reads all the lines until it reaches chrM and then quits with the same error, leaving no reads to look at...
Is that because of the new version, or did I do something crazy here?
If not, can I somehow exclude unreadable chromosome names from the import (I'm not interested in chrM anyway) while still importing all the others?

Thanks a lot for your help so far and (hopefully) in the future

r

Edit:
Rolling back to 0.22 solves the problem. The same SAM/BAM/sorted BAM files are read without complaints! Well, it still complains about chrM at the end, but the program just reports that and doesn't stop partway through...

12-11-2012, 01:42 AM #123
simonandrews
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

This is a bug in the latest version, caused by the way we handle chromosome name matches internally. I've just put a fix into the development version and am writing the release notes for a new release right now.

There should be an update out in a couple of hours which will fix this. Sorry for any trouble this has caused.

12-11-2012, 02:12 AM #124
simonandrews
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

There is a new release of seqmonk (v0.23.1) which should fix the import problems found in v0.23.0. It's up on the project page now and you should be able to get it.

Please let me know if any problems persist with this new version.

12-11-2012, 02:34 AM #125
Neuromancer
Member
Location: Goettingen, Germany

Join Date: Aug 2011
Posts: 28

Quote:
Originally Posted by simonandrews
There is a new release of seqmonk (v0.23.1) which should fix the import problems found in v0.23.0. It's up on the project page now and you should be able to get it.

Please let me know if any problems persist with this new version.

Yep, up and running!
Thanks for that REALLY fast update!
It now works like the previous version. chrM still can't be extracted, but the workaround for that can be found here...

12-18-2012, 09:26 AM #126
mathew
Member
Location: australia

Join Date: Jan 2011
Posts: 81

I have a question about using the scatter plot in SeqMonk. When you scatter plot two sets of read counts/expression values, there is a cloud of red dots in the center. What does that mean? Alternatively, can someone point me to what the various colors in such a scatter plot mean?

12-18-2012, 10:06 AM #127
simonandrews
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

The colours in the scatterplot represent the density of points which are overlaid at that position in the plot. There are normally far too many points to show each one, so we use the colours to show where in the plot high densities of points are found. The colour scheme is the standard cold-to-hot scale used elsewhere in the program.
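
A generic sketch of density colouring for a scatterplot: bin the points into screen pixels, count how many land on each pixel, and map that count onto a cold-to-hot scale. The grid size and the simple linear blue-to-red ramp are assumptions for illustration; SeqMonk's actual gradient and rendering code aren't reproduced here.

```python
from collections import Counter


def density_grid(points, x_range, y_range, width, height):
    """Count how many points land on each pixel of a width x height plot."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        # Map data coordinates to integer pixel positions, clamping the top edge.
        px = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        py = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        counts[(px, py)] += 1
    return counts


def cold_to_hot(count, max_count):
    """Linear blue-to-red RGB colour for a pixel's count (simplified scale)."""
    fraction = count / max_count
    return (int(255 * fraction), 0, int(255 * (1 - fraction)))


# Three nearly identical points overlap on one pixel; one point sits alone.
points = [(5.0, 5.1), (5.01, 5.09), (5.02, 5.1), (1.0, 9.0)]
grid = density_grid(points, (0, 10), (0, 10), width=10, height=10)
peak = max(grid.values())
for pixel, n in sorted(grid.items()):
    print(pixel, n, cold_to_hot(n, peak))
```

The crowded pixel gets the hottest colour even though each individual point is unremarkable, which is why a dense cloud shows up red.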

12-18-2012, 10:23 AM #128
mathew
Member
Location: australia

Join Date: Jan 2011
Posts: 81
scatter plot colors

Thanks, Simon, for the quick reply. So from the attached image, the red area represents a perfect correlation (or close to 1), and it decreases as we move away from the line.
Is that reasonable, or have I misunderstood something?

Thanks

Last edited by mathew; 12-18-2012 at 06:59 PM.

12-18-2012, 10:42 AM #129
simonandrews
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

The red area simply shows regions of the plot where many probes are packed on top of each other. There are never enough pixels on the screen to show every probe independently, so the colours simply relate to how many probes are overlaid at a particular position. Your plot shows that the largest number of probes are in a region showing very little change between the two conditions you've plotted.

01-01-2013, 06:06 AM #130
mjp
Member
Location: USA

Join Date: Mar 2011
Posts: 25
strand specific probes

Q1: What is the shortest way to obtain lists of probes that have 'FWD only', 'RVR only' and 'NONE' reads covering them, for each of multiple data stores independently? The idea is to avoid going through the same steps for each dataset.

Currently I'm following this workflow; however, I can't get it to work on multiple data stores:
1. Defining my probes.
2. Quantitating FWD reads only
3. Filtering on value -> Individual probes -> value above 1, for exactly 1 of the 1 selected stores
4. Quantitating RVR reads only
5. Filtering on value -> Individual probes -> value above 1, for exactly 1 of the 1 selected stores
6. Filtering by combining existing list:
6.1. 'RVR value above 1' BUTNOT 'FWD value above 1'
6.2. 'FWD value above 1' BUTNOT 'RVR value above 1'
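
If the per-strand counts are exported, the list logic in steps 3 to 6 can be repeated for every store in a loop outside SeqMonk. This sketch assumes each store's forward and reverse counts are available as dicts of probe -> count, takes "value above 1" literally as a strict greater-than, and models BUTNOT as a set difference; all names and numbers are hypothetical.

```python
def strand_lists(fwd_counts, rev_counts, threshold=1):
    """Build 'FWD only' and 'RVR only' probe sets for one data store.

    fwd_counts / rev_counts: dict probe -> read count on that strand.
    'value above threshold' is taken literally as a strict greater-than.
    """
    fwd_above = {p for p, n in fwd_counts.items() if n > threshold}
    rev_above = {p for p, n in rev_counts.items() if n > threshold}
    return {
        "fwd_only": fwd_above - rev_above,  # step 6.2: FWD BUTNOT RVR
        "rev_only": rev_above - fwd_above,  # step 6.1: RVR BUTNOT FWD
    }


# Made-up (forward, reverse) counts for two independent stores.
stores = {
    "sample1": ({"p1": 5, "p2": 0, "p3": 3}, {"p1": 0, "p2": 4, "p3": 2}),
    "sample2": ({"p1": 0, "p2": 2, "p3": 0}, {"p1": 3, "p2": 0, "p3": 0}),
}
for name, (fwd, rev) in stores.items():
    print(name, strand_lists(fwd, rev))
```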

Q2: By following this sequence of steps, will 6.2 produce probes with some value for FWD only and not RVR (given that the last quantitation was made for RVR reads only)?

I believe that should work for a single data store, provided the answer to Q2 is yes. When I tried (in step 3) to go through multiple datasets, I was a little confused about which option to choose.

Would you be able to give me a hint here?

Thanks in advance.

01-02-2013, 01:27 AM #131
simonandrews
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

I'm not exactly clear what you want to know - do you want a quick way to determine if a given probe is forward or reverse only in all of a set of stores, or are you looking for a quick way to make separate lists for several stores where you have them?

To do the analysis across several stores you could basically repeat the process you outlined but selecting all of the stores, and making your values filters use 'at least 1' rather than 'exactly 1' to pull out probes which had a read in that direction in any of your stores. You could also put all of your data into a single data group and then treat it as a single dataset.

You might also want to try using a difference quantitation, where you could quantitate forward reads as a percentage of all reads, and then filter for either 100% forward or 0% forward, which might be easier than going through lots of values filters. The only hiccup with this would be that empty features would also show up as 0%, so you'd need to have done a read count quantitation first and created lists of probes containing no reads for each of your datasets, and then use the combine filter to subtract these from the reverse-only (0%) set to get the true reverse-only count.

01-04-2013, 08:48 AM #132
simonandrews
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

I have acted on one of my New Year's resolutions and have finally got round to producing some more tutorial videos showing the use of SeqMonk to analyse a number of different datasets covering RNA-Seq (both simple and complex experimental setups), ChIP-Seq, BS-Seq and Hi-C data.

All of the videos can be found at our YouTube channel at:

https://www.youtube.com/user/babrahambioinf

If you have any suggestions for other tutorials which might be useful then please let me know and I'll have a go at putting them together.

01-04-2013, 01:11 PM #133
honey
Senior Member
Location: Pittsburgh

Join Date: Feb 2010
Posts: 151

Hi Simon,

Great job! One suggestion: it would be great if you could add a demo of plotting density graphs across the TSS, or more precisely, of how one can plot a density graph across the TSS as discussed in http://seqanswers.com/forums/showthr...ht=tss+density
I know something close to this can be plotted in SeqMonk, but being able to select specific probes and plot the graphics as suggested would be very helpful.

01-04-2013, 01:20 PM #134
simonandrews
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Quote:
Originally Posted by honey
Hi Simon,

Great job! One suggestion if you can add demo to plot density graphs across TSS or more precisely, how one can plot density graph across TSS as discussed in http://seqanswers.com/forums/showthr...ht=tss+density
will be great. I know close to this can be plotted in Seqmonk, however if we can select specific probes and plot the graphics as suggested will be very helpful.

The ChIP-Seq tutorial constructs one of these types of graph after doing the peak detection part (the TSS probes are made at about 6:20 and the plot at about 8:20), so hopefully that covers much of what you wanted. The plot will display whichever probes you have selected through the filters.

The only thing which looks different in the post you linked is that in SeqMonk the probes are ordered in the plot by the number of reads covering each probe, whereas in the plot you linked to they seem to have ordered the probes some other way to get some of the patterns you saw. In some older versions we used to cluster the probes, but this often produced a messier result than you'd hope. I'm happy to hear suggestions for other ways we could order these plots if there's anything we could do better.
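
The ordering described above amounts to sorting the probes by their total coverage before drawing the rows. A minimal sketch, assuming each probe's profile is a list of binned read counts across a window; the function name, bin layout and numbers are hypothetical.

```python
def order_by_coverage(profiles):
    """Return (probe, counts) rows sorted by total read count, highest first.

    profiles: dict probe -> list of read counts in bins across a window
    (for example, bins spanning each TSS probe).
    """
    return sorted(profiles.items(), key=lambda item: sum(item[1]), reverse=True)


profiles = {
    "tssA": [0, 2, 9, 3, 0],
    "tssB": [1, 1, 1, 1, 1],
    "tssC": [5, 8, 20, 7, 4],
}
for probe, counts in order_by_coverage(profiles):
    print(probe, counts)
```

Swapping the `key` function is all it would take to try a different ordering, such as sorting on the value in the central bin.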

01-06-2013, 11:06 PM #135
mjp
Member
Location: USA

Join Date: Mar 2011
Posts: 25

Quote:
Originally Posted by simonandrews
do you want a quick way to determine if a given probe is forward or reverse only in all of a set of stores, or are you looking for a quick way to make separate lists for several stores where you have them?

Sorry for the small delay and for not being specific enough. I see that the title of my post was wrong.
I wanted to have lists of probes grouped by the strand of the reads covering them, i.e. probes with only forward reads, only reverse reads, no reads, and both types of reads. Ideally I would like to have such lists for each of my stores independently. Is it possible to create that in a single SeqMonk pipeline, or does it have to be repeated for each store separately?

Quote:
Originally Posted by simonandrews
To do the analysis across several stores you could basically repeat the process you outlined but selecting all of the stores, and making your values filters use 'at least 1' rather than 'exactly 1' to pull out probes which had a read in that direction in any of your stores.

I'm not quite sure if that does what I want; this is where the confusion starts. I created 2 sample stores for yeast chr1 and ran the workflow that I outlined previously, but this time selecting 'at least 1' for all selected stores (please see the attached image). The probes were generated using a running window with size 1 and step 1.
What I see is that the first visible cluster of probes for the bottom dataset (on the screenshot) does not span the entire read. Instead, it covers only the section of the read that met similar criteria for the top dataset.

I was expecting this to produce probes with value 1 for the top dataset, as currently seen. However, for the bottom dataset I expected probes of value 1 across the entire read, wider than those for the top dataset.

Quote:
Originally Posted by simonandrews
You could also put all of your data into a single data group and then treat it as a single dataset.

These are independent samples, so I would like to avoid doing that.

I hope I didn't make it more complicated.
Attached image: Screenshot from 2013-01-07 09:37:24.png

01-08-2013, 12:51 AM #136
simonandrews
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Quote:
Originally Posted by mjp
Sorry for this small delay and for not being specific enough. I see that the title of my post was wrong.
I wanted to have a list of probes that have just strand specific reads covering them. So the list would contain probes that have only fwd, only rvrs, none, and both type of reads covering them. Ideally I would like to have such a list for each of my stores independently. Is it possible to create that in single SeqMonk pipeline or does it have to be repeated for each store separately?

I don't think you can avoid doing some filtering per store since, if I understand correctly, you want to end up with a separate set of lists for each store. You'll also need to do two quantitations. I reckon the quickest way to get these lists would be:

1) Quantitate using the difference quantitation, with the option to quantitate forward reads as a percentage of all reads.

2) Use the values filter to select probes with a value of 0 or 100 for each store. This should be pretty quick since you can set the parameters and then just select each store in turn in the filter and re-run it without having to reopen the dialog.

3) Requantitate the data with a simple read count quantitation (no corrections or transformations).

4) Use a values filter to select probes with some reads in them for each store.

5) Use the combine probes filter to select the subset of the 0% results which actually have some data in them. Again you can do this in a single filter session by changing which lists you're using so it shouldn't be too horrible.
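
The five steps can be mirrored outside SeqMonk as a per-store loop, which may help when checking the results. This is a sketch under stated assumptions: each store is a dict of probe -> (forward, reverse) read counts, and an empty probe is scored as 0%, matching the caveat mentioned earlier in the thread. The function names and counts are hypothetical.

```python
def forward_percentage(fwd, rev):
    """Step 1: forward reads as a percentage of all reads for one probe.

    A probe with no reads comes out as 0%, like a reverse-only probe.
    """
    total = fwd + rev
    return 100.0 * fwd / total if total else 0.0


def lists_for_store(counts):
    """Steps 1-5 for one data store; counts maps probe -> (fwd, rev)."""
    percent = {p: forward_percentage(f, r) for p, (f, r) in counts.items()}
    # Step 2: values filter for 100% (forward only) and 0% (reverse only or empty).
    forward_only = {p for p, v in percent.items() if v == 100}
    zero_percent = {p for p, v in percent.items() if v == 0}
    # Steps 3-4: plain read count, keep probes with some reads.
    has_reads = {p for p, (f, r) in counts.items() if f + r > 0}
    # Step 5: combine, keeping only the 0% probes that actually contain data.
    reverse_only = zero_percent & has_reads
    return {"forward_only": forward_only, "reverse_only": reverse_only}


stores = {
    "store1": {"p1": (4, 0), "p2": (0, 3), "p3": (0, 0), "p4": (2, 2)},
    "store2": {"p1": (0, 1), "p2": (5, 0), "p3": (1, 1), "p4": (0, 0)},
}
for name, counts in stores.items():
    lists = lists_for_store(counts)
    print(name, sorted(lists["forward_only"]), sorted(lists["reverse_only"]))
# store1 ['p1'] ['p2']
# store2 ['p2'] ['p1']
```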

Does this sound OK?

01-08-2013, 02:48 AM #137
mjp
Member
Location: USA

Join Date: Mar 2011
Posts: 25

That does sound OK indeed. Thanks!

As another alternative, I could do a simple read count quantitation for all probes and create annotated probe report #1 (not annotating with anything) for all the stores, which would give me a list with 0 for probes that don't have any reads over them.

Then do the difference quantitation of forward reads as a percentage of all reads, which would give me a list with 100 for probes with only forward reads, across all stores. Probe report #2.

Same for the reverse. Probe report #3.

With these three reports, it would be easy to parse the results outside of SeqMonk:
If probe in #1 = 0 => no reads.
If probe in #1 > 0 and probe in #2 = 100 => only forward.
If probe in #1 > 0 and probe in #3 = 100 => only reverse.
If probe in #1 > 0 and probes in #2 and #3 are different from 0 and 100 => both reads.
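
The four rules above translate directly into a small function. To keep the sketch self-contained, it assumes each report for one store has already been loaded into a dict of probe -> value; reading the actual report files is left out, since their exact column layout is not described here. Names and values are hypothetical.

```python
def classify(read_count, fwd_percent, rev_percent):
    """Apply the four rules to one probe's values from reports #1-#3."""
    if read_count == 0:
        return "no reads"
    if fwd_percent == 100:
        return "forward only"
    if rev_percent == 100:
        return "reverse only"
    return "both"


def classify_store(report1, report2, report3):
    """Classify every probe in one store; each report maps probe -> value."""
    return {
        probe: classify(report1[probe], report2[probe], report3[probe])
        for probe in report1
    }


# Made-up values for a single store.
report1 = {"p1": 0, "p2": 6, "p3": 4, "p4": 10}    # read counts
report2 = {"p1": 0, "p2": 100, "p3": 0, "p4": 30}  # forward as % of all
report3 = {"p1": 0, "p2": 0, "p3": 100, "p4": 70}  # reverse as % of all
print(classify_store(report1, report2, report3))
# {'p1': 'no reads', 'p2': 'forward only', 'p3': 'reverse only', 'p4': 'both'}
```

Checking the read count first matters: an empty probe would otherwise fall through to "both" if its percentages came out as 0.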

I think this way I will get what I need the fastest for all stores.

One way or another, your input about difference quantitation was invaluable.

Thanks again.

Last edited by mjp; 01-08-2013 at 02:52 AM. Reason: added more details

01-11-2013, 02:42 AM #138
shadow19c
Member
Location: france

Join Date: Oct 2012
Posts: 27

Hello,
I want to know how I can visualize the bedGraph file from Bismark after methylation calling.

Thanks

01-11-2013, 02:55 AM #139
simonandrews
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Quote:
Originally Posted by shadow19c
Hello,
I want to know how can I vizualise the bedgraph file from bismark after methylation call?

SeqMonk is designed to do the quantitation of your data within the program rather than taking in externally quantitated files. Rather than trying to load the BedGraph file from Bismark, you'd instead import the raw data from the methylation extractor and then quantitate it however you want inside SeqMonk to visualise the methylation levels.
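
As a rough illustration of what quantitating the raw calls can mean: one simple measure is the percentage of methylated calls within each probe. The sketch assumes the calls have already been read into (chromosome, position, state) tuples, with '+' for methylated and '-' for unmethylated as in Bismark methylation extractor output. The function name and data are hypothetical, and this is not SeqMonk's own quantitation code.

```python
def percent_methylation(calls, chrom, start, end):
    """Percentage of methylated calls within a probe.

    calls: iterable of (chromosome, position, state) tuples, where state is
    '+' for a methylated call and '-' for an unmethylated one. Returns None
    when the probe has no calls, so empty probes aren't reported as 0%.
    """
    in_probe = [
        state for c, pos, state in calls if c == chrom and start <= pos <= end
    ]
    if not in_probe:
        return None
    return 100.0 * in_probe.count("+") / len(in_probe)


calls = [
    ("chr1", 105, "+"),
    ("chr1", 110, "+"),
    ("chr1", 130, "-"),
    ("chr1", 140, "+"),
    ("chr2", 120, "-"),
]
print(percent_methylation(calls, "chr1", 100, 150))  # 75.0
```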

I've put up a tutorial video on our YouTube channel covering some of the basics of working with bisulphite data, which should give you an idea of how to get started with this.

01-22-2013, 04:34 AM #140
glados
Member
Location: Aperture Science

Join Date: Mar 2012
Posts: 59

Dear Simon.

I sent you a private message a few weeks ago. Perhaps you can take a look at it? It was a question regarding installing a custom genome.

All times are GMT -8.