
**simonandrews** · 11-26-2012, 06:45 AM

Originally posted by Neuromancer

Hey all,

How does SeqMonk count paired-end reads? Is each pair counted only once (or once per gene?), or is each read counted individually? In either case, is there any way to switch between these two modes?

How paired end reads show up in the program is determined when you first import the data. If you choose to import your data as paired end then the reads will appear as a single location which spans the region inferred to be covered by the pair. The direction of this single region will be taken from the first read in the pair.
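To make the paired-end import concrete, here is a minimal Python sketch of how two mates could be collapsed into one region. The `Read` class and `merge_pair` function are hypothetical illustrations, not SeqMonk's internal code, and the sketch assumes both mates map to the same chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def merge_pair(first: Read, second: Read) -> Read:
    """Collapse two mates into one region spanning the whole pair.

    The strand comes from the first read; the gap between the mates
    is treated as covered.
    """
    if first.chrom != second.chrom:
        raise ValueError("mates map to different chromosomes")
    return Read(
        chrom=first.chrom,
        start=min(first.start, second.start),
        end=max(first.end, second.end),
        strand=first.strand,
    )


pair = merge_pair(Read("chr1", 100, 150, "+"), Read("chr1", 300, 350, "-"))
# pair == Read("chr1", 100, 350, "+")
```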

If you import as single end then both ends of the pair will be shown as separate reads but there will be no connection between them in the internal data model so you can't switch between the two views within the same set of imported data.

One of the main trade-offs which SeqMonk makes in order to handle large datasets quickly is that it doesn't maintain links between alignment segments, either for paired reads or for the segments of spliced reads. For our internal quantitation of spliced data we use the relative length of each aligned segment to infer how many reads we should count when we're summing up the contribution of different spliced segments.
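The length-weighted counting of spliced segments can be sketched as follows, assuming each segment of a read is credited in proportion to its aligned length so that all segments of one read add up to a single count. The function names are hypothetical, and this is one reading of the description, not SeqMonk's source.

```python
def segment_weights(segments):
    """Fraction of one read credited to each aligned segment.

    segments: list of (start, end) pairs, 1-based inclusive.
    Weights are proportional to segment length and sum to 1.
    """
    lengths = [end - start + 1 for start, end in segments]
    total = sum(lengths)
    return [length / total for length in lengths]


def count_in_feature(reads, feat_start, feat_end):
    """Sum the weights of every segment that overlaps the feature."""
    total = 0.0
    for segments in reads:
        for (start, end), weight in zip(segments, segment_weights(segments)):
            if start <= feat_end and end >= feat_start:
                total += weight
    return total


# One spliced read: a 30 bp segment and a 70 bp segment.
spliced = [[(100, 129), (500, 569)]]
```

A feature covering both segments counts the read once (about 1.0), while a feature covering only the 70 bp segment counts about 0.7 of it.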

In your case, as long as you're correcting for total read counts, it shouldn't matter too much that you have a mix of single and paired end data. In terms of counts, the paired data will be somewhat similar to simply doubling a single end sample, and the global correction will normalise this away. If you want to correct for this more explicitly, you could apply a manual correction to halve the counts for your paired end data (this is one of the quantitation options).
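A small worked example of why the global total-count correction largely absorbs the doubling. The counts are invented, and the sketch assumes the idealised case where both mates of every pair fall in the same probe; real data will only approximate this.

```python
def per_million(counts):
    """Scale raw counts by the store's total, giving reads per million."""
    total = sum(counts.values())
    return {probe: c * 1_000_000 / total for probe, c in counts.items()}


def halve(counts):
    """The manual 'divide by two' correction for paired-end stores."""
    return {probe: c / 2 for probe, c in counts.items()}


# Hypothetical counts for one sample.
single_end = {"geneA": 50, "geneB": 150, "geneC": 300}
# Idealised paired-end data imported as single reads: both mates land in
# the same probe, so every count doubles.
paired_as_single = {probe: 2 * c for probe, c in single_end.items()}
```

Here `per_million(single_end)` and `per_million(paired_as_single)` are identical, and `halve(paired_as_single)` recovers the single-end counts.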

**Neuromancer** · 12-11-2012, 01:37 AM

new seqmonk version 0.23.0

Hey Simon,

After upgrading to the new version, SeqMonk no longer recognizes the chromosome names in the standard bowtie2 SAM output file (still using the same bowtie2-provided index for mm10). The error message says it can't make a name of chrM... With a sorted BAM file, it reads all the lines until it reaches chrM and then quits with the same error, leaving no reads to look at...
Is that because of the new version, or did I do something crazy here?
If not, can I somehow exclude unreadable chromosome names from the import (I'm not interested in chrM anyway) while still importing all the others?

Thanks a lot for your help so far and (hopefully) in the future

r

edit:
Rolling back to 0.22 solves the problem: the same SAM/BAM/sorted BAM files are read without complaints! Well, there is still a complaint about chrM at the end, but the program just reports it and doesn't break partway through...
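One possible workaround for the chrM question is to strip chrM from the SAM file before importing it. This is a minimal sketch assuming plain-text SAM input (a BAM file would first need converting, for example with `samtools view -h`); `drop_chromosome` is a hypothetical helper, not part of SeqMonk or samtools.

```python
def drop_chromosome(lines, chrom="chrM"):
    """Yield SAM text lines, skipping everything that refers to `chrom`.

    Removes the matching @SQ header line and any alignment whose
    reference (RNAME, column 3) or mate reference (RNEXT, column 7)
    is `chrom`. Other header lines and unmapped reads pass through.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            if fields[0] == "@SQ" and f"SN:{chrom}" in fields[1:]:
                continue
        elif len(fields) >= 7 and chrom in (fields[2], fields[6]):
            continue
        yield line
```

For example, `dst.writelines(drop_chromosome(src))` with `src` opened on the input SAM and `dst` on a new output file. Dropping records whose mate lies on chrM as well avoids leaving half of a pair behind.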

**simonandrews** · 12-11-2012, 01:42 AM

This is a bug in the latest version caused by the way we handle chromosome name matches internally. I've just put a fix into the development version and am writing the release notes for a new release right now.

There should be an update out in a couple of hours which will fix this. Sorry for any trouble this has caused.

**simonandrews** · 12-11-2012, 02:12 AM

There is a new release of seqmonk (v0.23.1) which should fix the import problems found in v0.23.0. It's up on the project page now and you should be able to get it.

Please let me know if any problems persist with this new version.

**Neuromancer** · 12-11-2012, 02:34 AM

Yepp - up and running!
Thanks for that REALLY fast update!
It now works like the previous version. chrM still can't be extracted, however, but the workaround for that can be found here...

**mathew** · 12-18-2012, 09:26 AM

I have a question about using the scatter plot in SeqMonk. When you scatter plot two read counts/expression values, there is a cloud of red dots in the center. What does that mean? Alternatively, can someone point me to what the various colors in such a scatter plot mean?

**simonandrews** · 12-18-2012, 10:06 AM

The colours in the scatterplot represent the density of points which are overlaid at that position in the plot. There are normally far too many points to show each one individually, so we use the colours to show where in the plot high densities of points are found. The colour scheme is the standard cold-to-hot scheme used elsewhere in the program.

**mathew** · 12-18-2012, 10:23 AM

scatter plot colors

Thanks Simon for the quick reply. So from the attached image, the red area represents a perfect correlation (or close to 1), and as we move away from the line it decreases.
Is that reasonable, or have I misunderstood something?

Thanks

**simonandrews** · 12-18-2012, 10:42 AM

The red area simply shows regions of the plot where many probes are packed on top of each other. There are never enough pixels on the screen to show every probe independently so the colours simply relate to how many probes are overlaid at a particular position. In your plot it shows that the largest number of probes are in a region of the plot showing very little change between the two conditions you've plotted out.
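The density colouring can be sketched as binning points into pixels and colouring each pixel by how many points share it. The plot size and the linear blue-to-red ramp are assumptions for illustration; SeqMonk's actual palette and binning may differ.

```python
from collections import Counter


def density_grid(points, x_range, y_range, width, height):
    """Count how many (x, y) points land on each pixel of a width x height plot.

    Points are assumed to lie inside x_range and y_range.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    counts = Counter()
    for x, y in points:
        px = min(int((x - x0) / (x1 - x0) * width), width - 1)
        py = min(int((y - y0) / (y1 - y0) * height), height - 1)
        counts[(px, py)] += 1
    return counts


def cold_to_hot(count, max_count):
    """Linear ramp: the busiest pixel is red, sparsely used pixels tend to blue."""
    frac = count / max_count
    return (round(255 * frac), 0, round(255 * (1 - frac)))


# Three probes piled onto one pixel and one probe on its own.
probes = [(1.0, 1.0), (1.01, 1.01), (1.0, 1.0), (9.0, 2.0)]
grid = density_grid(probes, (0, 10), (0, 10), 10, 10)
```

The pixel at (1, 1) holds three probes and would be drawn red; the red colour says nothing about correlation, only about how many probes overlap there.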

**mjp** · 01-01-2013, 06:06 AM

strand specific probes

Q1: What is the shortest way to obtain lists of probes that have 'FWD only', 'RVR only' and 'NONE' reads covering them, for multiple data stores independently? The idea is to avoid going through the same steps for each dataset.

Currently I'm following this workflow; however, I can't get it to work on multiple data stores:
1. Defining my probes.
2. Quantitating FWD reads only
3. Filtering on value -> Individual probes -> value above 1, for exactly 1 of the one selected 1
4. Quantitating RVR reads only
5. Filtering on value -> Individual probes -> value above 1, for exactly 1 of the one selected 1
6. Filtering by combining existing lists:
6.1. 'RVR value above 1' BUTNOT 'FWD value above 1'
6.2. 'FWD value above 1' BUTNOT 'RVR value above 1'

Q2: By following this sequence of steps, will 6.2 produce probes with some value for FWD only and not RVR (given that the last quantitation was made for RVR reads only)?

I believe this should work for a single data store, provided the answer to Q2 is yes. When I tried (in step 3) to go through multiple datasets, I got a little confused about which option to choose.

Would you be able to give me a hint here?

Thanks in advance.

**simonandrews** · 01-02-2013, 01:27 AM

I'm not exactly clear what you want to know - do you want a quick way to determine if a given probe is forward or reverse only in all of a set of stores, or are you looking for a quick way to make separate lists for several stores where you have them?

To do the analysis across several stores you could basically repeat the process you outlined but selecting all of the stores, and making your values filters use 'at least 1' rather than 'exactly 1' to pull out probes which had a read in that direction in any of your stores. You could also put all of your data into a single data group and then treat it as a single dataset.
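The difference between 'exactly 1' and 'at least 1' of the selected stores is easiest to see on a probe that has forward reads in both stores. `stores_filter` below is a hypothetical model of that option, not SeqMonk's code, and the counts are invented.

```python
def stores_filter(values_by_store, threshold, mode, n):
    """Probes above `threshold` in exactly n / at least n of the given stores.

    values_by_store: store name -> {probe: value}; every store has the same probes.
    mode: "exactly" or "at least".
    """
    probes = next(iter(values_by_store.values())).keys()
    selected = set()
    for probe in probes:
        hits = sum(1 for values in values_by_store.values() if values[probe] > threshold)
        if (mode == "exactly" and hits == n) or (mode == "at least" and hits >= n):
            selected.add(probe)
    return selected


fwd = {
    "storeA": {"p1": 2, "p2": 5, "p3": 0},
    "storeB": {"p1": 0, "p2": 3, "p3": 0},
}
```

With `mode="exactly", n=1` only `p1` survives, because `p2` has forward reads in both stores; `mode="at least", n=1` keeps both `p1` and `p2`.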

You might also want to try using a difference quantitation, where you could express forward reads as a percentage of all reads, and then filter for either 100% forward or 0% forward, which might be easier than going through lots of values filters. The only hiccup is that empty features would also show up as 0%. You'd therefore need to do a read count quantitation first, create a list for each of your datasets of the probes containing no reads, and then use the combine filter to subtract these from the reverse-only set (0%) to get the true reverse-only count.
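The percentage approach, including the correction for empty probes, can be sketched like this. The naive percentage deliberately reports 0% for an empty probe to mirror the caveat in the post; the function names and counts are hypothetical.

```python
def naive_percent_forward(fwd, rev):
    """Forward reads as a percentage of all reads; an empty probe reports 0%."""
    total = fwd + rev
    return 0.0 if total == 0 else 100.0 * fwd / total


def strand_lists(counts):
    """counts: probe -> (fwd, rev). Returns (fwd_only, rev_only, empty) sets."""
    pct = {p: naive_percent_forward(f, r) for p, (f, r) in counts.items()}
    fwd_only = {p for p, v in pct.items() if v == 100.0}
    zero_pct = {p for p, v in pct.items() if v == 0.0}
    # List of probes with no reads at all, from a plain read count quantitation.
    empty = {p for p, (f, r) in counts.items() if f + r == 0}
    # The combine-filter subtraction: 0% forward minus the empty probes.
    rev_only = zero_pct - empty
    return fwd_only, rev_only, empty


counts = {"p1": (5, 0), "p2": (0, 3), "p3": (0, 0), "p4": (2, 2)}
fwd_only, rev_only, empty = strand_lists(counts)
```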

**simonandrews** · 01-04-2013, 08:48 AM

I have acted on one of my New Year's resolutions and have finally got round to producing some more tutorial videos showing the use of SeqMonk to analyse a number of different datasets covering RNA-Seq (both simple and complex experimental setups), ChIP-Seq, BS-Seq and Hi-C data.

All of the videos can be found at our YouTube channel at:

https://www.youtube.com/user/babrahambioinf

If you have any suggestions for other tutorials which might be useful then please let me know and I'll have a go at putting them together.

**honey** · 01-04-2013, 01:11 PM

Hi Simon,

Great job! One suggestion: a demo of how to plot density graphs across the TSS, as discussed in http://seqanswers.com/forums/showthr...ht=tss+density, would be great. I know something close to this can be plotted in SeqMonk, but being able to select specific probes and plot the graphic as suggested would be very helpful.

**simonandrews** · 01-04-2013, 01:20 PM

The ChIP-Seq tutorial constructs one of these types of graph after the peak detection part (the TSS probes are at 6:20 and the plot is at about 8:20), so hopefully that covers much of what you wanted. The plot will display whichever probes you have selected through the filters.

The only thing which looks different in the post you linked is that in seqmonk the probes are ordered in the plot by the number of reads covering each probe, whereas the plot you linked to looks like they did something else to order the probes to get some of the patterns you saw. In some older versions we used to cluster the probes but this often produced a messier result than you'd hope. I'm happy to hear suggestions for other ways we could order these plots if there's anything we could do better.
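For readers who want to reproduce the idea outside the program, here is a minimal sketch of building windows centred on each TSS and ordering them by the number of overlapping reads, which is the ordering described above. The gene coordinates, the `flank` size and the overlap rule are assumptions for illustration.

```python
def tss_windows(genes, flank):
    """genes: name -> (chrom, tss, strand). Returns name -> (chrom, start, end).

    Windows are symmetric around the TSS, so strand is not used here.
    """
    return {
        name: (chrom, max(1, tss - flank), tss + flank)
        for name, (chrom, tss, _strand) in genes.items()
    }


def order_by_coverage(windows, reads):
    """Window names sorted by overlapping read count, most covered first.

    reads: list of (chrom, start, end); ties keep their original order.
    """
    def count(window):
        chrom, start, end = window
        return sum(1 for c, s, e in reads if c == chrom and s <= end and e >= start)

    return sorted(windows, key=lambda name: count(windows[name]), reverse=True)


genes = {"g1": ("chr1", 1000, "+"), "g2": ("chr1", 5000, "-")}
reads = [("chr1", 900, 950), ("chr1", 1100, 1150), ("chr1", 5200, 5250)]
windows = tss_windows(genes, 500)
```

Here `g1` has two overlapping reads and `g2` one, so `order_by_coverage(windows, reads)` puts `g1` first.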

**mjp** · 01-06-2013, 11:06 PM

Originally posted by simonandrews

do you want a quick way to determine if a given probe is forward or reverse only in all of a set of stores, or are you looking for a quick way to make separate lists for several stores where you have them?

Sorry for this small delay and for not being specific enough. I see that the title of my post was wrong.
I wanted a list of probes classified by the strand of the reads covering them: probes with only forward reads, only reverse reads, no reads, and both types of reads. Ideally I would like such a list for each of my stores independently. Is it possible to create that in a single SeqMonk pipeline, or does it have to be repeated for each store separately?
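What is being asked for here is a per-store classification in which each store is evaluated on its own. A minimal sketch, assuming forward and reverse read counts per probe are already available for every store; the `StrandClass` names are hypothetical and this is not a built-in SeqMonk feature.

```python
from enum import Enum


class StrandClass(Enum):
    NONE = "none"
    FWD_ONLY = "fwd only"
    REV_ONLY = "rev only"
    BOTH = "both"


def classify(fwd, rev):
    """Assign one probe to exactly one of the four categories."""
    if fwd == 0 and rev == 0:
        return StrandClass.NONE
    if rev == 0:
        return StrandClass.FWD_ONLY
    if fwd == 0:
        return StrandClass.REV_ONLY
    return StrandClass.BOTH


def lists_per_store(stores):
    """stores: store -> probe -> (fwd, rev). Returns store -> class -> probes."""
    result = {}
    for store, counts in stores.items():
        lists = {cls: set() for cls in StrandClass}
        for probe, (fwd, rev) in counts.items():
            lists[classify(fwd, rev)].add(probe)
        result[store] = lists
    return result


stores = {
    "s1": {"p1": (3, 0), "p2": (0, 2), "p3": (0, 0), "p4": (1, 1)},
    "s2": {"p1": (0, 0), "p2": (0, 2), "p3": (4, 0), "p4": (0, 0)},
}
lists = lists_per_store(stores)
```

Because each store gets its own dictionary of lists, `p1` can be forward-only in `s1` and empty in `s2` without either result affecting the other.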

Originally posted by simonandrews

To do the analysis across several stores you could basically repeat the process you outlined but selecting all of the stores, and making your values filters use 'at least 1' rather than 'exactly 1' to pull out probes which had a read in that direction in any of your stores.

I'm not quite sure that does what I want; this is where the confusion starts. I created 2 sample stores for yeast chr1 and ran the workflow I outlined previously, but this time selecting 'at least 1' for all selected stores (please see the attached image). Probes were generated using a running window of size 1, step 1.
What I see is that the first visible cluster of probes for the bottom dataset (on the screenshot) doesn't extend over the entire read. Instead, it covers only the section of the read that met similar criteria for the top dataset.

I expected this to produce probes with value 1 for the top dataset, as currently seen. For the bottom dataset, however, I expected probes with value 1 over the entire read, wider than those of the top dataset.
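A sketch of the expectation described here: size-1, step-1 running-window probes, a per-store coverage list, and a single list shared by both stores under an 'at least 1 of 2 stores' filter. The coordinates are invented, and this models what the post expected, not how SeqMonk displays a shared probe list.

```python
def running_window_probes(chrom_length, size, step):
    """(start, end) windows, 1-based inclusive, tiling a chromosome."""
    probes = []
    start = 1
    while start + size - 1 <= chrom_length:
        probes.append((start, start + size - 1))
        start += step
    return probes


def covered_probes(reads, probes):
    """Probes overlapped by at least one read; reads are (start, end) pairs."""
    return {p for p in probes if any(s <= p[1] and e >= p[0] for s, e in reads)}


probes = running_window_probes(10, 1, 1)    # one probe per base
top = covered_probes([(3, 5)], probes)      # bases 3-5
bottom = covered_probes([(2, 8)], probes)   # bases 2-8
shared_list = top | bottom                  # one list for 'at least 1 of 2 stores'
```

Under this model the shared list spans the whole bottom read, because it is the union of what passes in either store.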

Originally posted by simonandrews

You could also put all of your data into a single data group and then treat it as a single dataset.

These are independent samples. So I would like to avoid doing that.

I hope I didn't make it more complicated.

Attached image: Screenshot from 2013-01-07 09:37:24.png
