**kshankar** · 02-22-2013, 12:39 PM

Average read coverage

Is there a way to calculate the average read coverage per exon for RNA-seq datasets using SeqMonk? I imagine it would be similar to the Coverage Depth quantitation, except that that gives min or max coverage. We are trying to find genes that have a meaningful RPKM value. Based on this paper (McIntyre et al, BMC Genomics, 2011, 12:293), an average of 5 reads per base (for each exon) was needed to have reliable RPKMs. I didn't know if there was a way to compute this for all exons and then filter out the ones with insufficient quantitation.

**simonandrews** · 02-24-2013, 06:40 AM

Originally posted by kshankar:

Is there a way to calculate the average read coverage per exon for RNA-seq datasets using SeqMonk? I imagine it would be similar to the Coverage Depth quantitation, except that that gives min or max coverage. We are trying to find genes that have a meaningful RPKM value. Based on this paper (McIntyre et al, BMC Genomics, 2011, 12:293), an average of 5 reads per base (for each exon) was needed to have reliable RPKMs. I didn't know if there was a way to compute this for all exons and then filter out the ones with insufficient quantitation.

If you put probes over exons (do feature probes over mRNA and split into subfeatures) and then do a base pair quantitation which corrects for length then you'll get a base count per base of probe, which I think is the measure you wanted.

It would also be pretty easy to extend the coverage depth quantitation to allow mean or median value as a valid measure so I'll add that to the next release, but I think you can do what you want with the existing tools.
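For anyone who wants to check the numbers outside the program, the "base count per base of probe" measure can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SeqMonk code: the function names, the inclusive 1-based coordinates and the single-chromosome inputs are all assumptions, and the default threshold of 5 is the figure quoted from McIntyre et al. above.

```python
def mean_coverage(exon_start, exon_end, reads):
    """Mean per-base depth over one exon (inclusive coordinates, one chromosome).

    Bases of overlap are summed over all reads and divided by the exon length,
    which is the same as averaging the depth at every base of the exon.
    """
    exon_length = exon_end - exon_start + 1
    overlapping_bases = 0
    for read_start, read_end in reads:
        overlap = min(exon_end, read_end) - max(exon_start, read_start) + 1
        if overlap > 0:  # reads that miss the exon give a non-positive overlap
            overlapping_bases += overlap
    return overlapping_bases / exon_length


def well_covered_exons(exons, reads, min_mean_coverage=5.0):
    """Keep the names of exons whose mean coverage reaches the threshold."""
    return [
        name
        for name, (start, end) in exons.items()
        if mean_coverage(start, end, reads) >= min_mean_coverage
    ]


if __name__ == "__main__":
    # Hypothetical toy data: one 100 bp exon and four reads.
    toy_reads = [(91, 140), (150, 199), (190, 239), (300, 350)]
    print(mean_coverage(101, 200, toy_reads))
    print(well_covered_exons({"exon1": (101, 200)}, toy_reads))
```

Summing overlaps rather than counting reads means a read that only clips the edge of an exon contributes just the bases that actually fall inside it.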

**mathew** · 03-03-2013, 07:35 AM

time course RNA-seq with replicate

I was wondering whether it is possible to analyze (exon-level) RNA-seq data with three replicates over a time course in SeqMonk.

Thanks

**simonandrews** · 03-05-2013, 01:53 AM

Originally posted by mathew:

I was wondering whether it is possible to analyze (exon-level) RNA-seq data with three replicates over a time course in SeqMonk.

Yes, you can certainly do this kind of analysis with SeqMonk. In essence the analysis would be the same as for any complex experiment type. We did a tutorial video which covers some of the options for this at the transcript level, but you could do the same thing starting from exons.

For the initial quantitation you would just put probes over all exons, though you'd probably want to filter your set of transcripts first to remove odd biotypes and reduce your set of exons somewhat. We normally just use protein coding mRNAs for our analysis now, which about halves the number of transcripts we have to consider from the full Ensembl set. Once you have the probes you'd use a read count quantitation, correcting for total count and log transforming. You would do the quantitation normalisation in the same way as for whole transcript or gene quantitation.
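As a rough picture of what "correcting for total count and log transforming" means, here is a minimal Python sketch. It assumes a reads-per-million scaling, a log2 transform and a pseudocount added to the raw count to avoid log(0); those specifics, and the name `log2_rpm`, are illustrative assumptions rather than a description of what the program does internally.

```python
import math


def log2_rpm(raw_counts, total_reads, pseudocount=1.0):
    """Scale raw probe counts to reads per million and log2 transform them.

    raw_counts  : dict mapping probe name -> raw read count
    total_reads : total number of reads in the sample
    pseudocount : added to each raw count so that empty probes stay finite
    """
    scale = 1_000_000 / total_reads
    return {
        probe: math.log2((count + pseudocount) * scale)
        for probe, count in raw_counts.items()
    }


if __name__ == "__main__":
    # Hypothetical exon counts from a sample with 2 million reads.
    print(log2_rpm({"exonA": 7, "exonB": 0}, total_reads=2_000_000))
```

The total count correction makes samples of different depth comparable, and the log transform puts fold changes on an additive scale, which is what difference-based filters work on.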

For the differential analysis I'd suggest doing an intensity difference filter using replicate sets of all of your replicate groups. I'd then do a replicate set stats analysis using the same replicate groups and then focus on exons which were found to be significant by both of these methods. Once you have the full set of changing exons you could use the clustering tools to find sets of exons which behave similarly, to make the interpretation of your analysis simpler.

The problem you're always fighting against with this type of analysis is the number of tests you are performing. If you're looking at all exons in the genome you will have a huge number of probes and potentially low observation levels, which will make it difficult to achieve a sufficient level of significance to survive multiple testing correction. You can't pre-filter on difference before doing the intensity difference filter, since this will break the statistical background model, but if there's any other way to focus on exons which are more likely to be of interest then you could think about that.
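To see why the number of tests matters, it can help to look at a multiple testing correction directly. The sketch below implements the Benjamini-Hochberg procedure in plain Python; using that particular correction here is an assumption made for illustration, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each raw p-value is multiplied by n / rank (rank 1 = smallest p-value),
    then a running minimum from the largest rank downwards keeps the
    adjusted values monotonic.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four exon probes.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Because every raw p-value is scaled by the number of tests divided by its rank, the same raw p-value ends up with a larger adjusted value when it sits among many more probes, which is why cutting down the probe set in a way that doesn't depend on the difference being tested can help.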

The other option you have, if you want to look at alternative splicing, is to analyse the splice events rather than the exon expression. When you import your data you would choose to import spliced reads and then import introns rather than exons. You could then use the read position probe generator to put a probe over each different splice event, and the exact overlap quantitation method to count the number of occurrences of each splice event in your different samples. Analysing these data will allow you to focus more explicitly on changes in splicing, as opposed to the more mixed signals you might get from looking at read counts in overlapping exons.
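Conceptually, the splice event approach boils down to tallying identical introns per sample. A minimal Python sketch of that idea follows; the input layout (a list of `(chromosome, start, end)` intron tuples per sample) and the function name are assumptions for illustration, not the program's own data model.

```python
from collections import Counter


def count_splice_events(introns_by_sample):
    """Count exact occurrences of each splice event in each sample.

    introns_by_sample : dict mapping sample name -> list of
                        (chromosome, start, end) introns, one per spliced read
    Returns a dict mapping each distinct intron to {sample: count}.
    """
    per_sample = {
        sample: Counter(introns) for sample, introns in introns_by_sample.items()
    }
    all_events = sorted(set().union(*per_sample.values()))
    return {
        event: {sample: per_sample[sample][event] for sample in per_sample}
        for event in all_events
    }


if __name__ == "__main__":
    # Hypothetical introns seen in two time points.
    data = {
        "t0": [("chr1", 1000, 2000), ("chr1", 1000, 2000), ("chr1", 1000, 3000)],
        "t1": [("chr1", 1000, 3000), ("chr1", 1000, 3000)],
    }
    for event, counts in count_splice_events(data).items():
        print(event, counts)
```

Only introns with identical start and end positions are pooled, so two events sharing a donor site but using different acceptor sites are counted separately.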

On my to-do list is another tutorial which goes over the options in the program for looking at alternative splicing, since this is something we're increasingly doing and which is getting more feasible with longer read lengths and higher coverage depths.

**renext** · 03-05-2013, 08:01 PM

Raw counts difference between v0.24.0 and v0.24.1

Hi Simon Andrews,

I have really enjoyed using SeqMonk to analyze my mRNA-seq data. It is certainly a great program.

Recently, I noticed that a new version of SeqMonk had been released (v0.24.1), so I simply reran the analysis of my mRNA-seq data using the new version.

I used 'Quantitation Pipelines - RNA Seq quantitation pipelines' and selected the following options: mRNA / Non-strand specific / Merge transcript isoforms / Generate Raw Counts. Although most probes gave the same counts, surprisingly some probes gave different counts compared to the results from the old version (v0.24.0).

The changes in counts seem to be consistent between samples, but they slightly change the p-values when I do Intensity Difference filtering or Replicate Set filtering, giving a slightly different set of probes.

FYI, I used exactly the same options. I assumed that raw counts should remain the same between v0.24.0 and v0.24.1, because according to your note v0.24.0 only corrected incorrectly for total read count when unmerged transcripts were used. That note is the reason I reran my analysis using v0.24.1, just in case. I am wondering what's going on. Do you have any idea why I got different read counts for some probes (not all)?

Thanks for your help in advance!

**simonandrews** · 03-06-2013, 05:11 AM

The code which was changed in 0.24.1 only affected the global correction applied, so if you've been using the 'generate raw counts' option then this wouldn't have been applied so you shouldn't have got anything different between the two versions. I've just been back and checked the logs for that code and apart from deleting a commented line, the only change was in that correction code which wouldn't be executed if you were generating raw counts.

If you're seeing consistent differences for only a small number of probes is it possible that your annotation has changed between the analyses? Could you have been using a filtered set of transcripts for your analysis before, or could your annotation for this assembly have been updated?

I should also point out that in your mail it sounded like you were using raw counts for the statistical testing within SeqMonk (you may not be, it wasn't entirely clear). If you're using the tests inside the program you really want to use normalised log transformed counts for the analyses. Raw counts are only intended to be used for external analyses (DESeq for example) which require them.

**renext** · 03-06-2013, 07:21 AM

Thanks for the reply!

I have looked into what differs between my previous analysis and the current one apart from the program version. In fact, there was another difference.

In the previous analysis, when I imported my mapped files, I selected the option 'split spliced reads'. Then when I ran the pipeline, I selected 'merge transcript isoform'. In the current one, I didn't check 'split spliced reads' when I imported my data. Although I chose the same option for the pipeline (merge transcript isoform), it gave different results, as I mentioned, depending on the 'split spliced reads' option in the import data step.

Since I did a pretty much regular mRNA-seq mapping, not a spliced mapping, I guess I shouldn't check 'split spliced reads'. Is that correct?

In addition, you are right. I was not clear before, but I actually used normalized log transformed counts for the statistical analyses in SeqMonk. The main reason I also want to get raw counts is for external analyses, as you pointed out.

Thank you again for your wonderful help!

**simonandrews** · 03-07-2013, 12:51 AM

If you've not done a spliced mapping then the 'split spliced reads' option won't do an awful lot to your data. The one change it will force is to split paired end reads into individual reads rather than generating a single read which covers the pair. It will also fix the strand attributes to be correct for directional paired end libraries.

For any RNA-Seq data we'd strongly recommend using the split reads option, even if you didn't use a spliced mapper since it will do the right thing with your data. In particular, the RNA-Seq quantitation needs to use individually mapped reads rather than joined read pairs. The reason for this is that although you want to count reads, if you've done a spliced import some reads will have been split up, and would therefore count double if simple read counts were used. The program therefore counts bases of overlap and then works out the read length for the sample and then uses this to convert the base counts back to read counts. If you've used joined read pairs then you'll get huge (and incorrect) base counts, and you'll get a predicted read length which is also huge (and incorrect).
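The base-count-to-read-count conversion is easy to follow with a small worked example. The Python sketch below is a simplified illustration of the idea only: the function name is hypothetical, coordinates are inclusive, and the read length is passed in rather than worked out from the sample as the program does.

```python
def reads_from_base_overlap(exons, fragments, read_length):
    """Estimate a read count for a merged transcript from bases of overlap.

    exons       : list of (start, end) exon intervals of the transcript
    fragments   : list of (start, end) imported read pieces; a spliced read
                  appears as one piece per exon it touches
    read_length : read length for the sample (assumed known here)
    """
    overlapping_bases = 0
    for frag_start, frag_end in fragments:
        for exon_start, exon_end in exons:
            overlap = min(exon_end, frag_end) - max(exon_start, frag_start) + 1
            if overlap > 0:
                overlapping_bases += overlap
    return overlapping_bases / read_length


if __name__ == "__main__":
    exons = [(100, 159), (300, 339)]
    # One 100 bp spliced read, imported as two pieces (60 bp + 40 bp).
    split_read = [(100, 159), (300, 339)]
    print(len(split_read))  # a simple piece count sees two "reads"
    print(reads_from_base_overlap(exons, split_read, read_length=100))
```

Counting pieces would score the single spliced read twice, whereas the base-overlap route recovers one read. The same arithmetic shows why joined read pairs are a problem: a pair imported as one long interval inflates both the base count and any read length estimated from the data.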

**renext** · 03-07-2013, 08:33 AM

Originally posted by simonandrews:

For any RNA-Seq data we'd strongly recommend using the split reads option, even if you didn't use a spliced mapper since it will do the right thing with your data.

Thanks! I got it. So I should use the 'split spliced reads' option in the data import step almost all the time for RNA-seq data. Then I guess my first analysis (using the split spliced reads option) is the correct one. I may have been confused because I read 'Spliced import. If you have done a spliced mapping you will be offered the option to split spliced reads into their component parts' in the SeqMonk manual. I thought I could only choose that option if I had done a spliced mapping.

But I did notice that in your YouTube tutorial you selected the 'split spliced reads' option for RNA-seq analysis when you imported your data!

I really appreciate your clarification. Thanks!

**griffon42** · 03-07-2013, 01:46 PM

I've recently installed SeqMonk - it really looks great, and should be very useful for some of the analyses I'm interested in running. Many thanks to the developers!

However, I've run into a rather strange bug. Looking at the menubar, the "File" "Edit" and "View" pulldowns all work just fine. However, all other menus, such as "Data" "Plots" "Filtering" and so on, can be pulled down, but the options are not accessible - they are all completely greyed out.

Nothing seems to change this. I can import large BAM files just fine, visualize reads just fine, and so forth - just cannot access menu items. I've run into the same problem with trying the sample data set from the Seqmonk website.

Searching the web, I haven't encountered any reports of similar behavior - can anyone comment on this? I feel like I must be missing something obvious.

I'm running SeqMonk on Mac OS X 10.8.2, 32GB RAM (10GB available to SeqMonk), with the more recent version of Java.

Thanks for the help.

**mathew** · 03-10-2013, 12:57 PM

Questions about SeqMonk

I am using SeqMonk for RNA-seq analysis and have the following questions:

1. In version 24, when I go to the feature probe generator, the feature to design around is 'attenuator'. What is this related to?
2. I see the mRNA option is not there, so if I have to design probes for mRNA, what would be the equivalent?
3. I am working on RNA-seq of bacteria. Do I still need to import as split reads?

Thanks

**simonandrews** · 03-11-2013, 01:40 AM

Originally posted by mathew:

I am using SeqMonk for RNA-seq analysis and have the following questions:

1. In version 24, when I go to the feature probe generator, the feature to design around is 'attenuator'. What is this related to?
2. I see the mRNA option is not there, so if I have to design probes for mRNA, what would be the equivalent?
3. I am working on RNA-seq of bacteria. Do I still need to import as split reads?

Hi Matthew,

The RNA-Seq quantitation pipeline tries to guess which of your annotation tracks is appropriate to use for RNA analysis. If there is an mRNA track available then it will suggest that, but if there isn't one (which is what it sounds like in your case) then it will just use the first track (which I guess would be attenuator). This wouldn't be an appropriate track to use so you'd need to select something more suitable.

It's odd that there isn't an mRNA track in your genome. Is this one of our core genomes or something you've imported yourself? If it's a bacterium you might need to use ORF, CDS or maybe something like operon as the basis for your analysis.

When you import your data you should always use the split reads option, even if you're working on bacterial data. You won't have any splice sites, but selecting this option will also set other options which ensure that your imported data is formatted appropriately for RNA-Seq quantitation.

Hope this helps

Simon.

**mathew** · 03-16-2013, 11:54 AM

SeqMonk intergenic probes

Can SeqMonk put probes in intergenic regions? To be more precise, can it help me get read counts for noncoding RNA, directly or indirectly?

Thanks.

**simonandrews** · 03-16-2013, 01:04 PM

Originally posted by mathew:

Can SeqMonk put probes in intergenic regions? To be more precise, can it help me get read counts for noncoding RNA, directly or indirectly?

Thanks.

Yes, it can put probes wherever you need them, either by using one of the existing feature tracks or you can import your own set of positions.

For intergenic regions for example you could put probes over genes and then use the interstitial probe generator to make intergenic probes.

You could go one step further and put probes over all exons (select mRNA and split into subfeatures). You could then make interstitial probes from these to get a mixed set of introns and intergenic. You could then separate these by selecting for an overlap with genes to select the subgroup you want.
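The logic of that last step (gaps between exon probes, then separated by gene overlap) can be sketched in Python as below. This only illustrates the idea on a single chromosome with inclusive coordinates; the function names are hypothetical and the real interstitial probe generator may treat details such as chromosome ends differently.

```python
def interstitial_regions(probes):
    """Return the gaps between probes on one chromosome.

    probes : list of (start, end) intervals, in any order, possibly overlapping
    """
    gaps = []
    furthest_end = None
    for start, end in sorted(probes):
        if furthest_end is not None and start > furthest_end + 1:
            gaps.append((furthest_end + 1, start - 1))
        furthest_end = end if furthest_end is None else max(furthest_end, end)
    return gaps


def split_by_gene_overlap(gaps, genes):
    """Separate gaps that overlap a gene (introns) from those that don't."""
    introns, intergenic = [], []
    for gap_start, gap_end in gaps:
        in_gene = any(
            gap_start <= gene_end and gene_start <= gap_end
            for gene_start, gene_end in genes
        )
        (introns if in_gene else intergenic).append((gap_start, gap_end))
    return introns, intergenic


if __name__ == "__main__":
    # Hypothetical layout: a two-exon gene followed by a single-exon gene.
    exons = [(100, 200), (300, 400), (1000, 1100)]
    genes = [(100, 400), (1000, 1100)]
    introns, intergenic = split_by_gene_overlap(interstitial_regions(exons), genes)
    print("introns:", introns)
    print("intergenic:", intergenic)
```

Tracking the furthest end seen so far keeps overlapping exons from different isoforms from producing spurious gaps.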

For noncoding RNA there are a number of tracks already available which might be of use (miRNA, snoRNA etc), or if you have a set of coordinates you want to use you can import these into a new feature track and then use these as the basis for probe design.

If you can let me know more specifically what you're trying to do I can try to give you more exact suggestions.

**mathew** · 03-19-2013, 10:54 AM

long noncoding RNA detection and quantification

Thanks Simon,

Will that specifically include long noncoding RNA?
