SEQanswers

SEQanswers > Bioinformatics > Bioinformatics

02-06-2012, 01:22 AM   #61
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Quote:
Originally Posted by beajorrin
OK! In fact I have an insert size of 500bp, so I have to change it. I have to check the trimmed FASTQ to reduce the mispaired reads.
Thanks
Even if you are size selecting at 500bp, it's probably best to give yourself some leeway for slightly longer inserts. Size selection isn't as exact as you might think, and a 1kb cutoff should still remove most of the mapping noise that might otherwise be a problem.
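A minimal Python sketch of what such a cutoff does (illustrative only, not SeqMonk's code): any pair whose mapped span exceeds 1kb is treated as mis-paired noise and dropped. The ReadPair record and filter_by_insert_size function are hypothetical names, and the span is assumed to run from the leftmost to the rightmost mapped base of the pair.

```python
from dataclasses import dataclass

@dataclass
class ReadPair:
    """Hypothetical minimal record for a mapped read pair (not a SeqMonk type)."""
    chrom: str
    start: int  # leftmost mapped base of the pair (1-based, inclusive)
    end: int    # rightmost mapped base of the pair (1-based, inclusive)

def filter_by_insert_size(pairs, max_insert=1000):
    """Keep pairs whose implied insert (fragment) length is <= max_insert bp.

    Pairs spanning more than max_insert are treated as likely mis-paired
    mapping noise rather than genuine fragments.
    """
    kept = []
    for pair in pairs:
        insert_size = pair.end - pair.start + 1
        if insert_size <= max_insert:
            kept.append(pair)
    return kept

pairs = [
    ReadPair("chr1", 1000, 1499),   # 500 bp insert: typical for this library
    ReadPair("chr1", 5000, 5899),   # 900 bp insert: long but plausible
    ReadPair("chr2", 100, 25099),   # 25 kb "insert": almost certainly mis-paired
]
kept = filter_by_insert_size(pairs)
print([p.end - p.start + 1 for p in kept])  # [500, 900]
```

Setting the cutoff at 1kb rather than exactly 500bp keeps the genuinely long fragments from an imperfect size selection while still rejecting the pairs that are clearly wrong.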
02-10-2012, 03:33 AM   #62
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Advanced SeqMonk course

After promising to do this for ages I've finally finished writing an Advanced SeqMonk Course. It won't get its first official outing for a couple of weeks, but I've released the course material onto our web site so everyone can have a look.

There are a couple of things in the course that require features which won't be released until v0.21.0, but that should be coming fairly soon now.
02-10-2012, 05:37 AM   #63
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 415

Thanks for that Simon, it's a very nice document.
02-24-2012, 11:15 AM   #64
mediator
Member
 
Location: New England

Join Date: Nov 2010
Posts: 27

Quote:
Originally Posted by simonandrews
After promising to do this for ages I've finally finished writing an Advanced SeqMonk Course. It won't get its first official outing for a couple of weeks, but I've released the course material onto our web site so everyone can have a look.

There are a couple of things in the course that require features which won't be released until v0.21.0, but that should be coming fairly soon now.
Hi Simon,
That advanced course is really helpful, thanks! When using the difference filter to identify differentially expressed genes, do you know what the appropriate interval is for RNA-Seq experiments? I have four KO samples and four WT samples and I have calculated RPKM for all of them. Thank you in advance!
02-25-2012, 01:16 PM   #65
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Quote:
Originally Posted by mediator
Hi Simon,
That advanced course is really helpful, thanks! When using the difference filter to identify differentially expressed genes, do you know what the appropriate interval is for RNA-Seq experiments? I have four KO samples and four WT samples and I have calculated RPKM for all of them. Thank you in advance!
For this type of experiment we'd recommend using the intensity difference filter rather than a straight difference filter. The intensity difference filter is a statistical filter where cutoffs are set as p-values, and we'd normally go with the default 0.05 cutoff. Details of how the filter works are in the advanced course.

In your case, as you have 4 x 4 replicates, you could combine the replicate stats filter (for a conventional statistical analysis) with the intensity difference filter between the two replicate groups (to find differences that deviate significantly from 0). Do the intensity difference filter first, though, since it relies on seeing the whole distribution of points.
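To illustrate the idea of an intensity-dependent difference test, here is a rough Python approximation. It is an assumption-laden sketch, not SeqMonk's implementation (the advanced course describes how the real filter works): each probe's difference is compared against the differences of the probes with the most similar average intensity, and a two-sided normal p-value is computed. The function name, the window of 100 neighbouring probes and the use of one value per group (e.g. the mean log2 RPKM of the four KO and of the four WT samples) are all hypothetical choices.

```python
import math
import statistics

def intensity_difference_pvalues(a, b, window=100):
    """Rough intensity-dependent difference test between two groups.

    a, b: per-probe values on a log scale (e.g. mean log2 RPKM per group),
    listed in the same probe order. For each probe, the differences (b - a)
    of the `window` probes with the most similar average intensity form a
    local reference distribution; the probe's own difference is turned into
    a z-score against it and then into a two-sided normal p-value.
    """
    n = len(a)
    means = [(x + y) / 2 for x, y in zip(a, b)]
    diffs = [y - x for x, y in zip(a, b)]
    order = sorted(range(n), key=lambda i: means[i])  # probes ranked by intensity
    half = window // 2
    pvalues = [1.0] * n
    for rank, i in enumerate(order):
        # Centre the window on this probe's rank, clamped to the list ends.
        lo = max(0, min(rank - half, n - window))
        local = [diffs[order[j]] for j in range(lo, min(n, lo + window))]
        sd = statistics.pstdev(local)
        if sd == 0:
            continue  # no spread in the neighbourhood: leave p = 1
        z = (diffs[i] - statistics.fmean(local)) / sd
        pvalues[i] = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return pvalues

# Toy data: 200 probes with +/-0.1 noise and one probe (index 150) shifted by 3.
a = [0.05 * i for i in range(200)]
b = [x + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(a)]
b[150] += 3.0
p = intensity_difference_pvalues(a, b)
print([i for i, q in enumerate(p) if q < 0.05])  # [150]
```

Using a local window rather than the global distribution matters because low-intensity probes typically show much noisier differences than high-intensity ones, which is also why this kind of filter needs to see all the points before anything else is filtered away.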
02-25-2012, 04:07 PM   #66
mediator
Member
 
Location: New England

Join Date: Nov 2010
Posts: 27

Thank you Simon!
03-05-2012, 09:19 AM   #67
mediator
Member
 
Location: New England

Join Date: Nov 2010
Posts: 27

Quote:
Originally Posted by simonandrews
For this type of experiment we'd recommend using the intensity difference filter rather than a straight difference filter. The intensity difference filter is a statistical filter where cutoffs are set as p-values, and we'd normally go with the default 0.05 cutoff. Details of how the filter works are in the advanced course.

In your case, as you have 4 x 4 replicates, you could combine the replicate stats filter (for a conventional statistical analysis) with the intensity difference filter between the two replicate groups (to find differences that deviate significantly from 0). Do the intensity difference filter first, though, since it relies on seeing the whole distribution of points.
Hi Simon,
Do you know if SeqMonk can show the exact base pairs for each read? It would be very helpful for detecting indels and de novo mutations. Thank you!
03-05-2012, 12:26 PM   #68
aggp11
Member
 
Location: Wisconsin

Join Date: Jun 2011
Posts: 87

Hello Simon,

Can we use SeqMonk to visualize CNVs? I know there are several tools for predicting copy number changes, but I am just wondering if there is a way of visualizing these copy number changes from NGS data using SeqMonk.

Thanks,
Praful
03-06-2012, 12:37 AM   #69
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Quote:
Originally Posted by mediator
Hi Simon,
Do you know if SeqMonk can show the exact base pairs for each read? It would be very helpful for detecting indels and de novo mutations. Thank you!
Sorry, but no, it can't. SeqMonk operates purely on mapped positions. This allows it to analyse over a billion reads on a normal desktop PC, but it does mean that there's no direct connection to the original sequences of the submitted reads. We've thought about allowing it to keep a connection to the original genomic sequence (so you could, for example, look for trends versus specific motifs, or GC content), but it's very unlikely we're ever going to add mutation information to each read, since this would kill the very optimised data model we have for storing and manipulating these reads.
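As an illustration of why a position-only model is so compact, the sketch below packs a read's start, length and strand into a single 64-bit integer. The bit layout (32 bits of start, 30 bits of length, 2 bits of strand) and the function names are hypothetical choices for illustration; SeqMonk's actual internal encoding may well differ.

```python
# Hypothetical layout: bits 0-31 start, bits 32-61 length, bits 62-63 strand.
START_BITS = 32
LENGTH_BITS = 30
STRAND_CODES = {"?": 0, "+": 1, "-": 2}
STRAND_NAMES = {code: name for name, code in STRAND_CODES.items()}

def pack_read(start, length, strand):
    """Pack a mapped read into one 64-bit integer: position only, no bases."""
    if not (0 <= start < 1 << START_BITS and 0 < length < 1 << LENGTH_BITS):
        raise ValueError("start or length out of range for this layout")
    return ((STRAND_CODES[strand] << (START_BITS + LENGTH_BITS))
            | (length << START_BITS)
            | start)

def unpack_read(packed):
    """Recover (start, length, strand) from a packed read."""
    start = packed & ((1 << START_BITS) - 1)
    length = (packed >> START_BITS) & ((1 << LENGTH_BITS) - 1)
    strand = STRAND_NAMES[packed >> (START_BITS + LENGTH_BITS)]
    return start, length, strand

packed = pack_read(1_234_567, 50, "-")
print(unpack_read(packed))        # (1234567, 50, '-')
print(packed.bit_length() <= 64)  # True
```

Every read costs the same fixed number of bits whatever its length, whereas keeping the bases (and any mismatches) would add storage that grows with read length.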
03-06-2012, 12:51 AM   #70
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Quote:
Originally Posted by aggp11
Hello Simon,

Can we use SeqMonk to visualize CNVs? I know there are several tools for predicting copy number changes, but I am just wondering if there is a way of visualizing these copy number changes from NGS data using SeqMonk.
Hi Praful,

SeqMonk should certainly be able to do this. You'd probably want to do a simple read count over tiled probes which are large enough to contain enough data to get a reliable measure of the read depth, but small enough to catch smaller deletions. There are then a number of different tools to allow you to compare different samples and find differences between samples, or outliers from the normal coverage distribution in a single sample.
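The tiled read-count approach can be sketched outside SeqMonk in a few lines of Python. This is a hypothetical illustration, not SeqMonk's code: the 1kb window, the pseudocount and the 0.4 log2 threshold are arbitrary assumptions, reads are represented only by their 0-based start positions, and flagged windows are simply those whose normalised test/reference log2 ratio departs from the median ratio.

```python
import math
import statistics
from collections import Counter

def tile_counts(read_starts, chrom_length, window=1000):
    """Count reads (0-based start positions) in fixed, non-overlapping windows."""
    n_windows = math.ceil(chrom_length / window)
    counts = Counter(pos // window for pos in read_starts if 0 <= pos < chrom_length)
    return [counts.get(w, 0) for w in range(n_windows)]

def log2_ratios(test, reference, pseudocount=1.0):
    """Per-window log2(test / reference) after scaling each sample by its total."""
    t_total, r_total = sum(test), sum(reference)
    return [
        math.log2(((t + pseudocount) / t_total) / ((r + pseudocount) / r_total))
        for t, r in zip(test, reference)
    ]

def flag_windows(ratios, threshold=0.4):
    """Indices of windows whose ratio departs from the median by more than threshold."""
    centre = statistics.median(ratios)
    return [i for i, r in enumerate(ratios) if abs(r - centre) > threshold]

# Toy 10 kb chromosome: 100 reads per window in the reference; in the test
# sample window 3 has half the reads and window 7 has none.
ref_reads = [w * 1000 + k * 10 for w in range(10) for k in range(100)]
test_reads = [w * 1000 + k * 10 for w in range(10) if w != 7
              for k in range(50 if w == 3 else 100)]
ref = tile_counts(ref_reads, 10_000)
test = tile_counts(test_reads, 10_000)
print(flag_windows(log2_ratios(test, ref)))  # [3, 7]
```

The window size trades resolution against noise in the same way described above: larger windows give steadier counts but can miss small deletions.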

This isn't something our group works on much, but we've certainly used the program to confirm targeted knockouts that we've made, so the same principles could be used to find novel deletions or duplications.
03-22-2012, 05:38 AM   #71
pbseq
Member
 
Location: italy

Join Date: Feb 2010
Posts: 16

Hi Simon,
First, again, lots of compliments for SeqMonk: I don't feel I can fully grasp a new RNA-seq experiment until I've viewed it in SeqMonk!

That said, I have a question, maybe a trivial one: is there a way to load a custom set of genes (say, a particular class of genes) to, e.g., get a chromosome overview of their expression and of where they map across the chromosomes?

If I can also suggest an improvement, I'd like to be able to resize the sample window. For example, if I have lots of samples I may want to focus on a single interesting sample so that its mapped reads are fully visible; with more than 5-6 samples it's hard to see everything, so it's better to select one or a few samples (e.g. when evaluating alternative splicing claims). I know I can delete a sample, but maybe resizing or hiding one or more samples would be a better solution?
Thanks a lot for considering these notes!
pbseq
03-22-2012, 05:57 AM   #72
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Quote:
Originally Posted by pbseq
Hi Simon,
First, again, lots of compliments for SeqMonk: I don't feel I can fully grasp a new RNA-seq experiment until I've viewed it in SeqMonk!
Thanks! It's always great for us to hear feedback from other people using the program.

Quote:
Originally Posted by pbseq
That said, I have a question, maybe a trivial one: is there a way to load a custom set of genes (say, a particular class of genes) to, e.g., get a chromosome overview of their expression and of where they map across the chromosomes?
Sure, but I guess this will depend on how you're defining your group. The method we use most commonly is the feature search tool (Edit > Find Feature) to identify a group of genes/transcripts based on their annotation. This would include things like Gene Ontology terms or anything else you find in the annotation. Once you have the list of hits visible, you can use the option at the bottom to turn the hits into a new annotation track. Once you have a track containing just your features of interest, you can either quantitate over these features directly, or do a wider quantitation and then use the feature filter to pull out just the probes that overlap your selected set of features.
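The last step, pulling out probes that overlap a selected set of features, amounts to a text search followed by an interval overlap test. Here is a minimal Python sketch of that logic (not SeqMonk's code); the Interval class, find_features and overlapping_probes are hypothetical names, coordinates are assumed 1-based inclusive, the genes are made up, and the linear scan per probe is only meant for small examples.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic interval, 1-based and inclusive at both ends."""
    chrom: str
    start: int
    end: int
    name: str = ""
    description: str = ""

def find_features(features, term):
    """Case-insensitive text search over feature names and descriptions."""
    term = term.lower()
    return [f for f in features
            if term in f.name.lower() or term in f.description.lower()]

def overlapping_probes(probes, selected):
    """Keep probes overlapping at least one selected feature on the same chromosome."""
    by_chrom = {}
    for f in selected:
        by_chrom.setdefault(f.chrom, []).append(f)
    return [p for p in probes
            if any(p.start <= f.end and f.start <= p.end
                   for f in by_chrom.get(p.chrom, []))]

features = [
    Interval("chr1", 100, 500, "geneA", "homeobox A"),
    Interval("chr1", 2000, 3000, "geneB", "actin"),
    Interval("chr2", 50, 400, "geneC", "homeobox B"),
]
hox = find_features(features, "homeobox")
probes = [Interval("chr1", 1, 99), Interval("chr1", 400, 1400),
          Interval("chr1", 2500, 3500), Interval("chr2", 300, 1300)]
print([(p.chrom, p.start) for p in overlapping_probes(probes, hox)])
# [('chr1', 400), ('chr2', 300)]
```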

Quote:
Originally Posted by pbseq
If I can also suggest an improvement, I'd like to be able to resize the sample window. For example, if I have lots of samples I may want to focus on a single interesting sample so that its mapped reads are fully visible; with more than 5-6 samples it's hard to see everything, so it's better to select one or a few samples (e.g. when evaluating alternative splicing claims). I know I can delete a sample, but maybe resizing or hiding one or more samples would be a better solution?
I'm not sure I get what you mean here. You can remove a sample from the main chromosome view without deleting it from your project. Just go to View > Set Data Tracks and you can choose which samples you want to have visible, and in which order. The removed samples are still in your project and can be added back to the view whenever you like.

I suspect I may be missing the point you're making though.

If you're interested in alternative splicing and haven't seen this already, a really neat option is to import just the spliced introns into your project. If you have a spliced-mapped SAM/BAM file (e.g. from TopHat), import it and select "Split Spliced Reads" and "Import Introns rather than exons", and you'll see just the splices you've observed. You can analyse these quantitatively by using the Read Position Probe Generator followed by the Exact Overlap Count Quantitation. We've found this way of looking at the data really helpful in deciding whether there really is a change in the splicing pattern between samples.
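Under the hood, splitting spliced reads into introns means reading the N (skipped region) operations in each read's CIGAR string. The Python sketch below shows that idea together with an exact-match count per intron, which is roughly what combining read-position probes with an exact overlap count achieves. It is an illustration only; the function names and the 1-based inclusive coordinate convention are assumptions, not SeqMonk's importer.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def introns_from_cigar(pos, cigar):
    """Return the (start, end) introns, 1-based inclusive, implied by one read.

    pos is the read's 1-based leftmost mapping position (the SAM POS field).
    Operations that consume the reference (M, D, N, =, X) move along the
    genome; each N (skipped region) is reported as an intron.
    """
    introns = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in "MDN=X":
            ref += length
    return introns

def count_introns(reads):
    """Count reads supporting each exact intron; reads are (pos, cigar) pairs."""
    counts = Counter()
    for pos, cigar in reads:
        counts.update(introns_from_cigar(pos, cigar))
    return counts

reads = [(100, "50M200N50M"), (120, "30M200N70M"), (100, "50M300N50M"), (500, "100M")]
print(count_introns(reads))  # Counter({(150, 349): 2, (150, 449): 1})
```

Two reads that start at different positions but splice across the same junction end up supporting the identical intron, which is what makes exact counting a clean way to compare splice usage between samples.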
03-22-2012, 07:01 AM   #73
pbseq
Member
 
Location: italy

Join Date: Feb 2010
Posts: 16

Thanks a lot Simon, great hints. SeqMonk really has a lot of features to explore!

pbseq
03-22-2012, 01:49 PM   #74
mediator
Member
 
Location: New England

Join Date: Nov 2010
Posts: 27

Hi Simon,
For BED files (generated by Scripture from RNA-Seq data), which quantification pipeline would you recommend? I am trying to compare BED files between patients and healthy controls in order to find splice variants unique to the patients. Thank you!
03-22-2012, 03:25 PM   #75
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Quote:
Originally Posted by mediator
Hi Simon,
For BED files (generated by Scripture from RNA-Seq data), which quantification pipeline would you recommend? I am trying to compare BED files between patients and healthy controls in order to find splice variants unique to the patients. Thank you!
I've not used Scripture before, but from its documentation it looks like the data you get out of Scripture is probably more processed than you'd want to put into SeqMonk as a data track. We'd normally import the output of TopHat into the program, importing either the spliced exonic reads or the introns, depending on what we were looking for.

From what I can see, Scripture tries to create assembled transcripts from your raw data, so I guess the best way to handle this would be to import it as an annotation track rather than a data track. If the features it produces are spliced, you'd need to import them as GTF or GFFv3 files, since none of the other annotation formats supported by SeqMonk can handle multi-location features.
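What makes a feature "multi-location" in GTF is simply several exon lines sharing one transcript_id. A minimal Python sketch of grouping them is shown here, not SeqMonk's parser; it assumes tab-separated GTF with a transcript_id attribute, ignores every feature type except exon, and the function name is hypothetical.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def transcripts_from_gtf(lines):
    """Group GTF exon lines into multi-location features keyed by transcript_id.

    Returns {transcript_id: (chrom, strand, [(start, end), ...])}, exons sorted
    by start and kept in GTF coordinates (1-based, inclusive).
    """
    exons = defaultdict(list)
    info = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records define the blocks here
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        tid = match.group(1)
        exons[tid].append((int(fields[3]), int(fields[4])))
        info[tid] = (fields[0], fields[6])
    return {tid: (*info[tid], sorted(blocks)) for tid, blocks in exons.items()}

attrs = 'gene_id "g1"; transcript_id "t1";'
gtf = [
    "\t".join(["chr1", "example", "exon", "1500", "1700", ".", "+", ".", attrs]),
    "\t".join(["chr1", "example", "exon", "1000", "1200", ".", "+", ".", attrs]),
]
print(transcripts_from_gtf(gtf))
# {'t1': ('chr1', '+', [(1000, 1200), (1500, 1700)])}
```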

Once you have these elements in place, you could quantitate the various Scripture transcripts in your datasets and then compare them. You could use the standard RNA-Seq quantitation pipeline and follow the basic RNA-Seq methodology (I'm actually in the process of producing an improved RNA-Seq guide, since we now have a pretty solid way of dealing with this data).
03-22-2012, 04:30 PM   #76
mediator
Member
 
Location: New England

Join Date: Nov 2010
Posts: 27

Hi Simon,
Thanks for the help! Will try that.
03-23-2012, 12:21 AM   #77
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Quote:
Originally Posted by mediator
Hi Simon,
Thanks for the help! Will try that.
Let me know if it works out. I can take a better look if this approach turns out not to be feasible.
03-25-2012, 02:27 PM   #78
mediator
Member
 
Location: New England

Join Date: Nov 2010
Posts: 27

Hi Simon,
I tried to import the BED files as an annotation track, but SeqMonk could not recognize them. Instead I just imported them as BED files and quantitated them using "RPKM calculation for RNA-Seq data". Then I filtered the data by intensity difference with a p=0.05 cutoff (normal vs. patients) and saved the feature report. To search for splice variants, I have to open the BED files in IGV and go through the genes in the feature report one by one. Do you think there might be a better solution than this? Thank you!
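For reference, the standard RPKM definition (reads per kilobase of feature per million mapped reads) is RPKM = reads × 10^9 / (feature length in bp × total mapped reads). A small Python sketch follows; the function names are hypothetical, SeqMonk's RNA-Seq pipeline may apply extra options on top of this basic formula, and for spliced transcripts the length is assumed to be the summed exon length.

```python
def exonic_length(exons):
    """Total length of (start, end) exon blocks, 1-based inclusive."""
    return sum(end - start + 1 for start, end in exons)

def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

# 500 reads on a 2,000 bp transcript in a library of 20 million mapped reads.
print(rpkm(500, 2000, 20_000_000))                  # 12.5
print(exonic_length([(1000, 1200), (1500, 1700)]))  # 402
```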
03-25-2012, 11:20 PM   #79
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Quote:
Originally Posted by mediator
Hi Simon,
I tried to import the BED files as an annotation track, but SeqMonk could not recognize them. Instead I just imported them as BED files and quantitated them using "RPKM calculation for RNA-Seq data". Then I filtered the data by intensity difference with a p=0.05 cutoff (normal vs. patients) and saved the feature report. To search for splice variants, I have to open the BED files in IGV and go through the genes in the feature report one by one. Do you think there might be a better solution than this? Thank you!
Are these BED files "multi-location" BED files by any chance? If so, SeqMonk's BED parser won't recognise them. We did look at adding support for them, but even the people who made the format were saying that they were not recommended for use and that people should switch to GFFv3 or GTF.
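If the Scripture output does turn out to be multi-location (BED12) files, one possible workaround is to convert each record into GTF exon lines before importing it as annotation. The Python sketch below does that conversion for a single record; using the BED name column as both gene_id and transcript_id is an assumption, the function name and the "bed12_to_gtf" source label are hypothetical, and the output has not been checked against SeqMonk's GTF importer.

```python
def bed12_to_gtf(line, source="bed12_to_gtf"):
    """Convert one BED12 ("multi-location") record into GTF exon lines.

    BED is 0-based, half-open; GTF is 1-based, inclusive, so each block becomes
    start = chromStart + blockStart + 1, end = chromStart + blockStart + blockSize.
    """
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("expected a 12-column BED record")
    chrom, chrom_start, name, strand = f[0], int(f[1]), f[3], f[5]
    block_count = int(f[9])
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    starts = [int(x) for x in f[11].rstrip(",").split(",")]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    attributes = f'gene_id "{name}"; transcript_id "{name}";'
    out = []
    for size, rel_start in zip(sizes, starts):
        start = chrom_start + rel_start + 1
        end = chrom_start + rel_start + size
        out.append("\t".join([chrom, source, "exon", str(start), str(end),
                              ".", strand, ".", attributes]))
    return out

bed = "chr1\t999\t1700\tt1\t0\t+\t999\t1700\t0\t2\t201,201,\t0,500,"
for gtf_line in bed12_to_gtf(bed):
    print(gtf_line)
```

The +1 on each start converts BED's 0-based, half-open blocks into GTF's 1-based, inclusive coordinates; the ends need no change.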

Is there any way you could let me have a copy of your results for one experiment, so I can actually see what you're working with? It's difficult to offer more useful suggestions when I can't see the data.
03-26-2012, 03:06 AM   #80
neurongs
Junior Member
 
Location: Alicante, Spain

Join Date: Mar 2012
Posts: 7

Hi Simon,

First, I find SeqMonk very interesting and I would like to thank you for developing such an excellent tool.

I am analysing some ChIP-seq datasets. I am a bit surprised, since I found a slight difference between the medians shown in the box-whisker plot and those calculated from the probe set report (including unannotated probes). Do you have any possible explanation for this?

In addition, the whiskers sometimes fall outside the displayed scale, so the plot is incomplete.

Thank you in advance for your help and your time.
All times are GMT -8.