SEQanswers

Old 12-09-2013, 01:19 PM   #241
tirohia
Member
 
Location: Auckland, NZ

Join Date: Nov 2011
Posts: 46
Default

So, yes, that appears to work. Now it's a whole bunch of errors about certain reads mapping somewhere past the end of the chromosome (Reading position 14320062 was 2642bp beyond the end of chr25 (14317420)).

Not sure how that's possible given that it's all coming from the same set of data, off to check it all again though.

Cheers
Ben.
Old 12-09-2013, 01:26 PM   #242
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by tirohia View Post
So, yes, that appears to work. Now it's a whole bunch of errors about certain reads mapping somewhere past the end of the chromosome (Reading position 14320062 was 2642bp beyond the end of chr25 (14317420)).

Not sure how that's possible given that it's all coming from the same set of data, off to check it all again though.

Cheers
Ben.
If you've just built the genome from the GTF file then the chromosome will end where the last feature finishes, so you will get reads mapping off the end of the chromosome. If you want to get the chromosome lengths correct then you can either fix them manually when you build the genome or pass in the FASTA files in addition to the GTF file to get the full chromosome length.
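
To illustrate why a GTF-only genome comes out too short, here is a minimal Python sketch (not SeqMonk's own code). It compares the largest feature end per chromosome in a GTF file with the sequence length in a FASTA file. The function names and the toy GTF/FASTA content are made up for the example; only the standard GTF column layout (end coordinate in column 5) is assumed.

```python
import io
from collections import defaultdict


def gtf_chromosome_ends(gtf_handle):
    """Return {chromosome: largest feature end} from GTF text."""
    ends = defaultdict(int)
    for line in gtf_handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        chrom, end = fields[0], int(fields[4])  # column 5 is the feature end
        ends[chrom] = max(ends[chrom], end)
    return dict(ends)


def fasta_lengths(fasta_handle):
    """Return {sequence name: sequence length} from FASTA text."""
    lengths = {}
    name = None
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]  # name is the first word of the header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Toy data (hypothetical): the last annotated feature stops short of the real end.
toy_gtf = io.StringIO(
    'chr25\tsrc\texon\t100\t180\t.\t+\t.\tgene_id "g1";\n'
    'chr25\tsrc\texon\t150\t190\t.\t-\t.\tgene_id "g2";\n'
)
toy_fasta = io.StringIO(">chr25 toy\n" + "ACGT" * 50 + "\n")

annotated = gtf_chromosome_ends(toy_gtf)
actual = fasta_lengths(toy_fasta)
for chrom, end in annotated.items():
    shortfall = actual[chrom] - end
    print(f"{chrom}: GTF ends at {end}, FASTA length {actual[chrom]}, shortfall {shortfall}bp")
```

Any read mapped inside the shortfall region would be reported as lying beyond the end of a chromosome whose length was taken from the annotation alone.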
Old 12-09-2013, 03:29 PM   #243
tirohia
Member
 
Location: Auckland, NZ

Join Date: Nov 2011
Posts: 46
Default

That worked perfectly, and I doubt I would have thought of that straight off the bat. I am indebted to you.

many thanks.
Old 12-10-2013, 01:05 AM   #244
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

I'm going to make up a tutorial video to show the process of making one of these custom genomes to try to explain this a bit better. It's probably not obvious from just the documentation and it's a very new feature so a few people will be trying this out.
Old 12-10-2013, 09:04 AM   #245
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

I've uploaded a tutorial video showing how to use the custom genome builder tool in SeqMonk to work easily with genomes which aren't available in our core database. The video shows how to build conventional genomes, and also how to make pseudo chromosomes when you're working with assemblies which are incomplete and may contain many thousands of scaffolds.
Old 01-14-2014, 12:57 AM   #246
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

I've uploaded a new release of SeqMonk to the project web site. Version 0.27.0 makes some improvements to the RNA-Seq quantitation as well as adding a new tool which makes it easy to automatically split large numbers of samples into the appropriate data groups or replicate sets.

We've also improved the re-import tool so that you can now down-sample a large dataset to a size of your choosing, or filter the reads by their length.

Finally we've fixed a couple of bugs, notably a problem with multiple testing correction when analysing large numbers of HiC probes, and one which prevented custom genomes created from GFFv3 files from automatically loading the annotation in the new genome.

Please let us know if you have any problems with the new version.
Old 01-14-2014, 07:43 AM   #247
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140
Default

Thank you for developing this excellent tool!

Old 01-24-2014, 04:19 PM   #248
mathew
Member
 
Location: australia

Join Date: Jan 2011
Posts: 81
Default Can I save wiggle file

Hi Simon,

I ran a ChIP-seq analysis and created a wiggle track to look for peaks. Can I export the wiggle track file, or is that not possible at all?
Thanks



Old 01-27-2014, 12:46 AM   #249
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by mathew View Post
Hi Simon,

I ran a ChIP-seq analysis and created a wiggle track to look for peaks. Can I export the wiggle track file, or is that not possible at all?
Thanks
Yes, you can export the data in a couple of ways.

If you want to put the data into a genome browser or a general genome visualisation tool like IGV then you can do File > Export Current View > BEDGraph, which will export the data as a BEDGraph file. We have to use BEDGraph format rather than wig since the probes in SeqMonk are not guaranteed to be fixed width.
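
For anyone who wants to see what such a file looks like, here is a small Python sketch that writes variable-width probes as BEDGraph text. It is only an illustration, not SeqMonk's exporter: the function name, track name and probe values are hypothetical, and the input coordinates are assumed to be 1-based inclusive.

```python
def probes_to_bedgraph(probes, track_name="example"):
    """Format (chrom, start, end, value) probes as BEDGraph text.

    Input coordinates are assumed to be 1-based inclusive; BEDGraph uses
    0-based starts with exclusive ends, so only the start is shifted.
    """
    lines = [f'track type=bedGraph name="{track_name}"']
    for chrom, start, end, value in sorted(probes):
        lines.append(f"{chrom}\t{start - 1}\t{end}\t{value}")
    return "\n".join(lines) + "\n"


# Hypothetical probes of different widths (500bp, 120bp, 2000bp).
probes = [
    ("chr1", 1001, 1500, 3.2),
    ("chr1", 1, 120, 0.5),
    ("chr2", 5001, 7000, 7.9),
]
print(probes_to_bedgraph(probes), end="")
```

Each BEDGraph line carries its own start and end, so probes of any width fit. In wig format a span is declared once and applies to all the data lines which follow it, which is awkward when every probe can have a different width.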

If you want to just get the data out in a format which is easy for you to manipulate then you can create a report using the options in the report menu. Doing an annotated probe report (even if you don't actually annotate) is probably what you'd want to try first.

Hope this helps.
Old 02-05-2014, 01:31 AM   #250
ShellfishGene
Member
 
Location: Germany

Join Date: Mar 2009
Posts: 14
Default

Quote:
Originally Posted by simonandrews View Post
The video shows how to build conventional genomes, and also how to make pseudo chromosomes when you're working with assemblies which are incomplete and may contain many thousands of scaffolds.
Hi Simon,

When I have a genome with pseudo chromosomes, the "go to position" feature only shows pseudo chromosomes. Could that be expanded to allow going to scaffold coordinates as well? Or is there a workaround to go to actual scaffold positions?

Cheers
Old 02-05-2014, 01:37 AM   #251
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by ShellfishGene View Post
Hi Simon,

When I have a genome with pseudo chromosomes, the "go to position" feature only shows pseudo chromosomes. Could that be expanded to allow going to scaffold coordinates as well? Or is there a workaround to go to actual scaffold positions?

Cheers
That's a really good point. I'll see if there's an easy way to add the original scaffolds into the options for that.
Old 02-09-2014, 04:59 PM   #252
mathew
Member
 
Location: australia

Join Date: Jan 2011
Posts: 81
Default FastQC for bisulfite-seq

Two related questions:
1. Can we use FastQC for a QC check of bisulfite-seq data? Will it give me CG ratios etc.?
2. I have used SeqMonk but have not used Bismark. Is there a GUI version of Bismark, equivalent to SeqMonk?
Thanks
Old 02-10-2014, 12:11 AM   #253
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by mathew View Post
Two related questions:
1. Can we use FastQC for a QC check of bisulfite-seq data? Will it give me CG ratios etc.?
Yes. FastQC reports provide some useful information for analysing BS-Seq data. All of the normal quality stuff is the same as for any other library, and the sequence composition parts are informative, although since BS-Seq libraries are expected to look odd you will get error flags on the composition-based modules (especially per sequence composition), since there should be very few Cs and lots of Ts.
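
As a rough illustration of the "few Cs, lots of Ts" expectation, the following Python sketch tallies the overall base composition of a set of reads. It is not how FastQC computes its modules; the function name and the two toy reads are hypothetical.

```python
from collections import Counter


def base_composition(reads):
    """Return the fraction of A, C, G and T across all reads (other symbols ignored)."""
    counts = Counter()
    for read in reads:
        counts.update(read.upper())
    total = sum(counts[base] for base in "ACGT")
    return {base: counts[base] / total for base in "ACGT"}


# Two made-up bisulfite-converted reads: most Cs have been read as Ts.
toy_reads = ["TTGTATTTAG", "ATTTTGTTCG"]
composition = base_composition(toy_reads)
for base in "ACGT":
    print(f"{base}: {composition[base]:.0%}")
```

A conventional library would be expected to show a much more balanced split, which is why composition checks tuned for normal libraries raise flags on BS-Seq data.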

There is also a lot more BS-Seq-specific QC in the Bismark reports, which you should definitely look at. This will cover things like the BS strands which were found, the overall level of methylation in different contexts and positional biases in methylation (M-bias plots).

Quote:
Originally Posted by mathew View Post
2. I have used SeqMonk but have not used Bismark. Is there a GUI version of Bismark, equivalent to SeqMonk?
Thanks
No, the nature of the computation needed for Bismark means that it's not really feasible to run it on a normal desktop computer. There is a Galaxy wrapper for Bismark which can make it easier for people who really want to avoid the command line altogether, but the Bismark command options are pretty simple for normal libraries. I suppose we could write a small graphical program to help set this up, but it would just end up constructing a normal command line at the end.
Old 02-24-2014, 11:18 AM   #254
Skissors
Junior Member
 
Location: Canada

Join Date: Feb 2014
Posts: 2
Red face Filtering by Statistical Test

Hi,

I used the RNA-seq pipeline in SeqMonk to process my data. My initial probe list was 26127 in size. So I performed a filter by statistical test using intensity difference. In the pop-up box where it indicates "Probes per sample", when I try to input 26127, it defaults to 13036. Is there any reason why it would default to this value?

I'm new to using SeqMonk and NGS in general, so I have much to learn.

Thanks for your help!
Old 02-24-2014, 11:26 AM   #255
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by Skissors View Post
Hi,

I used the RNA-seq pipeline in SeqMonk to process my data. My initial probe list was 26127 in size. So I performed a filter by statistical test using intensity difference. In the pop-up box where it indicates "Probes per sample", when I try to input 26127, it defaults to 13036. Is there any reason why it would default to this value?

I'm new to using SeqMonk and NGS in general, so I have much to learn.

Thanks for your help!
It should normally default to 1% of the probe set size, so even 13,000 is way too big. This value is the sample size the program uses to make up a custom distribution to judge the significance of each point - you need to make it considerably smaller than the total probe set size, and unless you have a good reason it's as well to leave it set to the default value which the program suggests.
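
To make the role of the sample size more concrete, here is a simplified Python sketch of the general idea, not SeqMonk's actual implementation. It assumes that each point is judged against the differences seen in the `sample_size` points closest to it in average intensity, and that those local differences are roughly normal. All function names and the simulated data are hypothetical.

```python
import random
import statistics
from math import erfc, sqrt


def suggested_sample_size(probe_count, fraction=0.01):
    """1% of the probe set size, as described above (never less than 1)."""
    return max(1, round(probe_count * fraction))


def local_p_values(pairs, sample_size):
    """Two-sided p-value per (value_a, value_b) pair, judged against the
    differences of the sample_size points closest in average intensity."""
    ranked = sorted(range(len(pairs)), key=lambda i: sum(pairs[i]) / 2)
    diffs = [pairs[i][0] - pairs[i][1] for i in ranked]
    half = sample_size // 2
    p_values = [0.0] * len(pairs)
    for rank, idx in enumerate(ranked):
        # Centre the window on this point, clipped to the ends of the list.
        lo = max(0, min(rank - half, len(diffs) - sample_size))
        window = diffs[lo:lo + sample_size]
        sd = statistics.pstdev(window) or 1e-12  # guard against zero spread
        z = abs(diffs[rank] - statistics.mean(window)) / sd
        p_values[idx] = erfc(z / sqrt(2))  # two-sided normal tail probability
    return p_values


print(suggested_sample_size(26127))

# Simulated data: 2000 unchanged points plus one clearly changed point.
random.seed(0)
pairs = [(x, x + random.gauss(0, 0.5))
         for x in (random.uniform(0, 10) for _ in range(2000))]
pairs.append((5.0, 9.0))
size = suggested_sample_size(len(pairs))
p = local_p_values(pairs, size)
print(size, p[-1])
```

If the sample size approaches the total number of probes, every point ends up being compared against nearly the same global distribution, so the test loses its ability to adapt to the local spread at each intensity level.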

Hope this helps
Old 03-04-2014, 12:59 PM   #256
edere
Junior Member
 
Location: US

Join Date: Feb 2014
Posts: 4
Default

Hi,

I've been using SeqMonk to process and visualize my RRBS data. The data I've imported has replicates for treatment and control conditions, and I want to export a list of all the probes (CpGs) and their q-values to plot the distribution of q-values. I've tried filtering my data using the Replicate Set Stats and setting the p-value cutoff to 1, but this generates a list of only the probes with p-values < 1.

Is there any way to get a full list of probes and q-values?

Thanks
Old 03-05-2014, 12:43 AM   #257
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by edere View Post
Hi,
I've been using SeqMonk to process and visualize my RRBS data. The data I've imported has replicates for treatment and control conditions, and I want to export a list of all the probes (CpGs) and their q-values to plot the distribution of q-values. I've tried filtering my data using the Replicate Set Stats and setting the p-value cutoff to 1, but this generates a list of only the probes with p-values < 1.
In all of the statistical tests in SeqMonk you should be able to set the cutoff to 1 to generate a full list of hits regardless of significance. I just tried this in a few tests and whilst it worked OK in most of them, it failed for the replicate set stats in a few cases. Looking at the cases where it failed it was probes where all of the values in all cases were exactly the same, so there was no absolute difference and no variance. This is probably a corner case which produces an infinite value which doesn't get caught and I'll look into that.

Since you're trying to run this on RRBS data then you're much more likely to hit this condition since I guess it will be fairly common to have repeated methylation values (especially 0 or 100%). Quantitative tests such as the replicate set stats aren't really suitable for this kind of data unless you have huge numbers of replicates (central limit theorem and all that). We'd normally use the contingency based tests (Chi-Square in SeqMonk) to do a count based significance assessment, along with a subsequent filter for a sensible level of absolute difference.
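
As a sketch of what a count-based test plus a difference filter could look like outside SeqMonk, the Python below runs a Pearson chi-square test on a 2x2 table of methylated and unmethylated calls and then keeps only CpGs with a large enough absolute change. This is not SeqMonk's code: the function names, thresholds and counts are hypothetical, the counts are assumed to be pooled across the replicates of each condition, and no multiple testing correction is applied.

```python
from math import erfc, sqrt


def chi_square_2x2(meth_a, unmeth_a, meth_b, unmeth_b):
    """Pearson chi-square test (1 d.f., no continuity correction) on a 2x2 table.

    Returns (statistic, p_value).
    """
    row_a, row_b = meth_a + unmeth_a, meth_b + unmeth_b
    col_m, col_u = meth_a + meth_b, unmeth_a + unmeth_b
    if 0 in (row_a, row_b, col_m, col_u):
        return 0.0, 1.0  # nothing to test, e.g. both conditions at 0% or 100%
    total = row_a + row_b
    stat = total * (meth_a * unmeth_b - unmeth_a * meth_b) ** 2 / (
        row_a * row_b * col_m * col_u
    )
    return stat, erfc(sqrt(stat / 2))  # chi-square survival function for 1 d.f.


def differential_cpgs(counts, max_p=0.05, min_diff=20.0):
    """Keep CpGs passing both the p-value and the absolute % methylation filter."""
    hits = []
    for name, (ma, ua, mb, ub) in counts.items():
        if ma + ua == 0 or mb + ub == 0:
            continue  # no coverage in one condition
        _, p_value = chi_square_2x2(ma, ua, mb, ub)
        diff = 100 * mb / (mb + ub) - 100 * ma / (ma + ua)
        if p_value <= max_p and abs(diff) >= min_diff:
            hits.append((name, round(diff, 1), p_value))
    return hits


# Hypothetical counts: (control meth, control unmeth, treatment meth, treatment unmeth)
counts = {
    "cpg1": (2, 38, 30, 10),           # 5% -> 75%: large, well-supported change
    "cpg2": (0, 40, 0, 40),            # identical values: no test possible
    "cpg3": (4500, 5500, 5000, 5000),  # 45% -> 50%: significant but small change
}
hits = differential_cpgs(counts)
for name, diff, p_value in hits:
    print(name, diff, f"{p_value:.2e}")
```

The third CpG shows why the second filter matters: with very deep coverage a 5% shift is statistically significant, but it would not pass a 20% absolute difference cutoff.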
Old 03-05-2014, 06:53 AM   #258
edere
Junior Member
 
Location: US

Join Date: Feb 2014
Posts: 4
Default

I'm not exactly sure how to run the Chi-squared test in SeqMonk. When I add the two data stores that I want to compare (in this case my replicate sets for the control and treatment conditions), they get populated into the "Pairs" box, but I can't run the filter.

Clearly I haven't done something correctly. How do I set up the Chi-squared test to compare my treatment replicates against the control replicates for each probe (cytosine)?


Old 03-24-2014, 02:19 AM   #259
Aspadia
Junior Member
 
Location: Europe

Join Date: Aug 2013
Posts: 4
Default ChIA-PET Hi-C maps

Hi Simon,

I have been using SeqMonk successfully for analysis of ChIP-seq data and now I have ChIA-PET data to analyse. I have the reads mapped in BAM files, with separate BAM files for the different linker combinations generated by ChIA-PET. I managed to visualise them in a Hi-C heatmap in SeqMonk but got a bit worried, because some statistics are applied before the heatmap is generated. I think the statistics used for Hi-C experiments and for ChIA-PET experiments must be different, as ChIA-PET is a ChIP-derived method (and therefore enriched for certain loci to begin with). I have been looking for the exact statistics but I cannot find them anywhere, only a brief mention in one of the tutorial videos. Can you tell me where I can find out what statistics are applied? Is there any option to visualise the data in a heatmap without applying the statistics?
I would also like to subtract interactions found for one linker combination from interactions found for another linker combination. Is this possible in SeqMonk?
Thank you very much for your help!
Old 03-28-2014, 04:29 AM   #260
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Sorry to take so long to reply to you - I'm a bit behind on my mails right now!

The statistical model for HiC in SeqMonk is based on the expectation that random interactions should be distributed in line with the level of fragment end coverage within each region, so that regions which have a large number of fragment ends in them should also be more likely to form part of any random interactions. This works well for methods where you expect fairly even fragment end coverage (such as normal HiC) but is less good where you have uneven distributions (any kind of intentional enrichment). In these cases the method still works, but if your enrichment is very strong then you may find that you get highly significant results from small numbers of interactions with generally depleted regions, which may not be biologically sensible.
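
The effect described above can be shown with a very small Python sketch. It is a simplification and not SeqMonk's code: it assumes each end of a random pair lands in a region with probability proportional to that region's share of all fragment ends, and it leaves out the significance test entirely. The function name and all the numbers are hypothetical.

```python
def expected_interactions(ends_i, ends_j, total_ends, total_pairs):
    """Expected number of random pairs linking two different regions i and j,
    if each pair end falls in a region in proportion to its fragment end count."""
    p_i = ends_i / total_ends
    p_j = ends_j / total_ends
    return total_pairs * 2 * p_i * p_j  # factor 2: either end can be in region i


total_pairs = 1_000_000
total_ends = 2 * total_pairs

# Two strongly enriched regions versus two generally depleted regions.
enriched = expected_interactions(20_000, 20_000, total_ends, total_pairs)
depleted = expected_interactions(200, 200, total_ends, total_pairs)

print(f"enriched: expected {enriched:.2f}, obs/exp for 300 observed = {300 / enriched:.1f}")
print(f"depleted: expected {depleted:.2f}, obs/exp for 3 observed = {3 / depleted:.1f}")
```

Three observed interactions between the depleted regions give a far higher observed/expected ratio than 300 interactions between the enriched ones, which is the kind of result that can look highly significant without being biologically sensible.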

If you want to view the heatmap without applying any filtering then you should be able to set the cutoffs to not filter (p<1 and diff >0). I might be mis-remembering but I may have changed some of this code recently to allow this kind of construction (I think the filters may have been restricted to showing only interactions with Obs/Exp > 1 whatever you selected). You could try the development snapshot at http://www.bioinformatics.babraham.a...28.0_devel.zip which has some fixes and other HiC improvements which might be useful.
Tags
analysis, desktop, seqmonk, visualization
