
  • #16
    SeqMonk v0.14.0 has just been released. This adds a few new features and squishes some bugs.

    New features include:
    • A new cumulative distribution plot which allows you to compare the whole distribution of quantitated values for several data stores or probe lists.
    • A new secondary quantitation method - the percentile normalisation quantitation allows you to take an existing set of quantitated values and normalise them to a particular point in their distribution. This would be useful in cases where the existing option to normalise to total read count does not produce an acceptable match between the distributions across your data stores.
    • The annotation readers now allow the import of multiple files in the same operation. Newly imported annotation tracks are now displayed immediately by default
    • When using the generic text import for annotation data you can now manually specify a feature type rather than having to have this in the file, or simply using the file name
    • A scale bar has been added to the genome view (following a suggestion on SeqAnswers)


    Amongst the squished bugs were an SVG export corruption problem, crashes when encountering unexpected folders in the genome folder and a hang when normalising the line graph.
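The percentile normalisation described in the feature list can be illustrated with a small sketch. This is not SeqMonk's code. It assumes a multiplicative correction in which each data store is scaled so that its chosen percentile lands on the mean of those percentiles; the function names and the 75th percentile default are hypothetical.

```python
def percentile(values, pct):
    """Return the pct-th percentile (0-100) of values using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    rank = (len(ordered) - 1) * pct / 100.0
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def percentile_normalise(stores, pct=75.0):
    """Scale every data store so that its pct-th percentile matches a common target.

    stores maps a data store name to its list of quantitated values.
    Assumption: the correction is a simple multiplicative scaling.
    """
    points = {name: percentile(vals, pct) for name, vals in stores.items()}
    if any(point == 0 for point in points.values()):
        raise ValueError("cannot scale a store whose chosen percentile is zero")
    target = sum(points.values()) / len(points)
    return {
        name: [value * target / points[name] for value in vals]
        for name, vals in stores.items()
    }
```

After this step the chosen percentile is identical across stores, which is the situation where a total read count correction alone may not line the distributions up.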

    You can get SeqMonk from:

    Comment


    • #17
      SeqMonk seems an excellent and user-friendly program and I would like to use it. However, our reference sequence is made of a collection of full-length cDNAs and not a genome. That is the reference sequence against which the mapping of our Solexa tags has been done. I am aware that one can format any custom genome in a compatible way for SeqMonk, but is it possible to use SeqMonk using custom cDNA reference sequences instead? What alternative package for analyzing and visualizing the data can be recommended in this case? Thank you.
      By the way, SeqMonk is not starting on my Windows XP machine. When I double-click the .bat file, the DOS window opens for a fraction of a second and then closes immediately. Nothing seems to be happening. I have the latest SeqMonk (v0.14) and a Java environment installed. (SeqMonk starts fine on my Mac, though.) Any ideas about what could be wrong? Thanks again.

      Comment


      • #18
        Originally posted by psabelli View Post
        SeqMonk seems an excellent and user-friendly program and I would like to use it. However, our reference sequence is made of a collection of full-length cDNAs and not a genome. That is the reference sequence against which the mapping of our Solexa tags has been done. I am aware that one can format any custom genome in a compatible way for SeqMonk, but is it possible to use SeqMonk using custom cDNA reference sequences instead? What alternative package for analyzing and visualizing the data can be recommended in this case? Thank you.
        I'm actually in the process of working with just this kind of data! SeqMonk wasn't really designed with this in mind, but you can make a pseudo genome out of shorter contigs by concatenating them into groups of a few thousand. It's not ideal, but if you want to have a go then I'm happy to share the code I've written for my job.
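The pseudo genome idea above can be sketched as follows. This is not the code offered in the post; it is a minimal illustration that assumes the contigs are already in memory as (name, sequence) pairs and that a run of N bases is placed between neighbouring contigs. The spacer, the pseudo-chromosome names and both function names are hypothetical.

```python
def build_pseudo_chromosomes(contigs, group_size=2000, spacer_len=100):
    """Concatenate contigs into pseudo-chromosomes of group_size contigs each.

    contigs is a list of (name, sequence) pairs.
    Returns (chromosomes, offsets) where chromosomes maps a pseudo-chromosome
    name to its sequence and offsets maps each contig name to
    (pseudo-chromosome name, start, end) in 1-based inclusive coordinates.
    """
    chromosomes = {}
    offsets = {}
    spacer = "N" * spacer_len
    for first in range(0, len(contigs), group_size):
        pseudo_name = f"pseudo{first // group_size + 1}"
        parts = []
        position = 0  # number of bases already placed on this pseudo-chromosome
        for name, seq in contigs[first:first + group_size]:
            if parts:
                # Separate neighbouring contigs so reads cannot span two of them
                parts.append(spacer)
                position += spacer_len
            offsets[name] = (pseudo_name, position + 1, position + len(seq))
            parts.append(seq)
            position += len(seq)
        chromosomes[pseudo_name] = "".join(parts)
    return chromosomes, offsets


def translate(offsets, contig_name, contig_pos):
    """Convert a 1-based position on a contig to its pseudo-chromosome position."""
    pseudo_name, start, _end = offsets[contig_name]
    return pseudo_name, start + contig_pos - 1
```

Keeping the offsets table matters: reads mapped against the original cDNAs need their positions shifted with `translate` before import, and each contig can be added as an annotation feature so it remains identifiable in the viewer.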

        Originally posted by psabelli View Post
        By the way, SeqMonk is not starting on my Windows XP machine. When I double-click the .bat file, the DOS window opens for a fraction of a second and then closes immediately. Nothing seems to be happening. I have the latest SeqMonk (v0.14) and a Java environment installed. (SeqMonk starts fine on my Mac, though.) Any ideas about what could be wrong?
        If it doesn't start at all then it's normally one of two things:
        1. Java isn't installed properly, or the java binary isn't in your path. Open a command prompt and type 'java -version'. If you get an error saying this isn't a recognised command then this is the problem.
        2. You don't have enough RAM in your machine to run the default configuration. SeqMonk ships with a configuration which assumes you have 2GB RAM. If you have less than that you can still run the program for smaller datasets but you'll need to change the memory settings.


        If it's neither of these things then try starting SeqMonk from a command prompt (move to the SeqMonk directory and run the bat file directly from the command line). It will still fail to launch, but it should leave a useful error in the window. If you post that error I can see what's going wrong.
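The first check above (is a working java binary on the path?) can also be scripted. The sketch below only automates the 'java -version' test; it does not launch SeqMonk. The function name `java_status` is hypothetical, and the `which` and `run` parameters exist only so the check can be exercised without Java installed.

```python
import shutil
import subprocess


def java_status(which=shutil.which, run=subprocess.run):
    """Report whether a java binary is on the PATH and responds to -version."""
    path = which("java")
    if path is None:
        return "java not found on PATH"
    result = run([path, "-version"], capture_output=True, text=True)
    if result.returncode != 0:
        return f"java found at {path} but failed to run"
    # java -version conventionally writes its report to stderr, not stdout
    lines = (result.stderr or result.stdout).strip().splitlines()
    banner = lines[0] if lines else "unknown version"
    return f"java OK: {banner}"
```

Calling `print(java_status())` from the machine in question gives the same answer as typing the command by hand, in a form that is easy to paste into a reply.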

        Comment


        • #19
          Originally posted by simonandrews View Post
          If you're working on a genome which is present in Ensembl (any of the subsections) then just let me know which one it is and I'll add it to the official repository.

          If you want to process it yourself then you can adapt the BioPerl script we use for making the main repositories. The script is in the 'Scripts' directory at the top level of the SeqMonk installation. You can use the basic structure but just rip out the EnsemblAPI stuff. The basic idea is:
          • Create a sequence object representing a chromosome
          • Read in a list of features for that chromosome (from the GTF file in your case)
          • Write the object out as an EMBL file
          • Remove the sequence part to save on space (optional)
          Hi Simon,
          SeqMonk is awesome, but I don't code in Perl so I cannot make a new genome as you described. Is there any chance you could add a new function to SeqMonk to allow users to build a new genome from a GTF file? I am using the human references hg18 and hg19 downloaded from the UCSC Genome Browser.
          Thanks,
          Nguyen

          Comment


          • #20
            Originally posted by ttnguyen View Post
            Hi Simon,
            SeqMonk is awesome, but I don't code in Perl so I cannot make a new genome as you described. Is there any chance you could add a new function to SeqMonk to allow users to build a new genome from a GTF file? I am using the human references hg18 and hg19 downloaded from the UCSC Genome Browser.
            Thanks,
            Nguyen
            Those genomes are already present in our repositories - we just use the Ensembl rather than the UCSC nomenclature: hg18 = NCBI36 and hg19 = GRCh37.

            I think I'm right in saying that we can't automatically build a genome file from GTF files since they don't contain the length of the chromosome, so we can't work out how much sequence is left after the last gene finishes. (I'm happy to be corrected if this isn't true).
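The limitation described above can be made concrete with a short sketch. It reads standard nine-column GTF lines and reports the highest feature end per chromosome, which is only a lower bound on the chromosome length. If the true lengths are supplied separately, the unannotated tail can be computed. The function names and the `chrom_lengths` input are hypothetical and are not part of SeqMonk.

```python
def max_feature_end(gtf_lines):
    """Return {chromosome: highest feature end coordinate} from GTF lines."""
    ends = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        chrom, end = fields[0], int(fields[4])  # column 5 is the feature end
        ends[chrom] = max(ends.get(chrom, 0), end)
    return ends


def unannotated_tail(gtf_lines, chrom_lengths):
    """Return {chromosome: bases between the last feature and the chromosome end}.

    chrom_lengths must come from another source (for example the assembly's
    sequence files), because GTF records do not carry chromosome lengths.
    """
    ends = max_feature_end(gtf_lines)
    return {
        chrom: chrom_lengths[chrom] - end
        for chrom, end in ends.items()
        if chrom in chrom_lengths
    }
```

Without the separate length table the best available estimate is `max_feature_end`, which silently drops everything after the final annotated gene.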

            Comment


            • #21
              Is it true that Ensembl has several versions of GRCh37 and that the latest is now GRCh37.61?
              I've just checked the difference between GRCh37.61 and hg19 and found that there are some differences in chromosome Y and MT.

              I am wondering whether I can create a 'genome' from a GTF file plus the chromosome lengths, so that I can use different sources of gene annotation.

              Comment


              • #22
                Originally posted by ttnguyen View Post
                Is it true that Ensembl has several versions of GRCh37 and that the latest is now GRCh37.61?
                I've just checked the difference between GRCh37.61 and hg19 and found that there are some differences in chromosome Y and MT.

                I am wondering whether I can create a 'genome' from a GTF file plus the chromosome lengths, so that I can use different sources of gene annotation.
                There are two things here, the genome assembly and the annotation set. UCSC don't really distinguish these - they just refer to hg18 or mm9 which are a combination of assembly and annotation. I presume they update their annotation sets through the life of an assembly but don't specifically advertise this in their nomenclature.

                Ensembl specifically separate the two things, so their current human version is based on the GRCh37 assembly but with an annotation set from Ensembl release 61. The release numbers relate to Ensembl releases which occur across all of their genomes, so sometimes a particular genome will get updated annotation, but often it won't. This means that every GRCh37 genome will have the same underlying sequence, and there's no need to remap data between these releases.

                As for the differences you saw between hg19 and GRCh37 - for the Y chromosome Ensembl mask out the pseudo-autosomal region which would otherwise produce a huge stretch of exact identity with ChrX. The coordinates which remain should match between the two assemblies and arguably you should be mapping against the masked version.

                I'm not sure about the mitochondrial sequence - that may not be part of the official genome assembly in the first place so might differ slightly.

                Comment


                • #23
                  Thanks for the info

                  Originally posted by simonandrews View Post
                  I'm actually in the process of working with just this kind of data! SeqMonk wasn't really designed with this in mind, but you can make a pseudo genome out of shorter contigs by concatenating them into groups of a few thousand. It's not ideal, but if you want to have a go then I'm happy to share the code I've written for my job.

                  If it doesn't start at all then it's normally one of two things:
                  1. Java isn't installed properly, or the java binary isn't in your path. Open a command prompt and type 'java -version'. If you get an error saying this isn't a recognised command then this is the problem.
                  2. You don't have enough RAM in your machine to run the default configuration. SeqMonk ships with a configuration which assumes you have 2GB RAM. If you have less than that you can still run the program for smaller datasets but you'll need to change the memory settings.


                  If it's neither of these things then try starting SeqMonk from a command prompt (move to the SeqMonk directory and run the bat file directly from the command line). It will still fail to launch, but it should leave a useful error in the window. If you post that error I can see what's going wrong.

                  Thanks for your help Simon - I appreciate it. As far as the first issue (using cDNA reference sequences) is concerned, I might try building a pseudogenome as you suggested to do the alignment of the tags, and in that case I'll contact you again. However, right now mapping is a bit of a bottleneck for us, and if we are going to do a brand new mapping we might opt for using the genome instead.
                  On the second issue (SeqMonk not starting in Windows), I solved it by lowering the memory setting to 1GB, as you suggested. (By the way, it seems that by default my version of SeqMonk was set to 1.5GB of memory.) Although I have 4GB installed, the available memory is much less, and I might need to free some for SeqMonk to run properly. Thanks. Paolo

                  Comment


                  • #24
                    Originally posted by psabelli View Post
                    On the second issue (SeqMonk not starting in Windows), I solved it by lowering the memory setting to 1GB, as you suggested. (By the way, it seems that by default my version of SeqMonk was set to 1.5GB of memory.) Although I have 4GB installed, the available memory is much less, and I might need to free some for SeqMonk to run properly. Thanks. Paolo
                    The 1.5GB in the config file is right for 2GB total usage. The config file specifies the amount of memory the program can use, but there is an overhead for the java virtual machine which runs the program - we reckon that 1.5GB of program memory ends up using around 2GB.
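The arithmetic behind that rule of thumb is sketched below. It assumes the virtual machine overhead scales in proportion to the program memory, using the 1.5GB to 2GB ratio quoted above; that proportionality is an assumption, not something the post states. `-Xmx` is the standard JVM option for the maximum heap size, but whether the SeqMonk launcher sets memory in exactly this form is also an assumption.

```python
# Ratio taken from the post above: 1.5GB of program memory ends up using about 2GB.
HEAP_FRACTION = 1.5 / 2.0


def estimated_total_mb(heap_mb):
    """Estimate total memory use (MB) for a given program memory setting (MB)."""
    return heap_mb / HEAP_FRACTION


def suggested_heap_mb(available_mb):
    """Suggest a program memory setting (MB) that fits inside available_mb."""
    return int(available_mb * HEAP_FRACTION)


def heap_option(heap_mb):
    """Format a heap size as the standard JVM maximum-heap option."""
    return f"-Xmx{heap_mb}m"
```

On these assumptions a 1GB setting (1024 MB) would need roughly 1365 MB free in total, and a machine with 2048 MB to spare could use the default 1536 MB setting.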

                    Glad you've managed to get everything running though.

                    Comment


                    • #25
                      SeqMonk v0.15.0 has just been released.

                      This release adds some new tools which are useful for the analysis of differential splicing. These are:
                      1. An option to import introns from spliced SAM/BAM files
                      2. A probe generator to put probes over every different read position in a dataset
                      3. A quantitation method to count exact overlaps between probes and reads


                      Using this combination of tools you can get a count of the number of times a particular splice junction was used in a dataset, and can then use the existing tools to compare these counts between different datasets.
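The effect of combining those three tools can be sketched outside SeqMonk. The example assumes the introns have already been extracted from the spliced reads as (chromosome, start, end, strand) tuples. Counting identical tuples plays the role of the exact overlap quantitation, and the comparison step lines two datasets up junction by junction. The function names are hypothetical.

```python
from collections import Counter


def count_junctions(introns):
    """Count how many reads support each splice junction.

    introns is an iterable of (chromosome, start, end, strand) tuples,
    one per intron implied by a spliced read.
    """
    return Counter(introns)


def compare_junctions(counts_a, counts_b):
    """Return sorted (junction, count_in_a, count_in_b) rows for two datasets."""
    junctions = sorted(set(counts_a) | set(counts_b))
    return [(j, counts_a.get(j, 0), counts_b.get(j, 0)) for j in junctions]
```

A junction that appears in only one dataset still gets a row, with a count of zero on the other side, so that differential usage is visible.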

                      In addition other changes in this release are:
                      • A change to the way empty probes are handled in log transformed quantitations
                      • A new probe generator which can deduplicate and merge overlapping probes in an existing probe set
                      • An option to import features in GFFv3 or GTF format
                      • An option to create probe trends where each probe gets the same weight in the final trend plot
                      • An option to zoom in in histogram plots


                      You can get the new release from the project website at:

                      http://www.bioinformatics.bbsrc.ac.uk/projects/seqmonk/

                      [If you don't see the option to download v0.15.0 press shift+refresh in your browser to force our cache to give you the latest version]

                      Comment


                      • #26
                        SeqMonk v0.16.0 has just been released onto our website.

                        This adds a wrapper script for linux users which makes it easy to launch the program and saves the bother of having to construct your own launch command.

                        Improvements have also been made to the probe trend plot and the percentile normalisation quantitation.

                        We've also (finally!) put in place a workaround for the problem where paired end read data from TopHat files could not be imported due to a missing field in their BAM/SAM files.

                        You can get SeqMonk from:

                        http://www.bioinformatics.bbsrc.ac.uk/projects/seqmonk/

                        Comment


                        • #27
                          Seqmonk probe trend plots

                          I think Seqmonk is really great to visualize and analyze mapped read data, all in one package! (btw: are there any recent papers that used/cited Seqmonk for their analysis?)

                          I'm currently trying to make sense out of ChIP-seq data of a particular histone modification. Actually I have two different conditions with three replicates each that I want to compare in quantitative way.

                          So here's my question to the community:
                          Is it feasible to make a quantitative statement about the two groups, along the lines of: "Is there more (or less) of this histone modification in one condition than the other in a particular genomic region (genes, promoters, etc.)?"
                          For example, using Seqmonk, I created probes over each gene (incl. their promoter) and looked at the probe trend plot. At the moment I'm a bit stuck on which of the probe trend plot types/options to use for this kind of question. I reasoned that the cumulative type is more suitable because it has an option to correct for total read count (which is indispensable for a quantitative statement, I guess?). I tried to look at the Seqmonk help for these options, but have to admit that it confused me a little... (esp. the sentence "A relative distribution plot will weight each probe equally in the final profile, whereas the cumulative count plot will weight the probes according to the number of bases of read falling into each probe. The cumulative count plot is more susceptible to high read count outliers skewing your result, but will give you results in real read depths")
                          What is your advice?
                          Thanks very much in advance!
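The difference between the two plot types in the quoted help sentence can be illustrated with a toy calculation. This is a simplified reading of that sentence, not SeqMonk's implementation. Each probe is assumed to be divided into the same number of bins with a read count per bin, and both function names are hypothetical.

```python
def cumulative_profile(probes):
    """Sum raw counts per bin, so probes with more reads carry more weight."""
    n_bins = len(probes[0])
    return [sum(probe[i] for probe in probes) for i in range(n_bins)]


def relative_profile(probes):
    """Average each probe's own distribution, so every probe has equal weight."""
    n_bins = len(probes[0])
    profile = [0.0] * n_bins
    used = 0
    for probe in probes:
        total = sum(probe)
        if total == 0:
            continue  # an empty probe has no distribution to contribute
        used += 1
        for i, count in enumerate(probe):
            profile[i] += count / total
    return [value / used for value in profile] if used else profile
```

With two probes, `[1, 1, 2]` and `[100, 0, 0]`, the cumulative profile is `[101, 1, 2]` and is dominated by the high-count probe, while the relative profile is `[0.625, 0.125, 0.25]`. The cumulative version stays in real read depths, which is why it is the one that can be corrected for total read count.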

                          Comment


                          • #28
                            I just came across another problem: Is it possible to export imported data from Seqmonk back to BED or any other format (i.e. the format that it was imported from?)

                            Thx,

                            Nrmncr

                            Comment


                            • #29
                              Originally posted by Neuromancer View Post
                              I just came across another problem: Is it possible to export imported data from Seqmonk back to BED or any other format (i.e. the format that it was imported from?)
                              You can export quantitated data from SeqMonk as BedGraph files and you can take tabulated data in the various reports which it offers.
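For readers unfamiliar with the BedGraph format, the sketch below writes quantitated probes in that layout. It is not SeqMonk's exporter. It assumes the probes are held as (chromosome, start, end, value) tuples with 1-based inclusive coordinates, and converts them to the 0-based, end-exclusive coordinates that BedGraph uses. The function name and track name are hypothetical.

```python
def to_bedgraph_lines(probes, track_name="quantitation"):
    """Format (chromosome, start, end, value) probes as BedGraph lines.

    Input coordinates are assumed to be 1-based and inclusive; BedGraph uses
    a 0-based start and an exclusive end, so only the start is shifted.
    """
    lines = [f'track type=bedGraph name="{track_name}"']
    for chrom, start, end, value in sorted(probes):
        lines.append(f"{chrom}\t{start - 1}\t{end}\t{value:g}")
    return lines
```

Chromosome names are written as given, so a file using Ensembl names ("1") may need renaming before it loads against a UCSC assembly ("chr1").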

                              There's no current option to export out raw data. SeqMonk only stores the sets of mapped positions (not the sequence or qualities), so it would only be possible to export out a limited set of data if we were to add that ability.

                              I suppose the question would be why you wanted to export the raw data back out of SeqMonk rather than just use the files which you imported from in the first place?

                              Comment


                              • #30
                                Thanks, that is more or less what I needed. Thanks!

                                Any suggestions about my earlier question on the probe trend plot and quantitation?

                                Comment
