
  • NGS functionality for KNIME

    Hi,

    I am very excited to share with you the nodes and workflows that I created for NGS data analysis in KNIME.

    KNIME is a workflow management system. Some of its features include:

    * can handle many millions of rows on a desktop computer
    * workflows can be executed from the command line
    * integration with Galaxy/Mobyle possible
    * workflows can be exchanged
    * writing new functionality is relatively easy
    * it is based on Java/Eclipse
    * command line scripts can be organized
    * no worries about naming intermediate files
    * high-content / high-throughput problems can already be solved
    * scripting in R, Perl, Python, Java, Matlab supported
    * highlighting/brushing supported
    * open source
    * commercial support available if desired
    * support for statistics, flow control (if/while loops)
    * supportive community
    * creating professional looking reports


    and now also:

    * reading/writing FASTQ/SAM/BAM/bedGraph files
    * region-of-interest related tools
    * AdapterRemoval
    * and many more...


    Check it out and let me know what you think.
    Installing KNIME:


    Installing community nodes:


    To get a quick overview of how to use it with NGS data:
    NGS nodes and descriptions:



    Kind regards,

    Bernd

  • #2
    Looks interesting. Have you used this a lot already? For which use cases? How easy is it to install? Is there much memory overhead?



    • #3
      I am using it in production at our NGS service facility. We are mainly concerned with anything but resequencing and SNPs.
      It is very good for prototyping and then moving to production for preprocessing tasks such as removing parts of a sequence, splitting, joining, and computing stats.
      Memory overhead is actually low, because KNIME stores tables on disk. The price is some overhead in compute time, but we are working on this.
      You can easily parallelize things by building workflows that run in parallel.
      Once you have reduced your data set to something in the range of a few million rows, you can easily work with it interactively (or at least that is what I am doing). For data sets bigger than this, KNIME can be run from the command line, or you can use command line executions from within KNIME.
      Well, just give it a try and let me know if you run into problems.
      Btw, installation is fairly easy. There are instructions on the web site: basically you unpack and start the application, configure a proxy if necessary, and then install the additional nodes. Follow the links I provided.



      • #4
        One example regarding memory:
        I am currently running 6 nodes that read BAM files in parallel; each table consists of some 300 M rows. The memory footprint is about 5 GB, even though I allocated 16 GB on a Linux machine.
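
To put those numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It only uses the figures quoted above; treating "GB" as 1024^3 bytes and all variable names are my own assumptions, so take it as a sketch rather than a measurement.

```python
# Figures quoted in the post; reading "GB" as 1024**3 bytes is an assumption.
PARALLEL_READERS = 6           # BAM reader nodes running at the same time
ROWS_PER_TABLE = 300_000_000   # "some 300 M rows" per table
OBSERVED_MEMORY_GB = 5         # reported memory footprint
ALLOCATED_MEMORY_GB = 16       # memory allocated to KNIME
BYTES_PER_GB = 1024 ** 3

# Total number of rows held across all six tables.
total_rows = PARALLEL_READERS * ROWS_PER_TABLE

# Average resident memory per row, if every row were kept in RAM.
bytes_per_row = OBSERVED_MEMORY_GB * BYTES_PER_GB / total_rows

# Share of the allocated memory that is actually in use.
heap_fraction_used = OBSERVED_MEMORY_GB / ALLOCATED_MEMORY_GB

print(f"total rows: {total_rows:,}")
print(f"resident bytes per row: {bytes_per_row:.2f}")
print(f"fraction of allocated memory in use: {heap_fraction_used:.0%}")
```

Only a few bytes of RAM per row is far too little to hold a full alignment record, which fits with KNIME keeping the tables on disk rather than in memory.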
