  • How to increase sequence limit in SignalP

    I'm running SignalP on my local computer and have hit the error 'Sequence limit reached: Max 10000 sequences are allowed'. I've read through the instructions but could not find an option to increase the sequence limit.
    Is there a way to increase the sequence limit? Thank you very much!

  • #2
    Which version of SignalP are you using?



    • #3
      One option is to wrap SignalP (and companion programs) in a script that splits files for input, and concatenates output from several runs, e.g.

      run_signalp.py

      run_tmhmm.py

      These two scripts also take advantage of multiple cores, where possible.



      • #4
        Some of this family of tools have the sequence limit as a setting inside the Perl wrapper script (which is why I was checking which version you are using).

        But as Leighton points out, the more practical route is to take advantage of the embarrassingly parallel nature of the task and split the FASTA file and run multiple copies in parallel (on the same machine or a cluster).

        We used the same trick in our SignalP and TMHMM wrappers for Galaxy http://toolshed.g2.bx.psu.edu/view/p...mm_and_signalp which we talk about a little in the accompanying paper: http://dx.doi.org/10.7717/peerj.167
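The split-and-run-in-parallel approach described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code from run_signalp.py or the Galaxy wrappers. The function names (`split_fasta`, `run_signalp`, `run_in_chunks`) are hypothetical, and the `signalp -t <organism> -f short <file>` command line is an assumption based on SignalP 4.x; check it against your own installation.

```python
import subprocess
import tempfile
from multiprocessing import Pool
from pathlib import Path


def split_fasta(fasta_path, out_dir, chunk_size=10000):
    """Write chunks of at most chunk_size records; return the chunk paths."""
    out_dir = Path(out_dir)
    chunk_paths = []
    handle = None
    n_records = 0
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                # Start a new chunk file every chunk_size records.
                if n_records % chunk_size == 0:
                    if handle is not None:
                        handle.close()
                    path = out_dir / f"chunk_{len(chunk_paths):04d}.fasta"
                    chunk_paths.append(path)
                    handle = open(path, "w")
                n_records += 1
            if handle is not None:  # ignore anything before the first header
                handle.write(line)
    if handle is not None:
        handle.close()
    return chunk_paths


def run_signalp(chunk_path, organism="euk"):
    """Run one SignalP process on one chunk and return its text output."""
    # Assumed SignalP 4.x command line; adjust to your installation.
    cmd = ["signalp", "-t", organism, "-f", "short", str(chunk_path)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def run_in_chunks(fasta_path, organism="euk", chunk_size=10000, workers=4):
    """Split the input, run the chunks in parallel, concatenate the output."""
    with tempfile.TemporaryDirectory() as tmp:
        chunks = split_fasta(fasta_path, tmp, chunk_size)
        with Pool(workers) as pool:
            outputs = pool.starmap(run_signalp, [(c, organism) for c in chunks])
    return "".join(outputs)
```

Call `run_in_chunks("proteins.fasta")` from inside an `if __name__ == "__main__":` block so that multiprocessing works on every platform. Each run may write its own header lines, so the concatenated output may need those repeats filtered out.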



        • #5
          Originally posted by maubp
          Some of this family of tools have the sequence limit as a setting inside the Perl wrapper script (which is why I was checking which version you are using).

          But as Leighton points out, the more practical route is to take advantage of the embarrassingly parallel nature of the task and split the FASTA file and run multiple copies in parallel (on the same machine or a cluster).

          We used the same trick in our SignalP and TMHMM wrappers for Galaxy http://toolshed.g2.bx.psu.edu/view/p...mm_and_signalp which we talk about a little in the accompanying paper: http://dx.doi.org/10.7717/peerj.167
          The version I'm using is signalp-4.1. I'm reading your paper, and it looks awesome. I might have some further questions about Galaxy.



          • #6
            Originally posted by LeightonP
            One option is to wrap SignalP (and companion programs) in a script that splits files for input, and concatenates output from several runs, e.g.

            run_signalp.py

            run_tmhmm.py

            These two scripts also take advantage of multiple cores, where possible.
            Thank you so much. Can you give some direction on how to wrap signalp with the script you provided?



            • #7
              Originally posted by JackMetal
              Thank you so much. Can you give some direction on how to wrap signalp with the script you provided?
              It's pretty straightforward:

              1) Make sure SignalP is installed and working on your system.
              2) Run the script.

              run_signalp.py [-o|--outfilename <output file>] <euk|gram+|gram-> <FASTAfile>

              If you didn't make the script executable, run it as "python run_signalp.py" followed by the same arguments.



              • #8
                Originally posted by LeightonP
                It's pretty straightforward:

                1) Make sure SignalP is installed and working on your system.
                2) Run the script.

                run_signalp.py [-o|--outfilename <output file>] <euk|gram+|gram-> <FASTAfile>

                If you didn't make the script executable, run it as "python run_signalp.py" followed by the same arguments.
                Thanks. But when I run the script, it seems that I don't have all these modules: Bio, multiprocessing, os, sys, etc. Does that mean I need to install all of the listed modules to run the script?



                • #9
                  Originally posted by JackMetal
                  Thanks. But when I run the script, it seems that I don't have all these modules: Bio, multiprocessing, os, sys, etc. Does that mean I need to install all of the listed modules to run the script?
                  Bio is the Biopython library, which you would need to install.

                  However, os and sys are core Python modules in the standard library, and if they are missing your Python installation is badly broken. The multiprocessing library is also part of the standard library, but only from Python 2.6 onwards.
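A quick way to see which of the modules named above are actually missing is to ask the interpreter directly. The sketch below assumes a Python 3 interpreter (it relies on `importlib.util.find_spec`), and the helper name `missing_modules` is made up for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Bio is Biopython; the other three ship with Python itself.
    missing = missing_modules(["Bio", "multiprocessing", "os", "sys"])
    print("Missing:", ", ".join(missing) if missing else "none")
```

If only Bio is reported, installing Biopython should be enough. If os or sys show up, the Python installation itself needs attention.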



                  • #10
                    You can also edit the SignalP code to allow more sequences, if your computer can handle it. In the signalp file (around line 21), change the limit to a number of your choice:
                    Code:
                    # max number of sequences per run (any number can be handled)
                    my $MAX_ALLOWED_ENTRIES=1000;
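The same edit can be scripted if you prefer not to change the file by hand. This is only a sketch: `raise_limit` is a hypothetical helper, and it assumes the limit appears exactly once in the form shown in the snippet above.

```python
import re

# Matches the assignment shown above, capturing everything around the number.
LIMIT_LINE = re.compile(
    r"^(\s*my\s+\$MAX_ALLOWED_ENTRIES\s*=\s*)\d+(\s*;)", re.MULTILINE
)


def raise_limit(script_text, new_limit):
    """Return script_text with the $MAX_ALLOWED_ENTRIES value replaced."""
    patched, count = LIMIT_LINE.subn(
        lambda m: f"{m.group(1)}{new_limit}{m.group(2)}", script_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one limit line, found {count}")
    return patched
```

To use it, read the signalp script into a string, pass it through `raise_limit`, and write the result back after keeping a backup copy of the original file.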



                    • #11
                      How can you do the same for SecretomeP?

                      I ask because SecretomeP also runs SignalP, so this would be very practical.
                      Last edited by sindrle; 02-26-2014, 09:01 AM.



                      • #12
                        j.provaz is right. Just editing the line "my $MAX_ALLOWED_ENTRIES=2000000;" does the trick.
