  • Maker2 running for 8 days!

    Hi mates,

    Is that normal? Has anyone used the MAKER pipeline?
    It's been running with default parameters for almost 9 days (using 80 threads) on a scaffold file from a snake.

    I see it's now running a lot of BLASTX jobs.


    Thanks,
    Condomitti.

  • #2
    If an analysis is running this long (and is actively creating output), then leave it alone.

    It does not matter if the behavior is "normal".


    • #3
      Run time is dependent mostly on how many sequences (transcripts and proteins) you give it.

      On 12 cores the entire mouse refseq protein database took about 3 days for me. But other parts of the pipeline weren’t being run (i.e. Augustus/SNAP and transcript alignments).

      How many sequences did you feed in the est= and protein= fields?
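For anyone unsure how to answer that question, a minimal Python sketch for counting the records in a FASTA file is below. It simply counts header lines starting with ">". The function names are made up for illustration, and the path you pass in would be whatever you put in est= or protein=.

```python
def count_fasta_records(fasta_text: str) -> int:
    """Count sequences in FASTA text by counting '>' header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def count_fasta_file(path: str) -> int:
    """Same count, but streamed from a file so large inputs are not loaded at once."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```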


      • #4
        Thanks GenoMax and Wallysb01!

        By "normal" I meant whether it might be running improperly due to a misconfiguration of the pipeline or the installation =).

        But based on Wallysb01's numbers it seems to be running properly...

        Wallysb01, I gave it only one big EST file and set the pipeline to use Augustus and transcript alignments.
        Maybe that's the reason for such a long run time... what do you think?


        • #5
          9 days on 80 cores sounds like a ton of time, but if that's what it needs...

          What is the file size of that EST set? I've run fairly large transcriptome assemblies from RNA-seq (i.e. total file size of about 2 GB) in more like 4 days on 12 cores. Augustus shouldn't add too much time, maybe a day or two. The default is to also include TE proteins, which adds a while (for me, around a day on 12 cores). You should also have some sort of masking turned on, if not done already. That will speed things up, as EST alignments won't seed in repeat regions (which are probably 30%+ of the genome).

          If I were you, I'd try to get a sense of how long it's going to take. You should be able to check the log file to see how far through your genome it is. It processes each scaffold individually from the top of your FASTA file down. So if you can get the range of scaffold names it's on (there will be 80 it's working on), that could give you a sense. And be sure to account for the size of the scaffolds it has completed vs. the size it still has to go. At least with my genomes, I order them largest to smallest for this very reason.
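A rough Python sketch of that progress check, weighting by scaffold size rather than scaffold count, might look like the following. It assumes the log is a tab-separated file with sequence ID, output path, and status columns, and that completed scaffolds are marked FINISHED. That is how I understand MAKER's master datastore index log to look, but check your own file before trusting it. All function names here are hypothetical.

```python
def fasta_lengths(fasta_text):
    """Return {sequence_id: length} for FASTA-formatted text."""
    lengths = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def finished_ids(index_log_text):
    """Collect IDs whose third column is FINISHED (assumed log layout)."""
    done = set()
    for line in index_log_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3 and fields[2].strip() == "FINISHED":
            done.add(fields[0])
    return done


def fraction_of_bases_done(fasta_text, index_log_text):
    """Fraction of total bases that belong to scaffolds marked FINISHED."""
    lengths = fasta_lengths(fasta_text)
    total = sum(lengths.values())
    if total == 0:
        return 0.0
    done = sum(lengths.get(seq_id, 0) for seq_id in finished_ids(index_log_text))
    return done / total
```

Counting bases instead of scaffolds matters because one finished 50 Mb scaffold represents far more work than a thousand finished 2 kb ones.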

          If it were me, I'd sure hope it's at least 25% of the way through the file by now. If it's only, say, 10% of the way through, you could be looking at a 3-month run time. At that point, you are probably either racking up a lot of CPU-hour costs on a cluster or, if you don't pay because you own the nodes, holding back other jobs. So you might think about reducing your EST set (you could look at CD-HIT or just remove some tissues/timepoints). I doubt you really need this much, as I'm sure you're well past the point of significant diminishing returns, but if it's almost done, you might as well let it go.
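The arithmetic behind those numbers is a simple linear extrapolation, sketched below in Python. It assumes run time scales evenly with the fraction of the genome processed, which is only a rough approximation since repeat content and evidence density vary between scaffolds. The function name is hypothetical.

```python
def estimate_runtime(elapsed_days, fraction_done):
    """Linearly extrapolate (total_days, remaining_days) from progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in the interval (0, 1]")
    total_days = elapsed_days / fraction_done
    return total_days, total_days - elapsed_days


# 9 days elapsed at 25% done gives 36 days total;
# 9 days elapsed at 10% done gives roughly 90 days total, i.e. about 3 months.
total, remaining = estimate_runtime(9, 0.25)
```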


          • #6
            Hi Wallysb01,

            Thanks for your help, and sorry for taking so long to reply.
            The last run of Maker2 was interrupted by a server issue, and I had to wait for the server to become available again, so only now am I getting ready to restart Maker2.

            So based on what you said, should I turn masking off?
            This is what I have so far:

            model_org= #a model organism for RepBase masking in RepeatMasker
            rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
            repeat_protein=/usr/local/src/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
            rm_gff= #pre-identified repeat elements from an external GFF3 file
            prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
            softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)


            Isn't it already turned off?


            Cheers,
            Condomitti.


            • #7
              It is off, yes, but you are still going to tblastn those te_proteins. If you’ve already done that in a previous round of MAKER, you could grep out those lines from the GFF output and use that as input instead.
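As a sketch of that "grep out those lines" step, the Python below keeps only the repeat features from a previous MAKER GFF3 file so the result can be supplied through rm_gff=. It assumes the repeat features carry "repeatmasker" or "repeatrunner" in the source column (column 2); confirm the labels in your own output before relying on this. The function name and the REPEAT_SOURCES constant are made up for illustration.

```python
# Assumed source-column labels for repeat features; verify against your GFF3.
REPEAT_SOURCES = ("repeatmasker", "repeatrunner")


def extract_repeat_lines(gff_text, sources=REPEAT_SOURCES):
    """Return GFF3 text containing only features whose source is in `sources`."""
    kept = []
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            # Everything after this directive is embedded sequence, not features.
            break
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) >= 9 and fields[1] in sources:
            kept.append(line)
    return "##gff-version 3\n" + "".join(item + "\n" for item in kept)
```

Write the returned text to a file and point rm_gff= at it. If you go this route, you would presumably also leave repeat_protein= empty so the tblastn step is not repeated.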
