  • Trinity fails to finish

    Hi

    I want to de novo assemble 60 million PE RNA-seq reads with Trinity (using an ActivePerl installation in my home directory). I have tried twice, but Trinity does not finish.

    The command was:

    nohup Trinity --seqType fq --max_memory 80G --left H2_rnaseq_R1_trimmed.fastq.gz --right H2_rnaseq_R2_trimmed.fastq.gz --CPU 10 --jaccard_clip &

    Trinity starts just fine and runs for several hours writing around 44GB of intermediate results to disk. After that Trinity stops and the last entry from nohup.out is:



    ---------------------------------------------------------------
    -------------------- Butterfly --------------------------------
    -- (Reconstruct transcripts from reads and de Bruijn graphs) --
    ---------------------------------------------------------------

    Butterfly_cmds: /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c27478.trinity.reads.fa.out/chrysalis/butterfly_commands
    Inchworm and Chrysalis complete. Butterfly commands to execute are provided here:
    /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c27478.trinity.reads.fa.out/chrysalis/butterfly_commands

    Thursday, February 12, 2015: 02:50:35 CMD: /homes/biertank/kai/software/trinityrnaseq-2.0.2/trinity-plugins/parafly/bin/ParaFly -c /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c27478.trinity.reads.fa.out/chrysalis/butterfly_commands -shuffle -CPU 1 -failed_cmds failed_butterfly_commands.41256.txt -v
    Number of Commands: 1

    succeeded(1) 100% completed.

    All commands completed successfully. :-)

    CMD finished (0 seconds)
    Thursday, February 12, 2015: 02:50:35 CMD: /homes/biertank/kai/software/trinityrnaseq-2.0.2/util/support_scripts/print_butterfly_assemblies.pl /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c27478.trinity.reads.fa.out/chrysalis/component_base_listing.txt > Trinity.fasta.tmp
    CMD finished (0 seconds)
    Fully cleaning up.


    ###################################################################
    Butterfly assemblies are written to /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c27478.trinity.reads.fa.out.Trinity.fasta
    ###################################################################



    succeeded(27478), failed(1) 100% completed.

    We are sorry, commands in file: [FailedCommands] failed. :-(

    Error, cmd: /homes/biertank/kai/software/trinityrnaseq-2.0.2/trinity-plugins/parafly/bin/ParaFly -c recursive_trinity.cmds -CPU 10 -v died with ret 256 at /homes/biertank/kai/software/trinityrnaseq-2.0.2/Trinity line 1997.

    Trinity run failed. Must investigate error above. at /homes/biertank/kai/software/trinityrnaseq-2.0.2/Trinity line 1057.

    The file "FailedCommands" just states:

    /homes/biertank/kai/software/trinityrnaseq-2.0.2/util/support_scripts/../../Trinity --single "/scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa" --output "/scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out" --CPU 1 --max_memory 1G --full_cleanup --seqType fa --trinity_complete --jaccard_clip
    Any advice is welcome. Thanks in advance!

  • #2
    How much memory do you have on this machine? Does /scratchsan partition have plenty of space (and you are not running into any disk quotas)?


    • #3
      It was quite common (I used Trinity a lot 2-3 years ago) for some large-component Butterfly commands to fail if you ran them with low resources. Take the command that failed from the "FailedCommands" file and increase the available memory for it (e.g. change "--max_memory 1G" to "--max_memory 8G" or whatever your system can handle). Once that command has succeeded, re-run the Trinity command and it should pick up where it failed.
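
A minimal sketch of that edit, assuming the failed commands sit one per line in a file named FailedCommands in the Trinity output directory, as the log above suggests. The helper name bump_max_memory is made up for illustration and is not part of Trinity; it only rewrites the text of the --max_memory argument and prints the result, so the command can be inspected before it is run.

```shell
#!/usr/bin/env bash
# bump_max_memory NEW_LIMIT: read commands on stdin and print them with every
# "--max_memory <value>" argument replaced by "--max_memory NEW_LIMIT".
# Hypothetical helper, not part of Trinity.
bump_max_memory() {
    local new_limit="$1"
    sed -E "s/--max_memory[[:space:]]+[^[:space:]]+/--max_memory ${new_limit}/g"
}

# Example usage (left commented out so nothing runs by accident):
#   bump_max_memory 8G < FailedCommands > FailedCommands.8G
#   cat FailedCommands.8G      # check the rewritten command
#   bash FailedCommands.8G     # then execute it
```

Writing to a separate file keeps the original FailedCommands untouched in case the rewritten command needs another adjustment.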


      • #4
        Should it still fail, join the Trinity mailing list and ask there - Brian is typically very fast in answering on the list.


        • #5
          I am connected to this machine via SSH. It has 1TB of RAM so I could increase that. But shouldn't my --max_memory 80G be more than enough for 60 million reads?

          The "FailedCommands" entry was assigned only 1G, that's right, but Trinity set that automatically during the run. How could I change it?

          I was also wondering about space restrictions on my account on the scratchsan partition, because both times Trinity stopped after having written around 44GB of data. I emailed the admin to ask whether there are any administrative restrictions on my folder, but have no answer yet.

          However, the Trinity data is still in my folder and I can work in it and copy lots of data.

          Could it be that the file system has problems with the huge amount of files in only one folder in "/trinity_out_dir/read_partitions/Fb_0/CBin_0/"?

          But this should then be a general problem of Trinity and not be specific to this machine?
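
A rough way to check both ideas (free space and the number of entries in that one directory), assuming a standard Linux userland. `df -h` reports free space and `df -i` reports inode usage; the count_entries helper is a made-up name that counts the direct children of a directory.

```shell
# count_entries DIR: print the number of direct children (files and
# directories) of DIR. Hypothetical helper for illustration.
count_entries() {
    find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Example usage (paths taken from the log above, commented out here):
#   df -h /scratchsan    # free space on the partition
#   df -i /scratchsan    # free inodes on the partition
#   count_entries /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0
```

A partition can still have free space while running out of inodes, which is why both `df` views are worth a look.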


          • #6
            I don't think you can change the 1G memory setting, nor do I suspect you should need to.

            The failed command is not a normal 'butterfly' one; rather, it looks like a final cleanup command. My suspicion is that you do indeed have some sort of file or space limitation.

            What happens when you run the failed command directly? I.e., do not start up Trinity and let it run but just run the command. It should give a better message than 'ret 256'
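
For what it's worth, 'ret 256' is most likely the raw wait status returned by Perl's system() call (Trinity is a Perl script). If that assumption holds, the real exit code is the status shifted right by 8 bits, i.e. 1, which is just a generic failure. A small sketch of the decoding; the function name is made up:

```shell
# decode_wait_status STATUS: split a raw wait status (as returned by Perl's
# system()) into the exit code (high byte) and the signal number (low 7 bits).
decode_wait_status() {
    local status="$1"
    echo "exit_code=$(( status >> 8 )) signal=$(( status & 127 ))"
}

decode_wait_status 256   # prints: exit_code=1 signal=0
```

So the number itself says little about the cause, which is why running the command by hand and reading its own output is the more useful step.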


            • #7
              Originally posted by balaena:
              I am connected to this machine via SSH. It has 1TB of RAM so I could increase that. But shouldn't my --max_memory 80G be more than enough for 60 million reads?
              If you have 1 TB of memory, why not use it? 80 GB sounds a bit low; I'd give it at least 200 GB to be on the safe side. The actual memory needed depends on the transcriptome complexity and sequencing error rate, which only partly correlate with the number of input reads.

              Originally posted by balaena:
              The "FailedCommands" entry was assigned only 1G, that's right, but Trinity set that automatically during the run. How could I change it?
              Trinity parallelizes subjobs and gives them a certain slice of memory each (1 GB in this case). You need to repeat that single command with a higher limit - execute the command from the FailedCommand file, changing the memory parameter as I previously described; e.g. try

              Code:
              /homes/biertank/kai/software/trinityrnaseq-2.0.2/util/support_scripts/../../Trinity --single "/scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa" --output "/scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out" --CPU 1 --max_memory 16G --full_cleanup --seqType fa --trinity_complete --jaccard_clip
              directly on the command line before restarting the complete Trinity job.

              Originally posted by balaena:
              I was also wondering about space restrictions on my account on the scratchsan partition, because both times Trinity stopped after having written around 44GB of data. I emailed the admin to ask whether there are any administrative restrictions on my folder, but have no answer yet.

              However, the Trinity data is still in my folder and I can work in it and copy lots of data.

              Could it be that the file system has problems with the huge amount of files in only one folder in "/trinity_out_dir/read_partitions/Fb_0/CBin_0/"?

              But this should then be a general problem of Trinity and not be specific to this machine?
              Trinity does create a lot of files. But if you are still able to create files, I don't think that your quota is the current issue.


              • #8
                I am going to disagree with sarvidsson and then agree with him.

                Despite being able to create files I still think that you have a file/space issue due to Trinity creating large files. I don't think that it is memory related although changing the 1G parameter will not hurt.

                I certainly agree with sarvidsson that you should run the offending command directly from the command line.


                • #9
                  I am going to vote for 80G being too small for this job. Ask the admins to see if they find any evidence of your job running up against that limit.


                  • #10
                    OK, I have no storage limitations and there is plenty of space...

                    If I execute in the trinity_out_dir (where the .cmds resides):

                    /homes/biertank/kai/software/trinityrnaseq-2.0.2/trinity-plugins/parafly/bin/ParaFly -c recursive_trinity.cmds -CPU 10
                    this happens (last lines of output):

                    #######################################################################
                    Inchworm file: /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/inchworm.K25.L25.DS.fa detected.
                    Skipping Inchworm Step, Using Previous Inchworm Assembly
                    #######################################################################

                    ###### WARNING: /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/inchworm.K25.L25.DS.fa.clipped.fa already exists, skipping the jaccard-clip step, using already existing output: /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/inchworm.K25.L25.DS.fa.clipped.fa
                    -skipping cmd: /homes/biertank/kai/software/trinityrnaseq-2.0.2/Chrysalis/GraphFromFasta -i /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/inchworm.K25.L25.DS.fa.clipped.fa -r single.fa -min_contig_length 200 -min_glue 2 -glue_factor 0.05 -min_iso_ratio 0.05 -t 1 -k 24 -kk 48 > /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/GraphFromIwormFasta.out, checkpoint exists.
                    -skipping cmd: /homes/biertank/kai/software/trinityrnaseq-2.0.2/Chrysalis/CreateIwormFastaBundle -i /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/GraphFromIwormFasta.out -o /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/bundled_iworm_contigs.fasta -min 200, checkpoint exists.
                    -skipping cmd: /homes/biertank/kai/software/trinityrnaseq-2.0.2/Chrysalis/ReadsToTranscripts -i single.fa -f /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/bundled_iworm_contigs.fasta -o /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/readsToComponents.out -t 1 -max_mem_reads 10000000 , checkpoint exists.
                    -skipping cmd: /usr/bin/sort --parallel=2 -T . -S 1G -k 1,1n /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/readsToComponents.out > /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/readsToComponents.out.sort, checkpoint exists.
                    -skipping cmd: /homes/biertank/kai/software/trinityrnaseq-2.0.2/Inchworm/bin//FastaToDeBruijn --fasta /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/bundled_iworm_contigs.fasta -K 24 --graph_per_record --threads 1 > /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/bundled_iworm_contigs.fasta.deBruijn, checkpoint exists.
                    Butterfly_cmds: /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/butterfly_commands
                    Inchworm and Chrysalis complete. Butterfly commands to execute are provided here:
                    /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/butterfly_commands

                    ---------------------------------------------------------------
                    -------------------- Butterfly --------------------------------
                    -- (Reconstruct transcripts from reads and de Bruijn graphs) --
                    ---------------------------------------------------------------

                    Thursday, February 12, 2015: 15:55:24 CMD: /homes/biertank/kai/software/trinityrnaseq-2.0.2/trinity-plugins/parafly/bin/ParaFly -c /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/butterfly_commands -shuffle -CPU 1 -failed_cmds failed_butterfly_commands.68408.txt -v
                    Number of Commands: 1
                    Exception in thread "main" java.lang.RuntimeException: Error, detected cycles in seqvertex_graph, so not a DAG as expected!
                    at TransAssembly_allProbPaths.zipper_collapse_DAG_zip_up(TransAssembly_allProbPaths.java:2098)
                    at TransAssembly_allProbPaths.convert_path_DAG_to_SeqVertex_DAG(TransAssembly_allProbPaths.java:1980)
                    at TransAssembly_allProbPaths.create_DAG_from_OverlapLayout(TransAssembly_allProbPaths.java:1550)
                    at TransAssembly_allProbPaths.main(TransAssembly_allProbPaths.java:918)
                    succeeded(0), failed(1) 100% completed.

                    We are sorry, commands in file: [failed_butterfly_commands.68408.txt] failed. :-(

                    Error, cmd: /homes/biertank/kai/software/trinityrnaseq-2.0.2/trinity-plugins/parafly/bin/ParaFly -c /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/butterfly_commands -shuffle -CPU 1 -failed_cmds failed_butterfly_commands.68408.txt -v died with ret 256 at /homes/biertank/kai/software/trinityrnaseq-2.0.2/util/support_scripts/../../Trinity line 1997.

                    Trinity run failed. Must investigate error above. at /homes/biertank/kai/software/trinityrnaseq-2.0.2/util/support_scripts/../../Trinity line 1057.
                    succeeded(0), failed(1) 100% completed.

                    We are sorry, commands in file: [FailedCommands] failed. :-(

                    If I execute in the trinity_out_dir:
                    /homes/biertank/kai/software/trinityrnaseq-2.0.2/util/support_scripts/../../Trinity --single "/scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa" --output "/scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out" --CPU 1 --max_memory 16G --full_cleanup --seqType fa --trinity_complete --jaccard_clip

                    This is the last screen output:

                    ---------------------------------------------------------------
                    -------------------- Butterfly --------------------------------
                    -- (Reconstruct transcripts from reads and de Bruijn graphs) --
                    ---------------------------------------------------------------

                    Thursday, February 12, 2015: 16:15:37 CMD: /homes/biertank/kai/software/trinityrnaseq-2.0.2/trinity-plugins/parafly/bin/ParaFly -c /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/butterfly_commands -shuffle -CPU 1 -failed_cmds failed_butterfly_commands.72866.txt -v
                    Number of Commands: 1
                    succeeded(1) 100% completed.

                    All commands completed successfully. :-)

                    CMD finished (5 seconds)
                    Thursday, February 12, 2015: 16:15:42 CMD: /homes/biertank/kai/software/trinityrnaseq-2.0.2/util/support_scripts/print_butterfly_assemblies.pl /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out/chrysalis/component_base_listing.txt > Trinity.fasta.tmp
                    CMD finished (0 seconds)
                    Fully cleaning up.


                    ###################################################################
                    Butterfly assemblies are written to /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out.Trinity.fasta
                    ###################################################################


                    WARNING, cannot remove output directory /scratchsan/kai/rnaseq/trinity_out_dir/read_partitions/Fb_0/CBin_0/c9162.trinity.reads.fa.out, since not created in this run. (safety precaution)

                    Even if I don't really know what's happening, this looks better, right? It only stopped because it could not remove the directory from the previous run.

                    Does that mean increasing the initial memory could help? To what number (I cannot use all the memory on that machine....)?


                    • #11
                      It seems the previously failed command succeeded with some more memory. Now what happens if you execute the complete Trinity command again?


                      • #12
                        Do you mean restarting Trinity from the same location with the previously calculated data still in there?

                        again with:

                        nohup Trinity --seqType fq --max_memory 80G --left H2_rnaseq_R1_trimmed.fastq.gz --right H2_rnaseq_R2_trimmed.fastq.gz --CPU 10 --jaccard_clip &

                        But with maybe 200G RAM?

                        Can you just restart Trinity after a failed run and use the old data?


                        • #13
                          Yes, it should pick up where it failed. I don't think the max_memory setting matters anymore, but go ahead and change it.
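
To see how much work is left before restarting, a sketch along these lines may help. It assumes ParaFly records finished commands in a file named recursive_trinity.cmds.completed next to recursive_trinity.cmds; that file name is an assumption, so check what is actually in your trinity_out_dir. The pending_cmds helper is a made-up name.

```shell
# pending_cmds CMDS_FILE COMPLETED_FILE: print the commands from CMDS_FILE
# that do not appear verbatim in COMPLETED_FILE. Hypothetical helper.
pending_cmds() {
    local cmds="$1" completed="$2"
    if [ -s "$completed" ]; then
        # -F fixed strings, -x whole-line match, -v invert: keep unfinished lines
        grep -F -x -v -f "$completed" "$cmds"
    else
        # nothing recorded as finished yet, so every command is still pending
        cat "$cmds"
    fi
}

# Example usage inside trinity_out_dir (file names are assumptions):
#   pending_cmds recursive_trinity.cmds recursive_trinity.cmds.completed | wc -l
```

If only a handful of commands remain, the restart should be quick.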


                          • #14
                            Seems like this was just the last step. After restarting I got the Trinity.fasta.

                            Thanks!


                            • #15
                              To wrap up the whole story: the latest Trinity release (2.0.3) seems to have fixed this problem, and the Trinity run is also much faster.
