  • Problem with Resuming TopHat pipeline with unmapped reads

    Hi, I have a problem I can't seem to solve. When I run TopHat on my own machine I don't have any problems, but when I run either a prebuilt binary or a version compiled on the server on our cluster, I get the following error.
    I have tried several versions of TopHat without luck.

    So the big question: does anyone know which directory is missing?

    The run.log is not of much help:

    [2013-08-06 11:52:08] Building transcriptome data files..
    [2013-08-06 11:52:21] Building Bowtie index from btindex.fa
    [2013-08-06 12:01:30] Mapping left_kept_reads to transcriptome btindex with Bowtie2
    [2013-08-06 12:05:17] Resuming TopHat pipeline with unmapped reads
    Traceback (most recent call last):
      File "/package/bio/tophat/src/tophat", line 4072, in ?
        sys.exit(main())
      File "/package/bio/tophat/src/tophat", line 4038, in main
        user_supplied_deletions)
      File "/package/bio/tophat/src/tophat", line 3446, in spliced_alignment
        if not nonzeroFile(initial_reads[0]) and \
      File "/package/bio/tophat/src/tophat", line 1155, in nonzeroFile
        samtools_view = subprocess.Popen(samtools_view_cmd, stdout=subprocess.PIPE)
      File "/usr/lib64/python2.4/subprocess.py", line 550, in __init__
        errread, errwrite)
      File "/usr/lib64/python2.4/subprocess.py", line 996, in _execute_child
        raise child_exception
    OSError: [Errno 2] No such file or directory
    ~


    I am running TopHat (v2.0.9), Bowtie 2.1.0.0 and Samtools 0.1.18.0.
    Bjørn Øst

  • #2
    Since you refer to getting the error on the cluster, have you checked that the filesystem your files reside on is available on the relevant cluster nodes?

    You could ssh into the cluster node(s) your job is failing on and look to see if the filesystem is mounted there (or available through autofs depending on how the cluster admins have set things up). Sometimes filesystems that are mounted on the head node may not be accessible on the worker nodes.



    • #3
      I can access all the directories with full rights when I ssh to my cluster, and I built everything while I was on the cluster. That is why I don't know why it doesn't work.
      Bjørn Øst



      • #4
        Is it possible to see in any of the log files which directory is causing the trouble?
        Bjørn Øst



        • #5
          Originally posted by bjoernoest:
          I can access all the directories with full rights when I ssh to my cluster, and I built everything while I was on the cluster. That is why I don't know why it doesn't work.
          When you refer to "ssh to my cluster" I assume that is referencing the head node (or log-in node). Before we go any further, let us verify that this is a real compute cluster with a job scheduling system (e.g. LSF or SGE).

          If that is true, then start your job again and see which exact node(s) it is running on. SSH into one of those nodes (this can be done from the log-in node) and see if the file system your files reside on is visible/available on that node.
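For the check on the compute node, a small script can make the comparison between head node and worker node less error-prone. The following is only a sketch in Python 3 (the cluster in this thread has Python 2.4 as its system interpreter, so it may need adapting); `check_paths` is a hypothetical helper, and the example paths are taken from the log output in this thread and should be replaced with the directories of your own run.

```python
import os
import socket


def check_paths(paths):
    """Map each path to (exists, readable, writable) as seen from this host."""
    report = {}
    for path in paths:
        exists = os.path.exists(path)
        report[path] = (
            exists,
            exists and os.access(path, os.R_OK),
            exists and os.access(path, os.W_OK),
        )
    return report


if __name__ == "__main__":
    # Assumed example paths: install dir, TopHat tmp dir, and the node-local /tmp.
    paths = ["/package/bio/tophat/src", "test/tmp", "/tmp"]
    print("host:", socket.gethostname())
    for path, (exists, readable, writable) in check_paths(paths).items():
        print(f"{path}: exists={exists} readable={readable} writable={writable}")
```

Running the same script once on the log-in node and once on the node the job was scheduled on, then comparing the two outputs, should show whether a filesystem is missing on the worker.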



          • #6
            We use the Torque Portable Batch System (PBS) for job scheduling, so when I submit my job using qsub I get a xxx.mycluster.address... so I ssh to this, and everything seems fine.
            Bjørn Øst



            • #7
              Have you looked to see if there is a "logs/run.log" file in the original output directory you had specified? That should have additional information available.



              • #8
                It's complaining that samtools isn't in the available $PATH on whatever node this is being run on (you would get a samtools error if it couldn't find the files specified). There are a few possible reasons why this could occur, most obviously that the correct $PATH isn't being set (or isn't set correctly) or that the mount point isn't actually mounted on the affected node (which seems to happen frequently on some clusters). You might check the documentation for your cluster (or just bug the admin) to try to figure out which of the possibilities is correct. In the latter case, whoever administers the cluster will have to fix the issue.
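To illustrate why the traceback points at the executable rather than at a data file: when `subprocess.Popen` cannot find the program it is asked to start, it raises `OSError` with errno 2 before the program ever runs. The sketch below is Python 3, not the Python 2.4 shown in the traceback; `find_missing_tools` and `try_launch` are hypothetical helper names, and the tool list is an assumption based on the programs mentioned in this thread.

```python
import errno
import shutil
import subprocess


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on the current PATH."""
    return [name for name in tools if shutil.which(name) is None]


def try_launch(cmd):
    """Start `cmd`; return None if it launched, else the errno of the launch failure."""
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError as exc:
        # ENOENT here means the executable itself was not found.
        return exc.errno
    proc.communicate()
    return None


if __name__ == "__main__":
    # Assumed tool names; adjust to whatever your TopHat build expects.
    missing = find_missing_tools(["samtools", "bowtie2", "bowtie2-build"])
    print("not on PATH:", missing or "none")
    if try_launch(["samtools"]) == errno.ENOENT:
        print("samtools could not be started: [Errno 2] No such file or directory")
```

Submitting this as a job through the scheduler, rather than running it in an interactive ssh session, shows the $PATH that the job actually sees, which may differ from the one in a login shell.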



                • #9
                  Hmm, when I ran it again it crashed again, but now this is in the run.log:

                  /package/bio/tophat/src/bam2fastx --all --fastq test/tmp/left_kept_reads.bam|/package/bio/tophat/src/bowtie2-align -q -k 60 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 --mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0 -p 2 --sam-no-hd -x test/tmp/btindex -|/package/bio/tophat/src/fix_map_ordering --bowtie2-min-score 15 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --sam-header test/tmp/btindex.bwt.samheader.sam - - test/tmp/left_kept_reads.m2g_um.bam | /package/bio/tophat/src/map2gtf --sam-header test/tmp/btindex_genome.bwt.samheader.sam test/tmp/btindex.fa.tlst - test/tmp/left_kept_reads.m2g.bam > test/logs/m2g_left_kept_reads.out

                  All the files exist.
                  Bjørn Øst



                  • #10
                    That is just the last command that TopHat saved when the crash occurred. It is saved for later use in case you want to resume the run with the special TopHat option -R (resume).

                    The following may cause you some grief from the cluster admins, but what happens if you try to run the above command outside PBS on the command line (be ready to kill the job in case it overwhelms the head node)?

                    Is there a comparable amount of RAM available on the cluster nodes compared to your personal machine?



                    • #11
                      Sorry for the late answer, I have been sick.
                      That does not work either. Is there a way to see which command it tries to execute?
                      Bjørn Øst



                      • #12
                        Can you be more specific as to what does not work? Trying to "resume" the TopHat job, or trying to run the command line saved in the "run.log" file outside PBS?

                        I am not familiar with PBS but there is a way to capture the standard out and error output into files (-o and -e options). Have you tried that?



                        • #13
                          Sorry, I tried to run it outside PBS, and it still crashes at resuming the job.
                          Yes, but they do not provide much information; the error file just contains the error I posted above.
                          I have tested the commands from a run on my local machine, and it seems that it finishes the #>map_start step, but something is missing in #>map_segments:
                          /pgzip -cd< test/tmp/left_kept_reads.m2g_um_seg1.fq.z|/package/bio/tophat/src/bowtie2-align -q -k 41 -N 1 -L 20 -p 8 --sam-no-hd -x genome/btindex -|/package/bio/tophat/src/fix_map_ordering --bowtie2-min-score 10 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --index-outfile test/tmp/left_kept_reads.m2g_um_seg1.bam.index --sam-header test/tmp/btindex_genome.bwt.samheader.sam - test/tmp/left_kept_reads.m2g_um_seg1.bam test/tmp/left_kept_reads.m2g_um_seg1_unmapped.bam

                          tmp/left_kept_reads.m2g_um_seg1.fq.z: No such file or directory.

                          But I cannot seem to find where that one is generated. These are the only left_kept_reads files I have in my tmp:
                          left_kept_reads.m2g_um_seg1
                          left_kept_reads.m2g_um_seg1.bam
                          left_kept_reads.m2g_um_seg1.bam.index
                          left_kept_reads.m2g_um_seg1_unmapped.bam
                          left_kept_reads.m2g_um_seg1_unmapped.bam.index
                          left_kept_reads.m2g_um_seg1_unmapped.bam:q
                          left_kept_reads.m2g_um_seg1_unmapped.bam:q.index
                          .
                          Bjørn Øst



                          • #14
                            Is the "/tmp" on the local file system on the cluster node? I wonder if that is filling up as the job progresses.

                            One way to check is to re-run the original TopHat job (with all parameters) and watch the /tmp usage on the node where it is running.

                            Another thing to check is if you are bumping up against any "limits" set by the sys admins on your account. (check by running "limit" or "ulimit").
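Both checks above can also be scripted so they can be logged from inside the job itself. This is a rough sketch in Python 3 for Unix-like systems (the `resource` module is not available on Windows); `free_fraction` and `current_limits` are hypothetical helper names, and which limits matter on a given cluster is an assumption.

```python
import resource
import shutil


def free_fraction(path="/tmp"):
    """Return the fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total if usage.total else 0.0


def current_limits():
    """Return selected (soft, hard) resource limits for this process."""
    names = {
        "open files": resource.RLIMIT_NOFILE,
        "file size": resource.RLIMIT_FSIZE,
        "address space": resource.RLIMIT_AS,
    }
    return {label: resource.getrlimit(res) for label, res in names.items()}


if __name__ == "__main__":
    print(f"/tmp free: {free_fraction('/tmp'):.1%}")
    for label, (soft, hard) in current_limits().items():
        # resource.RLIM_INFINITY means the limit is not restricted.
        print(f"{label}: soft={soft} hard={hard}")
```

Calling `free_fraction` periodically while the TopHat job runs would show whether /tmp is filling up; the limits only need to be printed once per job.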
