Seqanswers
  • TopHat error: segment-based junction search failed with err=1

    Hi

    Lately, I have had a problem running TopHat 1.3.1 on 100 bp paired-end Illumina HiSeq RNA reads. After cleaning (quality trimming, duplicate removal, adapter removal), I split the files (taking care not to separate the last entry's sequence from its quality scores) and fed them to TopHat. Please note that I have more left kept reads here because I have an extra file of leftover unpaired reads. I have also noticed in previous successful runs that, even though the paired FASTQ files I fed in have the same number of sequences, the left-read and right-read counts reported in the log are slightly different.
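    For anyone doing the same kind of split: below is a minimal Python sketch (not part of TopHat) that cuts a FASTQ file into chunks only at record boundaries, so no header, sequence, '+' line, or quality line is ever separated from its record. It assumes plain, unwrapped 4-line FASTQ; the function names and the `<prefix>.<k>.fq` output naming are hypothetical.

```python
from itertools import islice
from typing import Iterator, List, Optional, TextIO, Tuple

Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, plus, quality) tuples, four lines at a time.

    Assumes plain 4-line FASTQ with no wrapped sequence or quality lines.
    """
    while True:
        lines = list(islice(handle, 4))
        if not lines:
            return
        if len(lines) < 4 or not lines[0].startswith("@") or not lines[2].startswith("+"):
            raise ValueError(f"truncated or malformed record: {lines[0]!r}")
        header, seq, plus, qual = (line.rstrip("\n") for line in lines)
        yield header, seq, plus, qual


def split_fastq(in_path: str, out_prefix: str, records_per_chunk: int) -> List[str]:
    """Copy consecutive runs of `records_per_chunk` whole records into
    <out_prefix>.1.fq, <out_prefix>.2.fq, ... and return the paths written."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be >= 1")
    paths: List[str] = []
    out: Optional[TextIO] = None
    try:
        with open(in_path) as handle:
            for i, record in enumerate(read_fastq(handle)):
                if i % records_per_chunk == 0:  # start a new chunk file
                    if out is not None:
                        out.close()
                    paths.append(f"{out_prefix}.{len(paths) + 1}.fq")
                    out = open(paths[-1], "w")
                out.write("\n".join(record) + "\n")
    finally:
        if out is not None:
            out.close()
    return paths
```

    Running it with the same `records_per_chunk` on the left and right files puts mate k of each pair into the same chunk number in both files, provided the two inputs list mates in the same order. The chunk paths can then be joined with commas, which is how TopHat takes multiple read files for one mate.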

    Here is the log:

    [Thu Oct 27 18:33:40 2011] Beginning TopHat run (v1.3.1)
    -----------------------------------------------
    [Thu Oct 27 18:33:40 2011] Preparing output location ./tophat_out/
    [Thu Oct 27 18:33:40 2011] Checking for Bowtie index files
    [Thu Oct 27 18:33:40 2011] Checking for reference FASTA file
    [Thu Oct 27 18:33:40 2011] Checking for Bowtie
    Bowtie version: 0.12.7.0
    [Thu Oct 27 18:33:40 2011] Checking for Samtools
    Samtools Version: 0.1.12a
    [Thu Oct 27 18:33:40 2011] Generating SAM header for ../PG210SC5
    [Thu Oct 27 18:33:40 2011] Preparing reads
    format: fastq
    quality scale: phred33 (default)
    Left reads: min. length=50, count=134790672
    Right reads: min. length=50, count=118121205
    [Thu Oct 27 20:34:22 2011] Mapping left_kept_reads against PG210SC5 with Bowtie
    [Thu Oct 27 21:42:15 2011] Processing bowtie hits
    [Thu Oct 27 23:08:30 2011] Mapping left_kept_reads_seg1 against PG210SC5 with Bowtie (1/4)
    [Fri Oct 28 00:27:19 2011] Mapping left_kept_reads_seg2 against PG210SC5 with Bowtie (2/4)
    [Fri Oct 28 01:47:04 2011] Mapping left_kept_reads_seg3 against PG210SC5 with Bowtie (3/4)
    [Fri Oct 28 02:57:47 2011] Mapping left_kept_reads_seg4 against PG210SC5 with Bowtie (4/4)
    [Fri Oct 28 04:25:49 2011] Mapping right_kept_reads against PG210SC5 with Bowtie
    [Fri Oct 28 05:26:52 2011] Processing bowtie hits
    [Fri Oct 28 06:48:08 2011] Mapping right_kept_reads_seg1 against PG210SC5 with Bowtie (1/4)
    [Fri Oct 28 08:00:12 2011] Mapping right_kept_reads_seg2 against PG210SC5 with Bowtie (2/4)
    [Fri Oct 28 09:11:43 2011] Mapping right_kept_reads_seg3 against PG210SC5 with Bowtie (3/4)
    [Fri Oct 28 10:21:22 2011] Mapping right_kept_reads_seg4 against PG210SC5 with Bowtie (4/4)
    [Fri Oct 28 11:56:21 2011] Searching for junctions via segment mapping
    [FAILED]
    Error: segment-based junction search failed with err =1
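    To get input-side numbers to set against the "Left reads" and "Right reads" counts above, one can count the records in the files actually passed to TopHat. This is a hedged sketch that assumes uncompressed 4-line FASTQ; it only shows what went in, not what TopHat kept after its own read preparation. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def count_fastq_records(paths: Iterable[str]) -> int:
    """Count 4-line FASTQ records across one or more plain-text files."""
    total = 0
    for path in paths:
        with open(path) as handle:
            n_lines = sum(1 for _ in handle)
        # A line count that is not a multiple of 4 means a broken split.
        if n_lines % 4 != 0:
            raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
        total += n_lines // 4
    return total


def compare_mates(left: Iterable[str], right: Iterable[str]) -> Tuple[int, int, int]:
    """Return (left_count, right_count, left_minus_right) for two file lists."""
    n_left = count_fastq_records(left)
    n_right = count_fastq_records(right)
    return n_left, n_right, n_left - n_right
```

    With the extra unpaired file included in the left list, the difference returned should equal that file's record count; anything beyond that points at the split itself.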

    ____________________________________________________________________________________________

    In the segment_juncs.log the last entry reads:

    FZStream::rewind() popen(gzip -cd './tophat_out/tmp/left_kept_reads_seg1_missing.fq.z') failed
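    One hedged reading of this message: popen() itself only fails when it cannot create the pipe, fork the child, or allocate memory, so if the message means popen() returned NULL, it would fit the out-of-memory suspicion in #3 better than a bad file. That is an inference, not something confirmed in this thread. Either way, it is cheap to rule out missing or corrupt temp files first. The sketch below assumes the `.z` temp files are gzip streams (the `gzip -cd` in the message suggests so); the function name is hypothetical.

```python
import gzip
import zlib
from pathlib import Path
from typing import Dict, Iterable


def check_gzip_files(tmp_dir: str, expected: Iterable[str] = (), pattern: str = "*.z") -> Dict[str, str]:
    """Map each file name to "ok", "missing", or "error: ...".

    Every file matching `pattern` is decompressed end to end; names in
    `expected` that do not exist are reported as "missing".
    """
    results: Dict[str, str] = {}
    for path in sorted(Path(tmp_dir).glob(pattern)):
        try:
            with gzip.open(path, "rb") as handle:
                while handle.read(1 << 20):  # stream in 1 MiB blocks
                    pass
            results[path.name] = "ok"
        except (OSError, EOFError, zlib.error) as exc:  # bad header, truncation, corrupt data
            results[path.name] = f"error: {exc}"
    for name in expected:
        if not (Path(tmp_dir) / name).exists():
            results[name] = "missing"
    return results
```

    For the run above, this would be called as `check_gzip_files("./tophat_out/tmp", expected=["left_kept_reads_seg1_missing.fq.z"])`.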


    I have previously used such a mixture of paired and unpaired reads successfully (I think!) with another set of reads; however, those were smaller read sets. Even with the data above, the run works fine when I use only one pair out of the four split files.

    I would appreciate it if anyone could help me resolve this problem.

  • #2
    I have the same problem. Were you able to figure out the reason for this error?

    -canbruce



    • #3
      Not yet. I suspect TopHat is running out of memory. Although I am running it on a Linux machine (Ubuntu) with 48 GB of RAM, I think that is still not enough to handle such large inputs.



      • #4
        I had the same problem today; I hope someone can step in and point the way to a fix.

        My data come directly from the Illumina pipeline as two FASTQ files.
        Originally posted by upadhyayanm


        • #5
          Hi townway, I am just wondering how much data you have. How many reads did you feed to TopHat?


          Xi Wang

          Originally posted by townway


          • #6
            My data are around 200M reads from one HiSeq lane, and I used 16 GB of memory to run TopHat 1.3.3 with the coverage, microexon, and butterfly search options.
            By the way, it worked well with the old version of TopHat.

            Originally posted by Xi Wang


            • #7
              The butterfly search option uses a lot of memory. I'm pretty sure you'll need a lot more than 16GB memory to align 200M reads using that option. I have 16GB and ran out of memory trying to align ~30M 100bp PE reads with the butterfly option.



              • #8
                Yes, that is true. Without those options, it works well now.

                Originally posted by biznatch


                • #9
                  TopHat has been updated to version 1.4.0 (BETA). Has anyone tried this new version yet? The big change in this version is that TopHat first maps reads to the transcriptome supplied by the user, and I think that strategy should be much more stable.
                  Xi Wang



                  • #10
                    ReadStream::getRead() called with out-of-order id#!

                    I get a similar error, but with a different message (see title).
                    After looking at the code, I think the error has to do with multithreading and read IDs: in the section of the code I looked at, read IDs seem to be handled differently in the threaded and non-threaded code paths. I'm running the latest version (2.0.3) and am trying again without threading.
                    barry
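                    If the "out-of-order id" message means what it says, one quick check is whether the read IDs in an intermediate file actually increase. The sketch below assumes the file's FASTQ headers are plain numeric IDs ("@1", "@2", ...); that is an assumption about TopHat's intermediate format, not something stated in this thread, so inspect a few headers before relying on it. The function names are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def first_out_of_order(ids: Iterable[int]) -> Optional[Tuple[int, int, int]]:
    """Return (index, previous_id, current_id) for the first ID that is not
    strictly greater than the one before it, or None if all IDs increase."""
    prev: Optional[int] = None
    for i, cur in enumerate(ids):
        if prev is not None and cur <= prev:
            return i, prev, cur
        prev = cur
    return None


def fastq_numeric_ids(path: str) -> Iterator[int]:
    """Yield the ID from every FASTQ header line (every 4th line).

    ASSUMPTION: headers look like '@<number>' optionally followed by
    whitespace and more text; anything else raises ValueError.
    """
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 0:
                yield int(line[1:].split()[0])
```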



                    • #11
                      Kesner, did running without threading solve the problem? I have the same problem with segment_juncs:
                      Processed 4000000 root segment groupssi
                      Error: ReadStream::getRead() called with out-of-order id#!

                      I'm using TopHat 1.4.1 (I get the same error with 2.0.3, but there it comes from tophat_reports). It should not be a memory problem, because I have 96 GB of RAM, so maybe it is something related to threading.
                      Originally posted by kesner


                      • #12
                        re: problem fixed?

                        I think I got past the problem by using single threading. Since there are many processes running on the machine I am using, it is possible that some other resource failure was to blame.

                        Now my problem is that the run takes forever to complete. The alignments are finished, but the code processes junctions at about one chromosome per day. On the other hand, I'm not sure throwing multiple cores at this step does anything. I know my reads are contaminated with a lot of background, and I figure that this is why I am having problems with the whole process in general.
                        barry
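                        The two workarounds reported so far, leaving out the memory-hungry search options (#7, #8) and running single-threaded (#12), can be captured in a small command builder. This is a Python sketch that only assembles the argument list; the helper is hypothetical, and the option names (`-o`, `-p`, `--no-coverage-search`, `--coverage-search`, `--microexon-search`, `--butterfly-search`) are taken from the TopHat manual as I understand it, so check them against `tophat --help` for the installed version.

```python
from typing import List, Sequence


def build_tophat_cmd(
    index: str,
    left: Sequence[str],
    right: Sequence[str] = (),
    out_dir: str = "tophat_out",
    threads: int = 1,
    coverage_search: bool = False,
    microexon_search: bool = False,
    butterfly_search: bool = False,
) -> List[str]:
    """Assemble a TopHat argv list (hypothetical helper; flags assumed from the manual).

    Defaults mirror the workarounds in this thread: one thread, and none of the
    coverage, microexon, or butterfly searches.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    cmd = ["tophat", "-o", out_dir, "-p", str(threads)]
    # Toggle coverage search explicitly so the run does not rely on the
    # installed version's default.
    cmd.append("--coverage-search" if coverage_search else "--no-coverage-search")
    if microexon_search:
        cmd.append("--microexon-search")
    if butterfly_search:
        cmd.append("--butterfly-search")
    # Multiple read files for one mate go in a single comma-separated argument.
    cmd += [index, ",".join(left)]
    if right:
        cmd.append(",".join(right))
    return cmd
```

                        The resulting list can be passed to `subprocess.run(cmd, check=True)`; raising `threads` again is then a one-argument change once the run is known to be stable.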



                        • #13
                          I agree that something must be wrong with the resource allocation. I re-ran some samples (also multithreaded): sometimes I got the same error message, and sometimes the run finished successfully. So the problem is not reproducible and may be closely related to the state of the machine at run time.

                          Originally posted by kesner


                          • #14
                            Does the latest TopHat version solve the problem?

                            I was wondering whether you still see the problem with the latest build of TopHat 2?
                            barry



                            • #15
                              I am getting the same error with TopHat 2.0.0.

                              tophat.log:
                              Code:
                              ....
                              [2012-06-30 11:50:15] Mapping right_kept_reads.m2g_um_seg4 against mm9.fa with Bowtie2 (4/4)
                              /usr/local/bin/tophat-2.0.0/fix_map_ordering: /lib64/libz.so.1: no version information available (required by /usr/local/bin/tophat-2.0.0/fix_map_ordering)
                              [2012-07-01 00:20:11] Searching for junctions via segment mapping
                                      [FAILED]
                              Error: segment-based junction search failed with err =1
                              Error: ReadStream::getRead() called with out-of-order id#!
                              segment_juncs.log:
                              Code:
                              ...
                                      Loading chrUn_random...done
                                      Loading chrX_random...done
                                      Loading chrY_random...done
                                      Loading ...done
                              >> Performing segment-search:
                              Loading left segment hits...
                              Error: ReadStream::getRead() called with out-of-order id#!
                              Has anyone uncovered anything recently? At U Texas, they report that single threading allowed proper execution. Does anyone know how to "continue" the TopHat procedure and restart from the segment-based junction search? I'm going to try hacking the Python script, but I hope someone has done it before. I have a dozen or so samples that have been aligning for about a week. I don't want to redo the alignments, especially with only one core (OUCH!!)
