  • Tophat error: could not get read# 3868130 from stream!

    Hi all,

    I am using Tophat to align colorspace RNA-seq to genome.fa.

    tophat -p 8 --color -o tophat_G1 --quals Pdom-preliminary-genome.index filtered_G1_U_F3.csfasta filtered_G1_U_F3_QV.qual

    It ran into an error and printed the following output.

    Thanks in advance.

    Ruolin


    [2012-06-29 10:26:22] Beginning TopHat run (v2.0.1)
    -----------------------------------------------
    [2012-06-29 10:26:22] Checking for Bowtie
    Bowtie version: 0.12.7.0
    [2012-06-29 10:26:22] Checking for Samtools
    Samtools version: 0.1.18.0
    [2012-06-29 10:26:22] Checking for Bowtie index files
    [2012-06-29 10:26:22] Checking for reference FASTA file
    [2012-06-29 10:26:22] Generating SAM header for Pdom-preliminary-genome.index
    format: fasta
    [2012-06-29 10:26:25] Preparing reads
    left reads: min. length=75, max. length=75, 7135633 kept reads (423313 discarded)
    [2012-06-29 10:32:45] Mapping left_kept_reads to genome Pdom-preliminary-genome.index with Bowtie
    [2012-06-29 10:47:31] Mapping left_kept_reads_seg1 to genome Pdom-preliminary-genome.index with Bowtie (1/3)
    [2012-06-29 10:54:16] Mapping left_kept_reads_seg2 to genome Pdom-preliminary-genome.index with Bowtie (2/3)
    [2012-06-29 11:00:45] Mapping left_kept_reads_seg3 to genome Pdom-preliminary-genome.index with Bowtie (3/3)
    [2012-06-29 11:07:19] Searching for junctions via segment mapping
    [FAILED]
    Error: segment-based junction search failed with err =1
    Error: could not get read# 3868130 from stream!

  • #2
    same problem

    Hi,

    I know this isn't very helpful, but I am having the same problem. I am using color space reads and v2.0.4.

    [2012-07-13 19:31:25] Searching for junctions via segment mapping
    [FAILED]
    Error: segment-based junction search failed with err =1
    Error: could not get read# 57335084 from stream!

    Can anyone help?

    • #3
      Threading seems to be the problem

      Hi,
      I have had this exact problem for quite some time with TopHat 2. After an extensive search I have concluded that it is probably a threading problem. You can 'solve' it by setting the number of threads to 1 (option -p). Removing the -p option entirely should also work, since 1 is the default. Naturally, this dramatically increases the run time, so hopefully a new version (the current one is 2.0.4) will soon fix this threading problem...

      (By the way, I did not encounter this threading problem with TopHat 1.4.)

      • #4
        I am having the same problem with color space reads and TopHat v2.0.4. I have tried removing the -p option and letting the job run on one processor, but I still get the same error. Any other ideas?

        • #5
          Other possible causes

          Hi scor,
          Other threads about this issue point towards memory and permission problems, so it could be useful to check your computer's or cluster's memory usage and available hard drive space during the TopHat run.

          Reg.
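A rough sketch of such a check, for anyone who wants to log it rather than watch top. It assumes a Linux machine (it reads MemAvailable from /proc/meminfo) and that you know the PID of the running tophat process; the function names and the 60-second default interval are my own, not anything TopHat provides.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of TopHat). Assumes Linux for /proc/meminfo.

# Print one line: timestamp, available memory (kB) and available disk space (kB) for DIR.
resource_snapshot() {
    local dir="$1" mem_kb disk_kb
    # MemAvailable is reported in kB (Linux 3.14+); empty if the field is absent
    mem_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)
    # df -Pk: POSIX output, one line per filesystem; column 4 = available 1K blocks
    disk_kb=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
    printf '%s mem_available_kb=%s disk_available_kb=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$mem_kb" "$disk_kb"
}

# While process PID is alive, take a snapshot for DIR every INTERVAL seconds.
watch_resources() {
    local pid="$1" dir="$2" interval="${3:-60}"
    while kill -0 "$pid" 2>/dev/null; do
        resource_snapshot "$dir"
        sleep "$interval"
    done
}
```

For example, start TopHat in the background from your working directory and then run watch_resources "$!" . 60 > resources.log; a steadily shrinking value in either column just before the failure would point at memory or disk rather than threading.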

          • #6
            Hi all,

            This colorspace problem also exists in TopHat 2.0.6 and has cost me a lot of time... Unfortunately, switching to single-core processing is not a solution for large datasets... In my case, after experimenting a lot to see which parameters cause the problem, the only way to keep multi-threading is to switch off the --no-coverage-search option, which also takes ages to complete with large datasets...

            Panos

            • #7
              I have just verified it in TopHat 2.0.7 as well... Does anyone have a possible solution?

              • #8
                Hello,
                I got the same issue (with paired-end reads: F5 35 bp, F3 75 bp). I realized that the issue was related to F3 (the longer read), as TopHat was able to map the F5 reads. I finally managed to make TopHat work by trimming the F3 reads down to 69. Perhaps an internal magic number...

                Hope it will help....
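For anyone who wants to try the same workaround, here is a rough awk sketch for trimming SOLiD F3 reads and their quality values. It assumes each read's colour string and its QV list each sit on a single line after the '>' header (the usual .csfasta / _QV.qual layout, but check your files), that lines starting with '#' are comments, and that "69" means 69 colour calls. A csfasta sequence line also starts with the primer base, so the sketch keeps n + 1 characters there. The function name trim_solid is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: trim SOLiD colourspace reads and their QVs to N colour calls.
# Assumes single-line records: '>' header, then one sequence/QV line; '#' = comment.
# Usage: trim_solid N in.csfasta in_QV.qual out.csfasta out_QV.qual
trim_solid() {
    local n="$1" in_cs="$2" in_qv="$3" out_cs="$4" out_qv="$5"

    # csfasta: keep the primer base plus the first n colours (n + 1 characters)
    awk -v n="$n" '/^#/ || /^>/ { print; next } { print substr($0, 1, n + 1) }' \
        "$in_cs" > "$out_cs"

    # qual: keep the first n space-separated quality values
    awk -v n="$n" '/^#/ || /^>/ { print; next } {
        line = $1
        for (i = 2; i <= n && i <= NF; i++) line = line " " $i
        print line
    }' "$in_qv" > "$out_qv"
}
```

For example, trim_solid 69 filtered_G1_U_F3.csfasta filtered_G1_U_F3_QV.qual trimmed_F3.csfasta trimmed_F3_QV.qual, then pass the trimmed pair to tophat with --color --quals as before.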

                • #9
                  A 'maybe' Solution

                  I've solved this problem for now.

                  Well, I have the same problem. I am dealing with ~170M colorspace reads.

                  My Tophat Version:
                  TopHat v2.0.8b

                  My ERROR:
                  [2013-05-21 10:55:08] Searching for junctions via segment mapping
                  [FAILED]
                  Error: segment-based junction search failed with err =1
                  Error: could not get read# 123430531 from stream!

                  My Solution:
                  NOTE: This solution works without errors only if you have a pre-built transcriptome index. See this link on how to build your transcriptome index: <link>

                  From what I've read online, the apparent cause of this problem is a number of threads greater than 1 [example: -p 20 in the TopHat options]. Restarting TopHat from scratch with -p 1 would be a tedious and time-consuming process, so the ideal solution is to resume the run from the last successful checkpoint. Fortunately, TopHat provides an option for this <resume description>.


                  Following this description, I tweaked the run.log file by replacing -p #number with -p 1, i.e., changing the initial number of threads to 1. With this tweak, the run resumes from the last successful checkpoint with 1 thread.
                  Note 1: After the first resume, you will have the file run.resume0.log. If the resumed run also fails, edit both run.log and run.resume0.log to use -p 1 (editing just the first command is sufficient; the subsequent commands are built from that initial command).

                  Note 2: It is always good to back up the run.log and run.resume0.log files before making such tweaks; otherwise you might end up breaking the whole run.
                  That said, this tweak has worked well enough for me, with no problems.
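The backup-and-edit step can be scripted roughly like this. It is a sketch, not a TopHat feature: it assumes the original tophat command is the first line of the log file (per Note 1, only that command needs editing) and that the thread count is written as "-p N" or "--num-threads N". The helper name set_threads is hypothetical, and where run.log lives inside your output directory is something to check on your own system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TopHat): back up a TopHat log file once,
# then rewrite the thread count on its first line (assumed to be the original
# tophat command) to N. Handles "-p N" and "--num-threads N".
set_threads() {
    local log="$1" n="$2"
    # Keep the very first backup; never overwrite it on later calls
    [ -e "$log.bak" ] || cp -p "$log" "$log.bak"
    # Edit only line 1, write to a temp file, then replace the original
    sed -E "1s/(^|[[:space:]])(-p|--num-threads)[[:space:]]+[0-9]+/\1\2 $n/" \
        "$log" > "$log.tmp" && mv "$log.tmp" "$log"
}
```

For example: set_threads <path to>/run.log 1, then tophat -R output_Dir; if the resume fails, run set_threads on run.resume0.log as well. Keeping the first backup untouched means you can always restore the original command with cp run.log.bak run.log.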

                  Although this tweak resumes from the last successful checkpoint with 1 thread, the subsequent steps down the pipeline will also run with 1 thread. To avoid this, follow these steps:
                  This is the error-prone step:
                  [2013-05-21 15:35:22] Searching for junctions via segment mapping

                  Use the above tweak; this step will then complete and the run will enter the next step:
                  [2013-05-21 11:42:50] Retrieving sequences for splices

                  Now interrupt the process with Ctrl+C.
                  Re-edit the run.log file with -p 20 (I chose 20 because I have 24 cores in my CPU).
                  Then resume the process again with tophat -R output_Dir
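If you would rather not watch the screen for the "Retrieving sequences for splices" line, a small polling loop can wait for it. This sketch assumes you started the resumed run in the background of an interactive shell with its messages redirected to a file of your choosing (for example tophat -R output_Dir > resume.log 2>&1 &); the helper name wait_for_line and the default one-day timeout are my own choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll FILE once per second until a line containing
# PATTERN (matched as a fixed string) appears; give up after TIMEOUT seconds
# (default 86400). Returns 0 when found, 1 on timeout.
wait_for_line() {
    local file="$1" pattern="$2" timeout="${3:-86400}" waited=0
    until grep -qF -- "$pattern" "$file" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example: wait_for_line resume.log 'Retrieving sequences for splices' && echo 'splice retrieval reached'. Then bring the job back with fg, press Ctrl+C, switch run.log back to -p 20 and run tophat -R output_Dir again. The loop only detects the step; it deliberately leaves the interrupt to you.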


                  You can give this tweak a shot to see if it works for you.

                  P.S.: This is a silly point; such things should be fixed in the TopHat code itself.
                  Last edited by mallela; 05-22-2013, 07:52 AM. Reason: updating the information

                  • #10
                    As you can see in my previous post, I solved this problem just by "unsetting" the -p parameter.

                    • #11
                      Very nice, Mattia. My post above is also about "unsetting" the -p parameter.

                      Your solution unsets the -p parameter for all TopHat steps right from the beginning, which gives up the benefit of multi-threading in TopHat altogether (the default -p is 1).

                      In contrast, my tweak selectively unsets the -p parameter only at the error location (i.e., at the [2013-05-21 15:35:22] Searching for junctions via segment mapping step) while preserving multi-threading elsewhere, which makes for faster runs.

                      Once the above step has completed successfully, i.e., at
                      [2013-05-21 11:42:50] Retrieving sequences for splices
                      I interrupt the process with Ctrl+C, re-edit the run.log file with -p 20, and resume the process again. (In short, this injects back the multi-threading option that was removed for the previous error-prone step.)

                      • #12
                        It's a threading problem; remove the -p option.
