  • TopHat error: disk full

    Hi All
    I am analyzing one PE lane with read files 's_3_1_sequence.txt' and 's_3_2_sequence.txt';
    here are the first lines of the read files:

    s_3_1_sequence.txt
    @GAII:3:1:2:321#0/1
    GGGGCCTGGGACTCTNGGTCCCCTACTGNAGACA
    +GAII:3:1:2:321#0/1
    `[`aaX`_aV`aaaZDTKT\X__^XGZZDVV``a
    @GAII:3:1:2:314#0/1
    CCACCAGGCGCCCGTNGTGGCGCAGGAANGGGTG
    +GAII:3:1:2:314#0/1
    _``aa_\\_\_aa_PDZVYZ\ZZPZ\TVDHZT\Z
    @GAII:3:1:2:508#0/1
    GTTCAGCAGGAATGCNGAGATCGGAAGANGGGTT

    s_3_2_sequence.txt
    @GAII:3:1:2:321#0/2
    TCCCNCCTGCCCNNNGCTTCNNNGTTTTNNNTCA
    +GAII:3:1:2:321#0/2
    BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    @GAII:3:1:2:314#0/2
    CAGTNCCAGCGCNNNAGCGTNNNGACCTNNNACC
    +GAII:3:1:2:314#0/2
    `_JJDZ_aBBBBBBBBBBBBBBBBBBBBBBBBBB
    @GAII:3:1:2:508#0/2
    TCATNCCTGCTTANNCTATANNNTAAGAGNNTCT
    M1-80330:reads jdhahbi$

    the command-line I used:
    tophat -r 200 /mydir/bowtie-0.9.9.3/indexes/h_sapiens_asm s_3_1_sequence.txt s_3_2_sequence.txt

    the output with the error is below; I checked the disk space and there are more than 100 GB available:

    [Thu May 21 16:53:57 2009] Beginning TopHat run (v1.0.7)
    -----------------------------------------------
    [Thu May 21 16:53:57 2009] Preparing output location ./tophat_out/
    [Thu May 21 16:53:57 2009] Checking for Bowtie index files
    [Thu May 21 16:53:57 2009] Checking for reference FASTA file
    [Thu May 21 16:53:57 2009] Checking for Bowtie
    Bowtie version: 0.9.9.3
    [Thu May 21 16:53:58 2009] Checking reads
    seed length: 34bp
    format: fastq
    quality scale: phred
    Splitting reads into 1 segments
    [Thu May 21 17:00:49 2009] Mapping reads against h_sapiens_asm with Bowtie
    Splitting reads into 1 segments
    [Thu May 21 18:03:09 2009] Mapping reads against h_sapiens_asm with Bowtie
    [Thu May 21 18:51:52 2009] Searching for junctions via coverage islands
    [Thu May 21 18:59:12 2009] Searching for junctions via mate-pair closures
    [Fri May 22 05:40:00 2009] Retrieving sequences for splices
    [Fri May 22 05:48:53 2009] Indexing splices
    Index is corrupt: File size for ./tophat_out/tmp/segment_juncs.1.ebwt should have been 3799224901 but is actually -495742395.
    Please check if there is a problem with the disk or if disk is full.
    [FAILED]
    Error: Splice sequence indexing failed

    Any suggestions are appreciated,
    Thanks,

    joseph

  • #2
    Hi Joseph,

    This is most likely the same bug that a few other users have reported: with short, paired reads, the splice index can become unreasonably large, and that may be tripping Bowtie's index integrity checks. I have fixed this in my source tree, and the new version should be released next week. I am just tying up a few loose ends with the latest build.

    Sorry for the inconvenience. If you'd like to test out a snapshot of the code to see if it resolves the problem for you, please email me directly.
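A note on the numbers in the error message: the "actual" size reported (-495742395) is exactly what the expected size (3799224901) becomes if it is stored in a signed 32-bit integer and wraps around. That is consistent with an oversized index tripping the check rather than a genuinely full disk, but whether this is what happens inside Bowtie is an assumption here, not something confirmed in this thread. A minimal Python sketch of the arithmetic (the helper name `as_signed_32` is made up for illustration):

```python
def as_signed_32(n: int) -> int:
    """Interpret the low 32 bits of n as a two's-complement signed integer."""
    n &= 0xFFFFFFFF  # keep only the low 32 bits
    # Values with the top bit set represent negative numbers.
    return n - 0x100000000 if n >= 0x80000000 else n


# Expected size taken from the TopHat/Bowtie error message above.
expected_size = 3799224901
wrapped_size = as_signed_32(expected_size)
print(wrapped_size)  # -495742395, the "actual" size in the error message
```

Any file size above 2147483647 bytes (2^31 - 1) would wrap to a negative number in the same way.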

    • #3
      Hi Cole,

      I had a disk quota problem too. TopHat produced more than 1.5 TB of output!

      I use v1.1.4 with the default settings to map ~1M SE total RNA reads, which are 20-37 nt long. The huge file is produced by long_spanning_reads after the junction mapping step.

      Any ideas?
      Thanks,

      Biter

      • #4
        This is a bug triggered by variable read lengths. We have fixed it, and the next version will include the fix.
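To check whether a read file actually contains reads of more than one length, the lengths can be tallied directly. This is a minimal sketch, assuming plain four-line FASTQ records (no wrapped sequence lines); the function name `read_length_counts` and the example reads are hypothetical and are not part of TopHat:

```python
from collections import Counter
from typing import Iterable


def read_length_counts(fastq_lines: Iterable[str]) -> Counter:
    """Count reads per sequence length, assuming 4 lines per FASTQ record."""
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


# Two made-up records with different read lengths.
example = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGTAC", "+", "IIIIII",
]
lengths = read_length_counts(example)
print(sorted(lengths.items()))
```

For a real file, pass an open file handle instead of the list, e.g. `read_length_counts(open("reads.fastq"))`, where `reads.fastq` stands in for your own read file. More than one key in the result means the reads have variable length.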
