  • TopHat fails to catch error thrown by Bowtie, gives incomplete results

    Hi,

    I've recently discovered a strange behaviour in TopHat: it can sometimes give incomplete (or even incorrect) results, due to an error while running Bowtie on the junction sequence database.

    I'm using TopHat 1.0.13 with Bowtie 0.12.5 (or 0.12.3) on a Linux x86_64 computation cluster with a Lustre filesystem. The data are single-end, 76 bp reads. I'm running TopHat with the following parameters:

    -p 1 -a 8 -i 40 -m 1 -I 1000000 -F 0 --coverage-search --microexon-search

    For one of the runs where I got incomplete results, I noticed something odd in the output:

    [Thu May 27 17:25:49 2010] Mapping reads against segment_juncs with Bowtie
    [Thu May 27 17:25:50 2010] Mapping reads against segment_juncs with Bowtie
    [Thu May 27 17:25:51 2010] Mapping reads against segment_juncs with Bowtie

    This is odd because mapping the reads against segment_juncs should take much longer: I have about 20 million reads. I first suspected an error in building the Bowtie index for the splice junctions, but bowtie_build.log shows no error. However, some other log files from the run contain errors of the following type:

    ############################################

    filebd4xji.log

    Error reading ebwt array: returned 41750080, length was 168445184
    Your index files may be corrupt; please try re-building or re-downloading.
    A complete index consists of 6 files: XYZ.1.ebwt, XYZ.2.ebwt, XYZ.3.ebwt,
    XYZ.4.ebwt, XYZ.rev.1.ebwt, and XYZ.rev.2.ebwt. The XYZ.1.ebwt and
    XYZ.rev.1.ebwt files should have the same size, as should the XYZ.2.ebwt and
    XYZ.rev.2.ebwt files.

    ############################################

    So it seems that, even though the Bowtie index for the junction sequences was built correctly, aligning the reads against the junction index fails. I've run several series of tests and found that this Bowtie error does not occur every time (it seems to be more or less random), but it is quite frequent for large datasets. It is not yet clear why this happens (it might be OS-specific or filesystem-specific), so I am currently testing several possible fixes (see also the parallel thread "Bowtie can't read index files").
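    The Bowtie message above lists the six files of a complete index and says which pairs should match in size. A minimal sketch of that on-disk check follows; the function name `check_ebwt_index` and the index prefix passed to it are illustrative, not part of TopHat or Bowtie.

```python
import os

# The six suffixes of a complete index, as listed in the Bowtie error message.
EBWT_SUFFIXES = [".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
                 ".rev.1.ebwt", ".rev.2.ebwt"]


def check_ebwt_index(prefix):
    """Return a list of problems found for the index files at this prefix."""
    problems = []
    sizes = {}
    for suffix in EBWT_SUFFIXES:
        path = prefix + suffix
        if not os.path.isfile(path):
            problems.append("missing: " + path)
            continue
        sizes[suffix] = os.path.getsize(path)
        if sizes[suffix] == 0:
            problems.append("empty: " + path)
    # Forward and reverse files of the same number should have equal sizes.
    for fwd, rev in ((".1.ebwt", ".rev.1.ebwt"), (".2.ebwt", ".rev.2.ebwt")):
        if fwd in sizes and rev in sizes and sizes[fwd] != sizes[rev]:
            problems.append("size mismatch: %s (%d bytes) vs %s (%d bytes)"
                            % (fwd, sizes[fwd], rev, sizes[rev]))
    return problems
```

    This only inspects the files as they sit on disk. If it reports no problems while Bowtie still fails with a short read, that points away from a corrupt index and towards how the files are being read.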

    However, the bigger issue here is that TopHat does not catch the error thrown by Bowtie, and finishes with apparent success, while giving only an incomplete set of exon-exon junctions. This is quite dangerous, since most users will not search for "Error" messages in the log files if TopHat has finished successfully. So I would advise TopHat users to check the log files for Bowtie errors before proceeding with their analyses.
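    The log check suggested above can be automated. The sketch below assumes the run's log files end in `.log` and sit in a single directory; the directory path, the function name `find_log_errors` and the search pattern are all illustrative choices, so adjust them to your own run.

```python
import glob
import os
import re

# Case-insensitive match for lines like the Bowtie messages quoted above.
ERROR_PATTERN = re.compile(r"error|corrupt", re.IGNORECASE)


def find_log_errors(log_dir, pattern="*.log"):
    """Return (path, line_number, text) for every suspicious log line."""
    hits = []
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if ERROR_PATTERN.search(line):
                    hits.append((path, number, line.rstrip()))
    return hits
```

    An empty result does not prove the run was clean; it only means none of the scanned files contained a matching line.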

    Any comments or suggestions on how to solve this problem would be much appreciated.

    Best wishes,

    Anamaria

  • #2
    In reply to anecsulea:
    This is an interesting bug - thanks for reporting it. There is code to check that the call to bowtie-build succeeded and that the index is good (or at least passes bowtie-build's internal checks), but for some reason that code is not catching the exception. I'll look into it further.

    Can you re-run this with --keep-tmp enabled, and then try to run the bowtie-build step listed in run.log manually? If that step is failing (some or all of the time), you might want to check the size of the juncs_db.fa file that TopHat generates and feeds to bowtie-build. I'm curious as to how big it is and/or whether it's corrupt in some way.
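    A quick way to get the size of juncs_db.fa and look for obvious corruption is sketched below. It assumes a plain nucleotide FASTA with an A/C/G/T/N alphabet; the function name `summarize_fasta` is made up for illustration.

```python
import os


def summarize_fasta(path):
    """Report file size, record count, base count and simple format problems."""
    summary = {"bytes": os.path.getsize(path), "records": 0,
               "bases": 0, "problems": []}
    allowed = set("ACGTNacgtn")  # assumed alphabet; extend if needed
    seen_header = False
    with open(path, errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                summary["records"] += 1
                seen_header = True
            elif line:
                if not seen_header:
                    summary["problems"].append(
                        "line %d: sequence before first header" % number)
                if not set(line) <= allowed:
                    summary["problems"].append(
                        "line %d: unexpected characters" % number)
                summary["bases"] += len(line)
    if summary["records"] == 0:
        summary["problems"].append("no FASTA records found")
    return summary
```

    Comparing the summaries from a run that worked and a run that failed should show whether the junction database itself differs between them.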

    • #3
      In reply to Cole Trapnell:
      As far as I can see, there is no reason why the code that checks that bowtie-build succeeded should catch this exception, since the error comes from the bowtie aligner, not from bowtie-build. As I explained above, the index is built correctly and is definitely not corrupt, yet bowtie fails to read it into memory when trying to align the reads. This issue is discussed in more detail in a parallel thread in this forum ("Bowtie fails to read index files"), and I have found a solution that works on the computation cluster I'm using. However, I still believe that TopHat's failure to catch this error is a serious problem that needs to be corrected in future versions of the software.

      Best wishes,

      Anamaria

      • #4
        In reply to anecsulea:
        OK - I see where things are going awry. From the parallel thread, it sounds like your filesystem/OS is interacting with Bowtie in a way that produces the failure. A recent version of TopHat streamlined the way Bowtie is called, and it looks like I failed to put back some of the exception-handling code. It's there now and will be present in the next release.
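        For readers wrapping external tools in their own pipelines, the general pattern is to check both the child's exit status and its stderr. The sketch below is not TopHat's actual code: `run_checked` and `AlignerError` are hypothetical names, and it assumes the child either exits with a non-zero status or writes a line containing "Error" to stderr when it fails.

```python
import subprocess


class AlignerError(RuntimeError):
    """Raised when a child process fails or reports an error on stderr."""


def run_checked(cmd, error_markers=("Error",)):
    """Run cmd and return its stdout; raise AlignerError on signs of failure."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    # A non-zero exit status is the primary failure signal.
    if result.returncode != 0:
        raise AlignerError("%s exited with status %d: %s"
                           % (cmd[0], result.returncode, result.stderr.strip()))
    # Some tools print an error but still exit with status 0, so scan stderr too.
    for marker in error_markers:
        if marker in result.stderr:
            raise AlignerError("%s reported: %s"
                               % (cmd[0], result.stderr.strip()))
    return result.stdout
```

        Scanning stderr for a marker string is a blunt check and can give false positives, so the marker list is left as a parameter.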
