  • tophat IndexError: string index out of range

    I'm using TopHat to align Illumina RNA-seq data. The following error appeared during one of my previous runs, but I have already seen this type of error several times.

    [Tue Jan 17 00:17:19 2012] Beginning TopHat run (v1.4.0)
    -----------------------------------------------
    [Tue Jan 17 00:17:19 2012] Preparing output location top_bwa_out/
    [Tue Jan 17 00:17:19 2012] Checking for Bowtie index files
    [Tue Jan 17 00:17:19 2012] Checking for reference FASTA file
    [Tue Jan 17 00:17:19 2012] Checking for Bowtie
    Bowtie version: 0.12.7.0
    [Tue Jan 17 00:17:19 2012] Checking for Samtools
    Samtools Version: 0.1.17
    [Tue Jan 17 00:17:19 2012] Generating SAM header for hs_grch37
    Traceback (most recent call last):
      File "/usr/local/bin/tophat-1.4.0.Linux_x86_64/tophat", line 3052, in <module>
        sys.exit(main())
      File "/usr/local/bin/tophat-1.4.0.Linux_x86_64/tophat", line 2925, in main
        params.read_params = check_reads_format(params, reads_list)
      File "/usr/local/bin/tophat-1.4.0.Linux_x86_64/tophat", line 1318, in check_reads_format
        freader=FastxReader(zf.file, params.read_params.color, zf.fname)
      File "/usr/local/bin/tophat-1.4.0.Linux_x86_64/tophat", line 1073, in __init__
        while hlines>0 and self.lastline[0] not in "@>" :
    IndexError: string index out of range

    As far as I can tell, my reference input files are OK.

    Has anyone else seen this kind of error?

    Thanks
    Wolfgang

  • #2
    I am getting this problem too. Did you ever solve it? If you did, any advice on how you did it would be great.

    Thanks,



    • #3
      Dear fongchun,
      thank you for your answer. Unfortunately, the issue is not solved.
      To me, it looks like a bug. TopHat seems to parse the reference sequence file for the names (and lengths) of the reference sequences. Probably it reaches the end of the file without recognizing it.
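      For illustration only, here is roughly what collecting reference names and lengths from a FASTA file involves, e.g. to build the @SQ lines of a SAM header like the "Generating SAM header for hs_grch37" step in the log. This is a minimal sketch with hypothetical function names (fasta_lengths, sq_header_lines), not TopHat's code. Note that the traceback itself points at check_reads_format, i.e. the reads file rather than the reference.

```python
def fasta_lengths(lines):
    """Return [(name, length), ...] for each '>' record, in file order.

    Hypothetical helper: the name is taken as the first whitespace-separated
    word after '>', and the length is the total of the following sequence lines.
    """
    records = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            words = line[1:].split()
            records.append([words[0] if words else "", 0])
        elif records:
            # Sequence lines add to the length of the most recent record.
            records[-1][1] += len(line)
    return [(name, length) for name, length in records]


def sq_header_lines(records):
    """Format SAM @SQ header lines (SN = sequence name, LN = sequence length)."""
    return [f"@SQ\tSN:{name}\tLN:{length}" for name, length in records]


if __name__ == "__main__":
    demo = [">chr1 first chromosome\n", "ACGT\n", "AC\n", ">chr2\n", "GG\n"]
    for header_line in sq_header_lines(fasta_lengths(demo)):
        print(header_line)
```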

      Wolfgang



      • #4
        Originally posted by wokai001
        Dear fongchun,
        thank you for your answer. Unfortunately, the issue is not solved.
        To me, it looks like a bug. TopHat seems to parse the reference sequence file for the names (and lengths) of the reference sequences. Probably it reaches the end of the file without recognizing it.

        Wolfgang
        Hi Wolfgang,

        Although we were not able to solve our problem completely, we realized that several of our fastq files were corrupt due to poor sequencing runs, and once they were removed from the analysis, the error seemed to disappear. I wonder if you could be facing a similar issue?

        Fong



        • #5
          Corrupted fastq?

          Dear Fong,

          unfortunately, I can't check, because I have deleted the input fastq file. I first aligned my reads with bwa and then extracted the unmapped reads into a BAM file with samtools:

          samtools view -f 0x4 -b -S -o unmap_inh017.bam inh017.sam

          The BAM file was then converted to FASTQ with bam2fastx (which is part of TopHat):

          bam2fastx --fastq unmap_inh017.bam > unmap_inh017.fq

          From the error message, it looks as if a line that should start with "@" or ">" does not. What did your deleted reads look like? Perhaps one could build something like a format checker (I would give it a try if I could see some faulty reads).
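          A minimal sketch of such a format checker in Python. The function names check_fastq_lines and check_fastq are hypothetical. It assumes plain 4-line FASTQ records (no wrapped sequence lines), reads the whole file into memory, and reports an empty file, a truncated last record, headers not starting with "@", separator lines not starting with "+", and sequence/quality length mismatches.

```python
import sys


def check_fastq_lines(lines):
    """Return a list of problems found; an empty list means nothing was flagged."""
    lines = [line.rstrip("\r\n") for line in lines]
    if not lines:
        return ["file is empty"]
    problems = []
    if len(lines) % 4 != 0:
        problems.append(f"{len(lines)} lines is not a multiple of 4 (truncated record?)")
    # Walk the complete 4-line records: header, sequence, separator, qualities.
    for start in range(0, len(lines) - len(lines) % 4, 4):
        header, seq, sep, qual = lines[start:start + 4]
        recno = start // 4 + 1
        if not header.startswith("@"):
            problems.append(f"record {recno}: header does not start with '@'")
        if not sep.startswith("+"):
            problems.append(f"record {recno}: separator does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {recno}: sequence and quality lengths differ")
    return problems


def check_fastq(path):
    """Check one FASTQ file on disk (hypothetical helper)."""
    with open(path) as fh:
        return check_fastq_lines(fh)


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_fastq.py unmap_inh017.fq
    for path in sys.argv[1:]:
        found = check_fastq(path)
        print(f"{path}: {'OK' if not found else '; '.join(found)}")
```

          Running it on the bam2fastx output before starting TopHat would at least catch an empty or truncated file early; it does not validate quality encodings.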

          Cheers
          Wolfgang



          • #6
            I had the same error when running TopHat on my fastq files.
            I fixed it: the error occurred only when my input fastq files were empty.

            The IndexError in
                while hlines>0 and self.lastline[0] not in "@>" :
            occurs only because self.lastline is an empty string.

            self.lastline holds the result of file.readline(), which returns "" at the end of the file.
            When the file itself is empty, the very first read already returns "", so self.lastline[0] has no character to return.

            After I supplied a proper (non-empty) input file, the error disappeared.
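            A minimal Python sketch of the failure and of one way to guard against it. first_record_line and its hlines default of 10 are hypothetical, loosely modelled on the loop quoted in the traceback; this is not TopHat's actual code or fix.

```python
import io


def first_record_line(fh, hlines=10):
    """Skip up to `hlines` leading non-record lines; return the first record line.

    Hypothetical helper: raises ValueError instead of IndexError on empty input.
    """
    line = fh.readline()  # readline() returns "" at end of file
    # Test `line` first: short-circuiting means line[0] is never evaluated on "".
    while hlines > 0 and line and line[0] not in "@>":
        line = fh.readline()
        hlines -= 1
    if not line:
        raise ValueError("no reads found before end of file (empty input?)")
    if line[0] not in "@>":
        raise ValueError(f"no FASTA/FASTQ record start found: {line!r}")
    return line


if __name__ == "__main__":
    # Reproduce the original failure: indexing the "" returned by readline().
    lastline = io.StringIO("").readline()
    try:
        lastline[0]
    except IndexError as err:
        print("IndexError:", err)  # IndexError: string index out of range

    print(first_record_line(io.StringIO("@read1\nACGT\n+\nIIII\n")), end="")
    try:
        first_record_line(io.StringIO(""))
    except ValueError as err:
        print("ValueError:", err)
```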

            Hope this helps



            • #7
              Just for info: in Python, a string is an immutable sequence of characters. "string index out of range" means that the index you are trying to access does not exist, i.e. you are asking for the character at a position the string does not have. Indexes in Python start at 0, so the maximum valid index for any string is always len(s) - 1, and an empty string has no valid index at all. There are several ways to account for this; checking the length of the string with the len() function before indexing will help you avoid going past the end.
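              A small illustration of the points above; first_char is a hypothetical helper name.

```python
def first_char(s):
    """Return the first character of s, or None when s is empty (no valid index)."""
    return s[0] if len(s) > 0 else None


seq = "ACGT"
print(seq[len(seq) - 1])  # T: the last valid index is len(seq) - 1
try:
    seq[len(seq)]  # one past the end
except IndexError as err:
    print(err)  # string index out of range

empty = ""
print(first_char(empty))  # None
print(empty.startswith(("@", ">")))  # False, and no IndexError
```

              Note that rewriting the TopHat condition as `self.lastline[:1] not in "@>"` would not be a safe drop-in: slicing avoids the IndexError, but the empty string counts as a substring of every string (`"" in "@>"` is True), so an empty line would silently pass the test. An explicit length or emptiness check, or `str.startswith` with a tuple of prefixes, avoids that trap.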
