  • Tophat prep_reads error

    Has anyone seen this error before? TopHat was working fine, but after switching to the newly provided Illumina Bowtie index I now get this error:

    [Thu Aug 4 13:56:11 2011] Beginning TopHat run (v1.3.0)
    -----------------------------------------------
    [Thu Aug 4 13:56:11 2011] Preparing output location /m/illumina/tophat/
    [Thu Aug 4 13:56:11 2011] Checking for Bowtie index files
    [Thu Aug 4 13:56:11 2011] Checking for reference FASTA file
    [Thu Aug 4 13:56:11 2011] Checking for Bowtie
    Bowtie version: 0.12.7.0
    [Thu Aug 4 13:56:11 2011] Checking for Samtools
    Samtools Version: 0.1.8
    [Thu Aug 4 13:56:11 2011] Generating SAM header for /m/ref/genome
    [Thu Aug 4 13:56:40 2011] Preparing reads
    format: fastq
    quality scale: phred33 (default)
    [FAILED]
    Error retrieving prep_reads info.

    For two of my samples, I get this error message instead:

    Error: qual length (103) differs from seq length (63) for fastq record !

    In case it matters, I have attached a slice of my .txt file from before maq ill2sanger and a slice of the .fastq file from after maq.

    My apologies if this has been answered before.
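
The "qual length differs from seq length" message suggests that at least one record has a sequence line and a quality line of different lengths. A minimal sketch for locating such records is shown here. It assumes plain four-line FASTQ records with no line wrapping; the function name `find_length_mismatches` and the two-record sample are made up for illustration and are not part of TopHat or maq.

```python
import io


def find_length_mismatches(handle):
    """Yield (record_number, header, seq_len, qual_len) for mismatched records.

    Assumes simple four-line FASTQ records (no line-wrapped sequences).
    """
    record_number = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # the '+' separator line, not needed here
        qual = handle.readline().rstrip("\r\n")
        record_number += 1
        if len(seq) != len(qual):
            yield record_number, header.strip(), len(seq), len(qual)


# Hypothetical sample: the second record has a truncated sequence line.
sample = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nAC\n+\nIIII\n"
)
mismatches = list(find_length_mismatches(sample))
for number, name, seq_len, qual_len in mismatches:
    print(f"record {number} ({name}): seq length {seq_len}, qual length {qual_len}")
```

To check a real file, pass an open file handle (for example `open("reads.fastq")`, where the file name is a placeholder) instead of the in-memory sample.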

  • #2
    Same thing here. The tophat_out destination is chmod 777 with 1 TB free, I have rebuilt my reference genome several times, Bowtie runs perfectly, and the same input files were aligned successfully by TopHat on another machine. Here's my output:

    [Wed Jan 4 16:20:51 2012] Beginning TopHat run (v1.3.3)
    -----------------------------------------------
    [Wed Jan 4 16:20:51 2012] Preparing output location /media/data/grimmer/tophat_out//
    [Wed Jan 4 16:20:51 2012] Checking for Bowtie index files
    [Wed Jan 4 16:20:51 2012] Checking for reference FASTA file
    [Wed Jan 4 16:20:51 2012] Checking for Bowtie
    Bowtie version: 0.12.7.0
    [Wed Jan 4 16:20:51 2012] Checking for Samtools
    Samtools Version: 0.1.18
    [Wed Jan 4 16:20:51 2012] Generating SAM header for /media/data/genomes/hg19.ebwt/hg19
    [Wed Jan 4 16:20:52 2012] Preparing reads
    format: fastq
    quality scale: phred33 (default)
    [FAILED]
    Error retrieving prep_reads info.


    • #3
      Strangely enough, I have just hit this error for the first time:

      [2012-04-30 16:01:30] Preparing reads
      [FAILED]
      Error running 'prep_reads'
      terminate called after throwing an instance of 'int'


      -Abhi

      Full output below

      [2012-04-30 16:01:23] Beginning TopHat run (v2.0.0)
      -----------------------------------------------
      [2012-04-30 16:01:23] Checking for Bowtie
      Bowtie version: 2.0.0.5
      [2012-04-30 16:01:23] Checking for Samtools
      Samtools version: 0.1.18.0
      [2012-04-30 16:01:23] Checking for Bowtie index files
      [2012-04-30 16:01:23] Checking for reference FASTA file
      Warning: Could not find FASTA file ../../../reference/Chlamy_V5/reference_bowtie2/Chalmy_reinhardtii_110311_v5.fasta.fa
      [2012-04-30 16:01:23] Reconstituting reference FASTA file from Bowtie index
      Executing: /jgi/tools/bin/bowtie2-inspect ../../../reference/Chlamy_V5/reference_bowtie2/Chalmy_reinhardtii_110311_v5.fasta > tophat_v2_out/tmp/Chalmy_reinhardtii_110311_v5.fasta.fa
      [2012-04-30 16:01:29] Generating SAM header for ../../../reference/Chlamy_V5/reference_bowtie2/Chalmy_reinhardtii_110311_v5.fasta
      format: fastq
      quality scale: phred64 (reads generated with GA pipeline version >= 1.3)
      [2012-04-30 16:01:30] Preparing reads
      [FAILED]
      Error running 'prep_reads'
      terminate called after throwing an instance of 'int'


      • #4
        Prep-read errors: Tophat

        I have attached the first 20 lines of my Illumina sequence reads (sample.txt). These are 100-base single-end reads from Illumina's new pipeline (version 1.8), in FASTQ format. All sequences (raw and pass-filter reads) are in a single file. Quality scores are in Sanger FASTQ format: the offset is ASCII 33, instead of the previous Illumina quality score offset (ASCII 64).

        I ran a trial TopHat run on this excerpt.

        Command used:
        Code:
        tophat -o ./tophat_out_test_2 -p4 --segment-length 50 --solexa1.3-quals  ../Genome/genome-index P1-test
        I get the following error:
        ...................
        format: fastq
        quality scale: phred64 (reads generated with GA pipeline version >= 1.3)
        [2012-12-05 16:22:18] Preparing reads
        [FAILED]
        Error running 'prep_reads'
        terminate called after throwing an instance of 'int'
        ...................


        I am not sure whether this is caused by something in the header file or by the different quality scores. Can anybody help me figure this out?

        Thanks
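
One way to check which quality offset a file uses before choosing a TopHat quality flag is to look at the lowest character that appears on the quality lines. The sketch below is a heuristic only. It assumes four-line FASTQ records; the function name `guess_quality_offset` and the sample reads are hypothetical. Characters below ASCII 59 cannot occur with a 64-based encoding, so they indicate offset 33. The reverse is not certain, because high-quality offset-33 data may also contain only high characters.

```python
import io


def guess_quality_offset(handle, max_records=1000):
    """Guess the ASCII quality offset (33 or 64) from the quality lines.

    Heuristic: returns None if no quality characters were seen.
    A result of 64 means "probably 64-based", not a guarantee.
    """
    lowest = None
    for index, line in enumerate(handle):
        if index // 4 >= max_records:
            break
        if index % 4 == 3:  # every fourth line holds the quality string
            qual = line.rstrip("\r\n")
            if qual:
                low = min(ord(ch) for ch in qual)
                lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        return None
    # Solexa+64 scores start at ';' (ASCII 59); anything lower implies offset 33.
    return 33 if lowest < 59 else 64


# Hypothetical reads, one per encoding.
sanger_sample = io.StringIO("@r1\nACGT\n+\n#5AI\n")
illumina13_sample = io.StringIO("@r1\nACGT\n+\nhhfB\n")

sanger_offset = guess_quality_offset(sanger_sample)
illumina13_offset = guess_quality_offset(illumina13_sample)
print(sanger_offset, illumina13_offset)
```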

        • #5
          Illumina V 1.8 and Solexa quality

          In Illumina pipeline version 1.8 the read quality scores are in Sanger FASTQ format (ASCII offset 33, not 64), so using
          Code:
           --solexa-quals
          instead of
          Code:
          --solexa1.3-quals
          seems to work! I am still not sure whether this is the correct option, though.

          Full command that now works:
          Code:
          tophat -o ./tophat_out_test_2 -p4 --segment-length 50 --solexa-quals  ../Genome/genome-index P1-test
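
To illustrate why the declared offset matters, the sketch below decodes one quality string under both offsets. The string `"#5AI"` and the helper `decode_quality` are hypothetical, and this is not a description of what prep_reads does internally. Read with offset 33 the scores are plausible; read with offset 64 some become negative, which may be related to the failure seen with --solexa1.3-quals. Note also that the logs earlier in the thread show TopHat reporting "quality scale: phred33 (default)" when no quality flag is given.

```python
def decode_quality(qual, offset):
    """Convert a FASTQ quality string to integer scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in qual]


qual = "#5AI"  # hypothetical quality string from an offset-33 (Sanger) file

as_offset33 = decode_quality(qual, 33)
as_offset64 = decode_quality(qual, 64)

print("offset 33:", as_offset33)
print("offset 64:", as_offset64)
```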
