  • Bowtie error when mapping ABI RNA-seq data with Tophat

    Hi all,
    I got the errors below:

    No. 1
    =============
    [2012-07-26 16:37:43] Beginning TopHat run (v2.0.4)
    -----------------------------------------------
    [2012-07-26 16:37:43] Checking for Bowtie
    Bowtie version: 0.12.8.0
    [2012-07-26 16:37:44] Checking for Samtools
    Samtools version: 0.1.13.0
    [2012-07-26 16:37:44] Checking for Bowtie index files
    [2012-07-26 16:37:44] Checking for reference FASTA file
    Warning: Could not find FASTA file hg19_c.fa
    [2012-07-26 16:37:44] Reconstituting reference FASTA file from Bowtie index
    Executing: /home/xiaoyu/bin/bowtie-inspect hg19_c > VVinfect0h/tmp/hg19_c.fa
    [2012-07-26 16:42:49] Generating SAM header for hg19_c
    format: fastq
    quality scale: phred33 (default)
    [2012-07-26 16:43:42] Reading known junctions from GTF file
    [2012-07-26 16:43:54] Preparing reads
    left reads: min. length=50, max. length=50, 49053373 kept reads (712587 discarded)
    [2012-07-26 17:00:17] Creating transcriptome data files..
    [2012-07-26 17:01:36] Building Bowtie index from hg19UCSC.fa
    [2012-07-26 17:53:34] Mapping left_kept_reads to transcriptome hg19UCSC with Bowtie
    [FAILED]
    Error running bowtie:
    Too few quality values for read: 460T3#
    are you sure this is a FASTQ-int file?
    Command: /home/xiaoyu/bin/bowtie -q -C --col-keepends -v 1 -k 60 -m 60 -S -p 2 --sam-nohead --max /dev/null VVinfect0h/tmp/hg19UCSC -


    No. 2
    ==============


    [2012-07-26 16:43:15] Beginning TopHat run (v2.0.4)
    -----------------------------------------------
    [2012-07-26 16:43:15] Checking for Bowtie
    Bowtie version: 0.12.8.0
    [2012-07-26 16:43:15] Checking for Samtools
    Samtools version: 0.1.13.0
    [2012-07-26 16:43:15] Checking for Bowtie index files
    [2012-07-26 16:43:15] Checking for reference FASTA file
    Warning: Could not find FASTA file hg19_c.fa
    [2012-07-26 16:43:15] Reconstituting reference FASTA file from Bowtie index
    Executing: /home/xiaoyu/bin/bowtie-inspect hg19_c > VVinfect4h/tmp/hg19_c.fa
    [2012-07-26 16:48:16] Generating SAM header for hg19_c
    format: fastq
    quality scale: phred33 (default)
    [2012-07-26 16:48:22] Reading known junctions from GTF file
    [2012-07-26 16:48:33] Preparing reads
    left reads: min. length=50, max. length=50, 34854552 kept reads (1262790 discarded)
    [2012-07-26 16:59:38] Creating transcriptome data files..
    [2012-07-26 17:01:05] Building Bowtie index from hg19UCSC.fa
    [2012-07-26 17:51:34] Mapping left_kept_reads to transcriptome hg19UCSC with Bowtie
    [FAILED]
    Error running bowtie:
    Too few quality values for read: 51300T13
    are you sure this is a FASTQ-int file?
    Command: /home/xiaoyu/bin/bowtie -q -C --col-keepends -v 1 -k 60 -m 60 -S -p 2 --sam-nohead --max /dev/null VVinfect4h/tmp/hg19UCSC -


    How can I fix this? Is it possible to trim out the reads "460T3#" and "51300T13"? If so, how? Please help.
    Last edited by HSV-1; 07-26-2012, 04:51 AM. Reason: ask more

  • #2
    This happened to me once when using -C with an Illumina dataset. Are you sure your reads are in color space?
    Cheers



    • #3
      Originally posted by goudurix
      This happened to me once when using -C with an Illumina dataset. Are you sure your reads are in color space?
      Cheers


      The data is from ABI SOLiD. When I open the data file there are no ACGTs, only digits (1, 2, 3, ...),
      so the reads are in color space.



      • #4
        Got the same error here when feeding tophat2 with csfastq files as the input.

        I checked the csfastq file, and found nothing wrong with it. No truncated reads or qual values. Then I tried bowtie to align the reads to the reference genome using the csfastq file as the input - bowtie finished without any error, and over 90% of the reads were mapped.

        Try feeding tophat2 with csfasta+qual files as the input instead of csfastq. I tried that and tophat2 ran through successfully.
        Last edited by sonia.bao; 08-01-2012, 12:07 AM.
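The check mentioned in #4 (no truncated reads or quality strings) can be sketched roughly as below. This is only an illustration, not the check that was actually run: the function name `find_length_mismatches` and the `primer_bases` parameter are hypothetical, and the sketch assumes four-line records in which the sequence starts with one primer base (e.g. "T") that has no quality value of its own. If your csfastq files also carry a quality value for the primer base, pass `primer_bases=0`.

```python
import io


def find_length_mismatches(handle, primer_bases=1):
    """Yield (read_name, n_colors, n_quals) for records whose lengths disagree.

    Assumes strict 4-line FASTQ records: name, sequence, "+", qualities.
    primer_bases is the number of leading sequence characters that are
    assumed to have no matching quality character (hypothetical default: 1).
    """
    name = None
    seq = None
    for i, line in enumerate(handle):
        line = line.rstrip("\r\n")
        mod = i % 4
        if mod == 0:    # name line, drop the leading "@"
            name = line[1:]
        elif mod == 1:  # color-space sequence, e.g. "T0123"
            seq = line
        elif mod == 3:  # quality string, one character per color
            n_colors = len(seq) - primer_bases
            if len(line) != n_colors:
                yield name, n_colors, len(line)


# Tiny in-memory example: r1 is consistent, r2 is one quality value short.
example = "@r1\nT0123\n+\n!!!!\n@r2\nT0123\n+\n!!!\n"
print(list(find_length_mismatches(io.StringIO(example))))
# prints [('r2', 4, 3)]
```

An empty result only says the record lengths are consistent under the stated assumption; it does not explain why TopHat's own read preparation step fails.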



        • #5
          Originally posted by sonia.bao
          Got the same error here when feeding tophat2 with csfastq files as the input.

          I checked the csfastq file, and found nothing wrong with it. No truncated reads or qual values. Then I tried bowtie to align the reads to the reference genome using the csfastq file as the input - bowtie finished without any error, and over 90% of the reads were mapped.

          Try feeding tophat2 with csfasta+qual files as the input instead of csfastq. I tried that and tophat2 ran through successfully.

          Thanks for your reply. How can I get the csfasta and qual files from the same csfastq file?



          • #6
            Try this Python script. It takes a color-space .fastq file as input and writes two files, .csfasta and .QV.qual.

            (It was not written by me. Someone wrote this script and shared it on this board (much appreciated!!!). If anybody knows who the author is, please let me know and I'll update it)

            csfastq2solid.py
            Code:
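            import sys

            # Input: a color-space FASTQ (csfastq) file with Phred+33 qualities.
            fq = sys.argv[1]

            # Output names are derived from the input name up to ".fastq".
            base = fq.split(".fastq")[0]

            with open(fq) as reads, \
                    open(base + ".QV.qual", "w") as quals, \
                    open(base + ".csfasta", "w") as seq:
                for i, line in enumerate(reads):
                    mod = i % 4
                    if mod == 0:  # name line: "@name" becomes ">name" in both outputs
                        assert line[0] == "@"
                        quals.write(">" + line[1:])
                        seq.write(">" + line[1:])
                    elif mod == 1:  # color-space sequence, copied unchanged
                        seq.write(line)
                    elif mod == 3:  # quality string: decode Phred+33 into integers
                        scores = (str(ord(q) - 33) for q in line.rstrip("\r\n"))
                        quals.write(" ".join(scores) + "\n")
                    # mod == 2 is the "+" separator line and is skipped

            print("wrote %s, %s" % (quals.name, seq.name), file=sys.stderr)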
            Last edited by sonia.bao; 08-01-2012, 01:04 AM. Reason: Added description
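To see what the conversion does to a single record, here is a small worked example. It is a sketch of the same per-record logic as csfastq2solid.py, not part of the original script: the helper name `csfastq_record_to_solid` and the sample read are made up for illustration, and Phred+33 encoding is assumed (as reported in the TopHat log above).

```python
def csfastq_record_to_solid(name_line, seq_line, qual_line, offset=33):
    """Convert one csfastq record into (csfasta_text, qual_text).

    offset is the ASCII offset of the quality encoding (33 for Phred+33).
    """
    header = ">" + name_line.rstrip("\r\n")[1:]  # "@name" -> ">name"
    colors = seq_line.rstrip("\r\n")             # sequence is copied unchanged
    # Each quality character becomes its integer score, space separated.
    scores = " ".join(str(ord(q) - offset) for q in qual_line.rstrip("\r\n"))
    return header + "\n" + colors + "\n", header + "\n" + scores + "\n"


# Hypothetical record: primer base "T" followed by four colors.
csfasta, qual = csfastq_record_to_solid("@read1\n", "T0123\n", "!5I+\n")
print(csfasta, end="")
# >read1
# T0123
print(qual, end="")
# >read1
# 0 20 40 10
```

The resulting pair of files is what post #4 describes passing to tophat2 instead of the csfastq file.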
