  • Old(?) bowtie file: "missing quality values" error in tophat

    Hello everyone,

    I'm trying to look at some old RNA-seq data that I was able to find on NCBI. The data is available as a bowtie output, and I'm trying to use tophat2 to get transcript data.

    It originally looked something like this:

    HWI-EAS283:1:1:4:1142#0/1 - chr2 70362272 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*B@/BB@;BCCCBBBC@BBCBBA=CB7B 2 36:G>N,39:C>N
    HWI-EAS283:1:1:4:1142#0/1 - chr18 50187254 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*B@/BB@;BCCCBBBC@BBCBBA=CB7B 2 36:G>N,39:C>N

    Tophat gave the following error:

    Traceback (most recent call last):
    File "/opt/local/bin/tophat", line 2346, in <module>
    sys.exit(main())
    File "/opt/local/bin/tophat", line 2251, in main
    params.read_params = check_reads(params.read_params, reads_list)
    File "/opt/local/bin/tophat", line 1063, in check_reads
    if first_line[0] in "@>":
    IndexError: string index out of range


    So I figured it must be the lack of an '@' at the beginning of the name of the reads, so I used vim to add an @ to the beginning of every line:


    @HWI-EAS283:1:1:4:1142#0/1 - chr2 70362272 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*B@/BB@;BCCCBBBC@BBCBBA=CB7B 2 36:G>N,39:C>N
    @HWI-EAS283:1:1:4:1142#0/1 - chr18 50187254 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*B@/BB@;BCCCBBBC@BBCBBA=CB7B 2 36:G>N,39:C>N


    Now when I run this, I get the following error ('###' stands in for the file path, which I wanted to keep private):


    Error encountered parsing file /#############:
    Premature end of file (missing quality values for HWI-EAS283:1:1:4:1142#0/1 - chr70362272 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*B@/BB@;BCCCBBBC@BBCBBA=CB7B 2 36:G>N,39:C>N)


    This is the very first line, so it seems to hint at a format error...

    I looked at the bowtie manual, and it seems my output differs from the manual's in one way: according to the manual, the read sequence (column 5) should be followed by a '+' or '-' value rather than directly by the quality scores. In other words, my output seems to be missing a column with a '+' or '-' value between the read and the quality.

    Am I missing something here? The bowtie output that I'm downloading looks "mostly" like a bowtie output, but it appears wrong... I tried to see if it was maybe an older format, but I can't find any info on that.

    Can anybody help me out?

    Thanks in advance!!

    -worm_picker

  • #2
    I don't recognize the format; maybe it was some old bowtie-specific output. If you want to map that, you should convert it into fastq format, like this:
    @1
    5
    +
    6

    ...where 1 is the first field (read name), 5 is the 5th field (bases), and 6 is the 6th field (qualities).
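
    The conversion above could be sketched in Python roughly as follows. This is only a sketch under assumptions taken from the sample lines in this thread: fields are whitespace-separated, and the name, strand, bases, and qualities sit in fields 1, 2, 5, and 6. The function names are made up for illustration. Two things the sketch does beyond the plain field shuffle: the bowtie manual says minus-strand alignments are reported reverse-complemented (with qualities reversed), so those are flipped back, and reads that appear on several lines (multiple alignments) are written only once.

```python
# Sketch: turn Bowtie-style alignment lines into FASTQ records.
# Assumed column order (from the sample lines in this thread):
# name, strand, reference, offset, bases, qualities, then extra columns.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def bowtie_line_to_fastq(line):
    """Return (read_name, fastq_record) for one line, or None if it is too short."""
    fields = line.split()
    if len(fields) < 6:
        return None
    name, strand, bases, quals = fields[0], fields[1], fields[4], fields[5]
    if strand == "-":
        # Minus-strand hits are reported reverse-complemented; undo that
        # so the record holds the read as it came off the sequencer.
        bases = bases.translate(_COMPLEMENT)[::-1]
        quals = quals[::-1]
    return name, f"@{name}\n{bases}\n+\n{quals}\n"


def convert(lines):
    """Yield one FASTQ record per read name, skipping repeated alignments."""
    seen = set()
    for line in lines:
        parsed = bowtie_line_to_fastq(line)
        if parsed is None:
            continue
        name, record = parsed
        if name in seen:
            continue
        seen.add(name)
        yield record
```

    To use it, feed it the lines of the alignment file and write each yielded record to a .fastq file. Keep in mind that a file of mapped reads presumably contains only the reads that aligned, so the resulting fastq would not be the complete original run.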



    • #3
      Thanks for replying, brian.

      It's the same as normal bowtie output and is already mapped, but it is just missing a column (I think). The GEO accession page describes it as mapped reads from bowtie.

      bowtie output should be (according to the manual):
      1. name
      2. strand
      3. "contig"
      4. 0-based start on contig
      5. read
      6. read strand
      7. quality
      8. mismatches (if any)
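
      One quick way to compare a line against a column list like the one above is to number its fields. The snippet below is only a sketch: `number_fields` is a made-up helper, it assumes whitespace-separated fields, and the sample line is the first one quoted at the top of this thread.

```python
def number_fields(line):
    """Return (position, value) pairs for a whitespace-separated alignment line."""
    return list(enumerate(line.split(), start=1))


# First sample line from the original post.
sample = (
    "HWI-EAS283:1:1:4:1142#0/1 - chr2 70362272 "
    "TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG "
    "B9%<'%<B2;?ACA*B@/BB@;BCCCBBBC@BBCBBA=CB7B 2 36:G>N,39:C>N"
)

for position, value in number_fields(sample):
    print(position, value)
```

      Each printed position can then be checked against the manual's description of that column.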



      • #4
        Well, if you want to map it with Tophat, I think you'll have to convert it to fastq first, even if it is (almost) in an old Bowtie format. I doubt you will find any downstream RNA-seq analysis tools that accept those mappings; they generally require sam or bam.
