  • bowtie2: Saw ASCII character 7 but expected 33-based Phred qual

    I'm running bowtie2 on a Windows 7, 64-bit machine. My reads are prokaryotic paired-end reads from an Illumina HiSeq instrument.

    Below are the outputs for two alignments:

    Result 1:

    C:\bowtie2>perl bowtie2 --very-fast-local -t -p2 -x genomeindex -1 R:\OperonNGSdata\run_data\A-1_GGACCC_L008_R1_001.fastq -2 R:\OperonNGSdata\run_data\A-1_GGACCC_L008_R2_001.fastq -S A1_Alignment.sam
    Time loading reference: 00:00:00
    Time loading forward index: 00:00:00
    Time loading mirror index: 00:00:00
    Multiseed full-index search: 02:06:00
    24462238 reads; of these:
    24462238 (100.00%) were paired; of these:
    1566890 (6.41%) aligned concordantly 0 times
    5594011 (22.87%) aligned concordantly exactly 1 time
    17301337 (70.73%) aligned concordantly >1 times
    ----
    1566890 pairs aligned concordantly 0 times; of these:
    156916 (10.01%) aligned discordantly 1 time
    ----
    1409974 pairs aligned 0 times concordantly or discordantly; of these:
    2819948 mates make up the pairs; of these:
    2064379 (73.21%) aligned 0 times
    45713 (1.62%) aligned exactly 1 time
    709856 (25.17%) aligned >1 times
    95.78% overall alignment rate
    Time searching: 02:06:00
    Overall time: 02:06:00


    Result 2:

    C:\bowtie2>perl bowtie2 --very-fast-local -t -p2 -x genomeindex -1 R:\OperonNGSdata\run_data\A-2_TTCAGC_L008_R1_001.fastq -2 R:\OperonNGSdata\run_data\A-2_TTCAGC_L008_R2_001.fastq -S A2_Alignment.sam
    Time loading reference: 00:00:00
    Time loading forward index: 00:00:01
    Time loading mirror index: 00:00:00
    Saw ASCII character 7 but expected 33-based Phred qual.
    terminate called after throwing an instance of 'int'

    This application has requested the Runtime to terminate it in an unusual way.
    Please contact the application's support team for more information.
    bowtie2-align exited with value 255


    As you can see, the first run completes successfully, while the second quits with an error. This is confusing, since the samples are replicates that were sequenced in exactly the same way on the same instrument and (presumably) processed identically with the latest version of the Illumina pipeline (version ??).
    Last edited by swatve3; 02-16-2014, 12:35 PM.

  • #2
    It could be that the second FASTQ file is corrupt (there shouldn't be an ASCII character 7 in it). I would start by checking the contents of the FASTQ file for this bad character. Can you program? This would be easy in Perl, Python, Ruby, etc.
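A minimal sketch of that check in Python, assuming the standard four-line FASTQ layout (header, sequence, '+', quality) with no wrapped lines, and Phred+33 encoding. In Phred+33 each quality character's score is its ASCII code minus 33, so valid characters run from '!' (33) to '~' (126); ASCII 7 is a non-printing control character (BEL), far below that range. The script name `find_bad_qual.py` and the function name are hypothetical. Because it only inspects every fourth line, a missing or extra line earlier in the file would shift which lines it treats as quality strings.

```python
import sys

# Phred+33 quality characters span '!' (ASCII 33) to '~' (ASCII 126);
# the score of a character is ord(ch) - 33.
MIN_QUAL_ASCII = 33
MAX_QUAL_ASCII = 126


def find_bad_quality_chars(lines, lo=MIN_QUAL_ASCII, hi=MAX_QUAL_ASCII):
    """Yield (line_number, column, ascii_code) for each quality character
    outside [lo, hi].

    Assumption: 4-line FASTQ records with no line wrapping, so every
    4th line (4, 8, 12, ...) is a quality string.
    """
    for line_number, line in enumerate(lines, start=1):
        if line_number % 4 != 0:
            continue  # not a quality line
        qual = line.rstrip("\r\n")  # tolerate both Unix and Windows line endings
        for column, ch in enumerate(qual, start=1):
            code = ord(ch)
            if code < lo or code > hi:
                yield line_number, column, code


def main(path):
    # latin-1 maps every byte to exactly one character, so odd bytes
    # are reported instead of raising a decoding error.
    found = False
    with open(path, encoding="latin-1") as handle:
        for line_number, column, code in find_bad_quality_chars(handle):
            found = True
            print(f"line {line_number}, column {column}: ASCII {code}")
    if not found:
        print("no out-of-range quality characters found")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python find_bad_qual.py reads.fastq", file=sys.stderr)
```

For example, `python find_bad_qual.py R:\OperonNGSdata\run_data\A-2_TTCAGC_L008_R2_001.fastq`, and the same for the R1 file, since the bowtie2 message does not say which mate file contained the bad character.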

    You could also check the MD5 checksum (if available), or try re-downloading the FASTQ file in case a network glitch or hard drive error is to blame.
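If the sequencing provider supplied checksums, a short Python sketch like this one (standard-library `hashlib`, reading the file in chunks so a multi-gigabyte FASTQ never has to fit in memory) computes an MD5 digest to compare against them. The function name `md5_of_file` is hypothetical. A mismatch would point to a corrupted local copy; a match would suggest the bad character was already present in the delivered file.

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print "<digest> <path>" for every file given on the command line.
    for fastq_path in sys.argv[1:]:
        print(md5_of_file(fastq_path), fastq_path)
```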



    • #3
      Thanks for replying.

      Some more info:

      1) I've mapped these files to the genome using CLC Genomics Workbench before without any glitch.
      2) I cannot program in Perl/Python/Ruby without going through a lot of pain and anguish.
      3) I'll try to download the file again and do a second run to see if that works.



      • #4
        Perhaps CLC is more tolerant of a malformed FASTQ record and skips it.

        You can use one of these scripts to see whether they can find the bad FASTQ record.

        UPDATED (Sun Feb 19 14:56:28 PST 2012) High-throughput sequencing (HTS) is rapidly advancing our ability to understand how the genome responds to its environment.  It also presents a challenge to t…
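The linked scripts are not reproduced here. As an illustration only, a minimal Python sketch of a structural FASTQ check might look like the following. It assumes four-line records without wrapping and flags headers that do not start with '@', separator lines that do not start with '+', sequence/quality length mismatches, and a truncated final record (a trailing blank line would also be reported this way). The function and script names are hypothetical. A record failing these checks is the kind of thing a more tolerant tool might silently skip.

```python
import sys


def check_fastq_records(lines):
    """Return a list of (line_number, problem) tuples for malformed records.

    Assumption: 4-line FASTQ records (header, sequence, '+', quality) with
    no line wrapping; wrapped multi-line FASTQ would be reported as malformed.
    """
    problems = []
    record = []
    start_line = 1  # line number where the current record begins
    for line_number, raw in enumerate(lines, start=1):
        record.append(raw.rstrip("\r\n"))
        if len(record) < 4:
            continue
        header, seq, plus, qual = record
        if not header.startswith("@"):
            problems.append((start_line, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((start_line + 2, "separator line does not start with '+'"))
        if len(seq) != len(qual):
            problems.append(
                (start_line, f"sequence length {len(seq)} != quality length {len(qual)}")
            )
        record = []
        start_line = line_number + 1
    if record:  # leftover lines that do not form a complete record
        problems.append((start_line, f"truncated record ({len(record)} of 4 lines)"))
    return problems


def main(path):
    with open(path, encoding="latin-1") as handle:
        problems = check_fastq_records(handle)
    for line_number, problem in problems[:20]:  # show at most the first 20
        print(f"line {line_number}: {problem}")
    print(f"{len(problems)} problem(s) found")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python check_fastq.py reads.fastq", file=sys.stderr)
```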

