  • BWA error

    Hello everyone,

    I used BWA for mapping and received the following error message:
    "Parse error at line 34: sequence and quality are inconsistent"
    It is a paired-end run. Mapping the forward and the reverse reads worked without problems; the error message occurs when I use sampe. What could be the mistake? I am grateful for any ideas.

    Thanks Robby

  • #2
    Have you looked at the input FASTQ file, especially at the area around line 34?

    Data can get corrupted in transfer...

    • #3
      What are the commands you use?
      Is it in the "aln" program or the subsequent "sampe" program?
      Did the aln finish?
      Is there junk in the intermediate "sai" file? (Run "strings filename.sai | less" and glance for readable error messages.) It's easy to accidentally send stderr to stdout in the aln phase and then feed the result into "sampe". bwa is merciless in expecting correct input.
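
Where the `strings` utility is not at hand, the same kind of check can be approximated in Python. This is only a rough sketch of the idea (list runs of printable ASCII inside a binary file); the function names `printable_runs` and `scan_file` and the minimum run length of 8 are arbitrary choices for illustration, not anything defined by bwa.

```python
import re


def printable_runs(data, min_len=8):
    """Return runs of printable ASCII (space through '~') of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def scan_file(path, min_len=8):
    """Read a binary file (for example an intermediate .sai file) and list its text runs."""
    with open(path, "rb") as handle:
        return printable_runs(handle.read(), min_len)
```

Calling `scan_file("align1.sai")` and skimming the result serves the same purpose as the `strings` command above: binary data will always produce a few short accidental runs, so what matters is whether anything reads like a message.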

      • #4
        Thanks a lot for your answers.

        I checked the FASTQ file but couldn't see any strange data. I am not really sure what exactly I have to check, though, so I just looked at whether each sequence has the same length as its quality string and whether the line breaks are correct. Everything seems to be OK. The only difference from other FASTQ files is that the third line of each record is just a "+" instead of "+" followed by the sequence name. I thought this might be due to the new Illumina format. Does anyone have more information about that line?

        The error message occurs in the "sampe" program. The "aln" program finished without any error message. I had a look at the sai files, but I don't understand the output. At least I couldn't find an error message in them, and the files have the expected size.

        This was the first run with the new Illumina v3 chemistry and the new FASTQ format (i.e. Sanger quality scores). Do I have to change anything in that case?

        I used the following commands:
        bwa aln -I -l 35 ref.fasta reads1.fastq.gz > align1.sai
        bwa aln -I -l 35 ref.fasta reads2.fastq.gz > align2.sai
        bwa sampe ref.fasta align1.sai align2.sai reads1.fastq.gz reads2.fastq.gz > mapping.sam

        I tried the aln-commands without the -I option (for Illumina 1.3+) as well, but received the same error message.
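
Since the -I option tells bwa aln that qualities are in the Illumina 1.3+ encoding (ASCII offset 64), it can help to check which offset the file actually uses. The following is a heuristic sketch only: the function name `guess_phred_offset` and the thresholds are illustrative assumptions, and it returns None when the characters seen fit both encodings.

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset of FASTQ quality strings (heuristic).

    Returns 33 if any character is below ';' (ASCII 59), which the +64
    encodings do not use. Returns 64 if no such character occurs and some
    character is above 'J' (ASCII 74), which would mean Q > 41 under +33.
    Returns None if the data are ambiguous or empty.
    """
    lowest = None
    highest = None
    for line in quality_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        codes = [ord(char) for char in line]
        lowest = min(codes) if lowest is None else min(lowest, min(codes))
        highest = max(codes) if highest is None else max(highest, max(codes))
    if lowest is None:
        return None
    if lowest < 59:
        return 33
    if highest > 74:
        return 64
    return None
```

Feeding it every fourth line of the FASTQ file (the quality lines) is enough. A result of 33 would suggest Sanger-style qualities, in which case the -I option presumably does not apply.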

        • #5
          Originally posted by Robby
          I checked the FASTQ file but couldn't see any strange data. I am not really sure what exactly I have to check, though, so I just looked at whether each sequence has the same length as its quality string and whether the line breaks are correct. Everything seems to be OK. The only difference from other FASTQ files is that the third line of each record is just a "+" instead of "+" followed by the sequence name. I thought this might be due to the new Illumina format. Does anyone have more information about that line?
          That's fine: the read name is not needed on the third line. A bare "+" is valid FASTQ and is recognized by BWA.
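
Rather than eyeballing the file, the checks mentioned in this thread (four lines per record, a "+" separator with or without the repeated name, equal sequence and quality lengths) can be scripted. Below is a minimal Python sketch, assuming unwrapped four-line records; `find_fastq_problems` and `open_text` are hypothetical helper names, and this is not how BWA itself parses the input.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def find_fastq_problems(lines):
    """Yield (line_number, message) for each structural problem found.

    Line numbers are 1-based. Records are assumed to span exactly four lines.
    """
    record = []
    for number, raw in enumerate(lines, start=1):
        record.append((number, raw.rstrip("\r\n")))
        if len(record) < 4:
            continue
        (n1, header), (_, seq), (n3, plus), (n4, qual) = record
        record = []
        if not header.startswith("@"):
            yield n1, "header does not start with '@'"
        if not plus.startswith("+"):
            yield n3, "separator does not start with '+'"
        elif len(plus) > 1 and plus[1:] != header[1:]:
            # A bare '+' is accepted; a repeated name must match the header.
            yield n3, "name after '+' differs from header"
        if len(seq) != len(qual):
            yield n4, "sequence length %d != quality length %d" % (len(seq), len(qual))
    if record:
        yield record[0][0], "truncated record (%d of 4 lines)" % len(record)


# Example use (file name taken from the commands earlier in the thread):
# with open_text("reads1.fastq.gz") as handle:
#     for line_number, message in find_fastq_problems(handle):
#         print(line_number, message)
```

Only the first reported problem is usually informative: once a line is missing or duplicated, every later record is shifted and will be flagged as well.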
