  • spabinger
    Member
    • Jun 2011
    • 13

    Duplicate read names - BWA mem - paired reads have different names

    Hi,

    Running BWA mem (paired-end Illumina reads), I'm getting the following error (I replaced the IDs):



    [mem_sam_pe] paired reads have different names: "XXX:5:YYY:1:11102:4257:13510", "XXX:5:YYY:1:11102:15792:1058"
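For context: in paired mode, bwa mem reads R1 and R2 in parallel and expects the n-th record of each file to be mates, so this message means the two files fell out of step at some record. A minimal Python sketch to find the first record where the mate names diverge, assuming plain 4-line FASTQ records. Treating two names as "the same" after dropping the comment (everything after the first space) and a trailing /1 or /2 is an assumption made for this sketch; `read_names` and `first_mismatch` are hypothetical helpers, not part of BWA.

```python
import itertools
import sys
from typing import Iterable, Iterator, Optional, Tuple


def read_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield a normalized read name for each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of a record
            header = line[1:] if line.startswith("@") else line
            parts = header.split(maxsplit=1)  # drop the '1:N:0:...' comment
            name = parts[0] if parts else ""
            if name.endswith(("/1", "/2")):  # assumed mate suffix
                name = name[:-2]
            yield name


def first_mismatch(
    r1_lines: Iterable[str], r2_lines: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record_index, name1, name2) for the first pair whose names differ.

    A file that ends early shows up as None on its side.
    """
    pairs = itertools.zip_longest(read_names(r1_lines), read_names(r2_lines))
    for idx, (n1, n2) in enumerate(pairs):
        if n1 != n2:
            return idx, n1, n2
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        hit = first_mismatch(f1, f2)
    if hit is None:
        print("all mates share names")
    else:
        idx, n1, n2 = hit
        print(f"record {idx}: {n1!r} vs {n2!r}")
```

Run it as `python check_pairs.py R1.fastq R2.fastq` (script name hypothetical). The reported record index times 4, plus 1, gives the header's line number in the file.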

    I checked the fastq files and found that each read name is duplicated 7 times in the file (exact same name). However, the positions of these duplicates do not match between the pairs (compare the line numbers in the grep output below: the 3rd and 5th hits differ).

    Example:

    > grep -n "XXX:5:YYY:1:11102:4257:13510" R1.fastq
    761397:@XXX:5:YYY:1:11102:4257:13510 1:N:0:AGGCAGAA+GCGATCTA
    862085:@XXX:5:YYY:1:11102:4257:13510 1:N:0:AGGCAGAA+GCGATCTA
    962773:@XXX:5:YYY:1:11102:4257:13510 1:N:0:AGGCAGAA+GCGATCTA
    1063461:@XXX:5:YYY:1:11102:4257:13510 1:N:0:AGGCAGAA+GCGATCTA
    1164149:@XXX:5:YYY:1:11102:4257:13510 1:N:0:AGGCAGAA+GCGATCTA
    1264837:@XXX:5:YYY:1:11102:4257:13510 1:N:0:AGGCAGAA+GCGATCTA
    1365525:@XXX:5:YYY:1:11102:4257:13510 1:N:0:AGGCAGAA+GCGATCTA

    > grep -n "XXX:5:YYY:1:11102:4257:13510" R2.fastq
    761397:@XXX:5:YYY:1:11102:4257:13510 2:N:0:AGGCAGAA+GCGATCTA
    862085:@XXX:5:YYY:1:11102:4257:13510 2:N:0:AGGCAGAA+GCGATCTA
    1028309:@XXX:5:YYY:1:11102:4257:13510 2:N:0:AGGCAGAA+GCGATCTA
    1063461:@XXX:5:YYY:1:11102:4257:13510 2:N:0:AGGCAGAA+GCGATCTA
    1229685:@XXX:5:YYY:1:11102:4257:13510 2:N:0:AGGCAGAA+GCGATCTA
    1264837:@XXX:5:YYY:1:11102:4257:13510 2:N:0:AGGCAGAA+GCGATCTA
    1365525:@XXX:5:YYY:1:11102:4257:13510 2:N:0:AGGCAGAA+GCGATCTA
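To check how widespread this is, here is a minimal Python sketch that lists every read name occurring more than once in a single file, together with the record positions of each copy. It assumes plain 4-line records (no wrapped sequence lines), in which case a header's line number, as printed by `grep -n`, is 4 × record index + 1. The function name `duplicate_names` is hypothetical.

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable, List


def duplicate_names(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map every read name that occurs more than once to its 0-based record indices.

    The name is the header up to the first whitespace, so the
    '1:N:0:...' comment is ignored.
    """
    positions: Dict[str, List[int]] = defaultdict(list)
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of a record
            parts = line[1:].split(maxsplit=1)
            positions[parts[0] if parts else ""].append(i // 4)
    return {name: idx for name, idx in positions.items() if len(idx) > 1}


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        dups = duplicate_names(handle)
    print(f"{len(dups)} read names occur more than once")
    for name, idx in list(dups.items())[:5]:
        # header line numbers, comparable to `grep -n` output
        print(name, [4 * k + 1 for k in idx])
```

Running it on R1 and R2 separately and comparing the index lists for the same name shows where positional pairing breaks, as with the 3rd and 5th copies above.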


    Is it ok for a fastq file to have multiple reads with the same read name?
    If not, could this be a problem of BCL conversion?
    How can I fix it?


    Thanks for your help,
    Stephan


    PS: bwa mem command:

    bwa mem -t 40 -v 1 hg19.fa R1.fastq R2.fastq > aln.sam
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Fastq headers should always start with an "@", so what you have does not follow the standard. Have you asked the folks who gave you this data whether it has been post-processed in some way? Also, there should be no duplicates (let alone multiples) of fastq header IDs in raw sequence files.
    Last edited by GenoMax; 02-02-2016, 06:44 AM.
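A quick structural check helps separate a real format problem from display artifacts such as the "761397:" line-number prefixes that `grep -n` adds to its output. A minimal Python sketch, assuming plain 4-line records: it checks that each header starts with "@", each separator line starts with "+", and sequence and quality lengths agree. `fastq_problems` is a hypothetical helper and covers only these basic rules, not every possible FASTQ variant.

```python
from typing import Iterable, List


def fastq_problems(lines: Iterable[str]) -> List[str]:
    """Return human-readable descriptions of basic 4-line FASTQ format violations."""
    problems: List[str] = []
    record: List[str] = []
    for n, line in enumerate(lines, start=1):
        record.append(line.rstrip("\r\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            start = n - 3  # 1-based line number of this record's header
            if not header.startswith("@"):
                problems.append(f"line {start}: header does not start with '@'")
            if not plus.startswith("+"):
                problems.append(f"line {start + 2}: separator does not start with '+'")
            if len(seq) != len(qual):
                problems.append(
                    f"line {start + 3}: quality length {len(qual)} != sequence length {len(seq)}"
                )
            record = []
    if record:  # leftover lines that do not form a full record
        problems.append(f"truncated final record ({len(record)} of 4 lines)")
    return problems
```

Called on an open file handle, e.g. `fastq_problems(open("R1.fastq"))`, an empty list means every record passed these checks.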


    • spabinger
      Member
      • Jun 2011
      • 13

      #3
      Hi,

      That's not the problem. See the "head" output below (sequence and quality strings shortened) and the grep output I posted; the numbers before the "@" there are line numbers from grep -n.

      > head R1.fastq
      @XXX:5:YYY:1:11101:12923:1051 1:N:0:AGGCAGAA+NCGATCTA
      CTT...TTC
      +
      AAA...</<
      @XXX:5:YYY:1:11101:4797:1055 1:N:0:AGGCAGAA+NCGATCTA
      ACC...CTA
      +
      AAA...<A/


      Thanks,
      Stephan


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        My apologies.

        If the order of the reads in your files is messed up, you can "re-pair" the reads using the repair tool from the BBMap suite as follows:

        Code:
        $ repair.sh in1=r1.fq in2=r2.fq out1=fixed1.fq out2=fixed2.fq outsingle=singletons.fq
        That said, each fastq sequence header should be unique within a sequence file. If that is not the case, then there is something wrong with this data.
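To illustrate the general idea of re-pairing (this is not how repair.sh is implemented internally), here is a minimal Python sketch that matches mates by name, assuming plain 4-line records and both files small enough to hold R1 in memory. Reads without a mate become singletons. Note the caveat for this thread: when a name occurs 7 times, nothing distinguishes the copies, so the sketch simply pairs them in order of appearance, which may join the wrong copies. `repair_by_name` and its helpers are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, ...]  # (header, sequence, '+' line, quality)


def records(lines: Iterable[str]) -> Iterator[Record]:
    """Group lines into 4-line records; an incomplete trailing record is dropped."""
    it = iter(lines)
    for chunk in zip(it, it, it, it):
        yield tuple(line.rstrip("\r\n") for line in chunk)


def name_of(header: str) -> str:
    """Read name without '@', comment, or a trailing /1 or /2 (assumed convention)."""
    parts = header[1:].split(maxsplit=1)
    name = parts[0] if parts else ""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def repair_by_name(
    r1_lines: Iterable[str], r2_lines: Iterable[str]
) -> Tuple[List[Tuple[Record, Record]], List[Record]]:
    """Return (pairs, singletons); repeated names are paired first-in, first-out."""
    pending: Dict[str, Deque[Record]] = defaultdict(deque)
    for rec in records(r1_lines):
        pending[name_of(rec[0])].append(rec)
    pairs: List[Tuple[Record, Record]] = []
    singletons: List[Record] = []
    for rec in records(r2_lines):
        queue = pending.get(name_of(rec[0]))
        if queue:
            pairs.append((queue.popleft(), rec))
        else:
            singletons.append(rec)  # R2 read with no R1 mate
    for queue in pending.values():
        singletons.extend(queue)  # R1 reads with no R2 mate
    return pairs, singletons
```

Output pairs follow the order of R2. With duplicated names, re-pairing restores a file layout bwa mem accepts, but it cannot tell which copy truly belongs with which.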


        • spabinger
          Member
          • Jun 2011
          • 13

          #5
          Thanks for your reply.

          I also suspected that the raw file was not OK.

          Best regards,
          Stephan


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            If the sequences and Q-scores are identical for those 7 copies, then you could potentially keep just one and discard the other 6.

            I am puzzled by how this could have happened though. No logical explanation comes to mind.
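A minimal Python sketch of the "keep one copy" idea, assuming plain 4-line records: within one file, a record is dropped only if an earlier record with the same name has identical sequence and quality. Copies that share a name but differ in content are all kept and their names reported, since they are not safe to discard. `dedupe_identical` and its helpers are hypothetical, and the two files would be processed separately.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, ...]  # (header, sequence, '+' line, quality)


def records(lines: Iterable[str]) -> Iterator[Record]:
    """Group lines into 4-line records; an incomplete trailing record is dropped."""
    it = iter(lines)
    for chunk in zip(it, it, it, it):
        yield tuple(line.rstrip("\r\n") for line in chunk)


def name_of(header: str) -> str:
    """Read name: header without '@', up to the first whitespace."""
    parts = header[1:].split(maxsplit=1)
    return parts[0] if parts else ""


def dedupe_identical(lines: Iterable[str]) -> Tuple[List[Record], Set[str]]:
    """Return (kept_records, names_whose_copies_differ)."""
    seen: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    kept: List[Record] = []
    conflicts: Set[str] = set()
    for rec in records(lines):
        name, content = name_of(rec[0]), (rec[1], rec[3])
        if content in seen[name]:
            continue  # exact repeat of a copy already kept
        seen[name].add(content)
        kept.append(rec)
        if len(seen[name]) > 1:
            conflicts.add(name)  # same name, different sequence/quality
    return kept, conflicts
```

Because each file is deduplicated on its own, the surviving R1 and R2 records may still be out of order relative to each other and need re-pairing afterwards.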


            • danieleyumi
              Junior Member
              • Jun 2011
              • 1

              #7
              It happened to me twice, and re-running the demultiplexing fixed the problem. I suspect it has something to do with the number of threads used to write the fastq data. Best, Daniele
