  • Most HiSeq reads demultiplexing as 'Undetermined'; did seq facility screw up?

    We sent a library of multiplexed samples prepped with the KAPA HyperCap workflow for sequencing in a single lane of a HiSeq 4000. To our horror, the vast majority (>90%) came back as undetermined.

    All of our barcodes are 8-bp dual indexes. Here are a few examples from our R1 undetermined FASTQ file:

    @E00593:415:HFLKYCCX2:3:1101:4797:1397 1:N:0:NCTGCAAG+NAGGTGAA

    @E00593:415:HFLKYCCX2:3:1101:5629:1397 1:N:0:NCGGAATG+NAATGTAG

    @E00593:415:HFLKYCCX2:3:1101:5893:1397 1:N:0:NCCAACTG+NTATTGCC

    The first base of every barcode has been replaced with an 'N'. Every read is like that. The barcodes are otherwise correct, so in theory I could demultiplex the Undetermined files myself if I allowed a 1-bp mismatch.
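    Demultiplexing with a 1-mismatch tolerance could look roughly like the sketch below. It assumes Illumina-style headers where the index pair is the last colon-separated field (e.g. `NCTGCAAG+NAGGTGAA`). The `SAMPLES` dictionary and its index sequences are hypothetical placeholders, not the real sample sheet, and an `N` is simply counted as a mismatch.

```python
from __future__ import annotations

# Hypothetical sample sheet: sample name -> (i7 index, i5 index).
# Replace these placeholders with the indexes actually submitted.
SAMPLES = {
    "sample_A": ("ACTGCAAG", "GAGGTGAA"),
    "sample_B": ("TCGGAATG", "CAATGTAG"),
}


def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ ('N' counts as a mismatch)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def parse_indexes(header: str) -> tuple[str, str]:
    """Extract (index1, index2) from a header such as '@... 1:N:0:NCTGCAAG+NAGGTGAA'."""
    comment = header.split(" ", 1)[1]          # '1:N:0:NCTGCAAG+NAGGTGAA'
    index_field = comment.rsplit(":", 1)[1]    # 'NCTGCAAG+NAGGTGAA'
    i7, i5 = index_field.split("+")
    return i7, i5


def assign_sample(header: str, max_mismatches: int = 1) -> str | None:
    """Return the single sample whose i7 and i5 each match within max_mismatches, else None."""
    i7, i5 = parse_indexes(header)
    hits = [
        name
        for name, (s7, s5) in SAMPLES.items()
        if hamming(i7, s7) <= max_mismatches and hamming(i5, s5) <= max_mismatches
    ]
    # Ambiguous (multiple hits) or unmatched reads stay undetermined.
    return hits[0] if len(hits) == 1 else None
```

    With the placeholder sheet above, `assign_sample("@E00593:415:HFLKYCCX2:3:1101:4797:1397 1:N:0:NCTGCAAG+NAGGTGAA")` returns `"sample_A"`, because each index differs from sample_A's only at the leading N. Requiring a unique hit keeps ambiguous reads undetermined instead of guessing.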

    Is this normal? I can't work out if every barcode is supposed to begin with 'N' and this is some sort of wildcarding, or if there's been a mistake at the sequencing facility. Similar threads on seqanswers weren't helpful, so this is my first post as a longtime lurker. I'm not very familiar with raw FASTQ data, so I'm sorry if this is a dumb question, but I'm baffled.

    Any suggestions would be gratefully received. Thank you!

  • #2
    Why is your facility not doing the demultiplexing? This should be very simple for them to do.

    It is normal for a few reads at the beginning of the file to have some N's, but are you saying that every read in the file has an N at the start of the index sequences? If so, that is not normal, but it is not unheard of either. As long as your indexes are separated by enough edit distance, you (or preferably they) should be able to demultiplex the data.
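    Whether a mismatch tolerance is safe depends on how far apart the indexes are. A set of codes with minimum pairwise Hamming distance d can correct up to floor((d - 1) / 2) errors unambiguously, so correcting one mismatch needs every pair of indexes to differ at three or more positions. A minimal sketch for checking one index read (run it separately on the i7 list and the i5 list). The example sequences below are made-up placeholders, not anyone's real sample sheet:

```python
from __future__ import annotations

from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length index sequences."""
    if len(a) != len(b):
        raise ValueError("indexes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indexes: list[str]) -> int:
    """Smallest Hamming distance between any two distinct indexes in the list."""
    unique = sorted(set(indexes))  # shared indexes (same i7 on several samples) are collapsed
    if len(unique) < 2:
        raise ValueError("need at least two distinct indexes")
    return min(hamming(a, b) for a, b in combinations(unique, 2))


def max_correctable_errors(indexes: list[str]) -> int:
    """Errors per read that can be corrected unambiguously: floor((d_min - 1) / 2)."""
    return (min_pairwise_distance(indexes) - 1) // 2
```

    For the placeholders `["ACTGCAAG", "TCGGAATG", "GCCAACTG"]` the minimum distance is 4, so `max_correctable_errors` returns 1: a single N at position 1 could still be assigned unambiguously.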



    • #3
      The sequencing facility did do the demultiplexing. We got back a handful of reads (a few kilobytes of FASTQ) for each sample, plus two giant ~35 GB fastq.gz files of undetermined reads, meaning the vast majority of reads were undetermined.

      Looking through those undetermined files, I see that the first base of every index sequence is an N. The remaining 7 bases do match the indexes we submitted. It's not clear to me why every single index sequence has its first base replaced with an N.
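      To confirm that the N is confined to position 1 rather than scattered across the index, one option is to tally N calls per position over many headers. A minimal sketch, assuming the header lines have already been read into a list of strings in the same format as the examples in the first post. The function name is illustrative only:

```python
from __future__ import annotations

from collections import Counter


def n_fraction_by_position(headers: list[str]) -> dict[str, list[float]]:
    """For each index read (i7, i5), return the fraction of headers with 'N' at each position."""
    counts = {"i7": Counter(), "i5": Counter()}
    lengths = {"i7": 0, "i5": 0}
    for header in headers:
        # The index pair is the last colon-separated field, e.g. 'NCTGCAAG+NAGGTGAA'.
        index_field = header.split(" ", 1)[1].rsplit(":", 1)[1]
        for key, seq in zip(("i7", "i5"), index_field.split("+")):
            lengths[key] = max(lengths[key], len(seq))
            for pos, base in enumerate(seq):
                if base == "N":
                    counts[key][pos] += 1
    total = len(headers)
    return {
        key: [counts[key][pos] / total for pos in range(lengths[key])]
        for key in counts
    }
```

      On the three example headers from the first post, both `i7` and `i5` come back as `1.0` at position 0 and `0.0` at the other seven positions, which matches the pattern described here.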

      It's also making me wonder why they weren't able to demultiplex those reads. Each one is effectively a single-base mismatch, so shouldn't they have demultiplexed just fine?



      • #4
        Yes, they should have demultiplexed if one mismatch was allowed. If they did not, there may be another issue, and it would be fair to go back to the facility and ask. If a sequencing problem caused the N's at position 1, then they should address that.

        There is some code here that will allow you to see all indexes (and read numbers) in your data. Run it on your undetermined file and see if there are other issues.
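        A comparable tally can be sketched as follows. This is not the linked code, just a minimal stand-in that assumes a standard 4-line FASTQ (gzipped or plain) with the index pair as the last colon-separated field of each header. The file name in the usage comment is a placeholder. It prints the most frequent index pairs with their read counts:

```python
from __future__ import annotations

import gzip
import sys
from collections import Counter


def count_index_pairs(path: str) -> Counter:
    """Count index strings (e.g. 'NCTGCAAG+NAGGTGAA') across all reads in a FASTQ or FASTQ.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    counts: Counter = Counter()
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # header line of each 4-line FASTQ record
                counts[line.rstrip("\n").rsplit(":", 1)[1]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (placeholder file name): python count_indexes.py Undetermined_R1.fastq.gz [top_n]
    fastq_path = sys.argv[1]
    top_n = int(sys.argv[2]) if len(sys.argv) > 2 else 20
    for index_pair, n_reads in count_index_pairs(fastq_path).most_common(top_n):
        print(f"{n_reads}\t{index_pair}")
```

        If most of the top entries are your expected indexes with a leading N, the reads are recoverable. Unexpected combinations near the top would point to a different problem worth raising with the facility.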
