
  • Most HiSeq reads demultiplexing as 'Undetermined'; did seq facility screw up?

    We sent a library of multiplexed samples prepped with the KAPA HyperCap workflow for sequencing in a single lane of a HiSeq 4000. To our horror, the vast majority of reads (>90%) came back as undetermined.

    All of our barcodes are 8 bp, dual index. Here are a few examples from our R1 undetermined FASTQ file:

    @E00593:415:HFLKYCCX2:3:1101:4797:1397 1:N:0:NCTGCAAG+NAGGTGAA

    @E00593:415:HFLKYCCX2:3:1101:5629:1397 1:N:0:NCGGAATG+NAATGTAG

    @E00593:415:HFLKYCCX2:3:1101:5893:1397 1:N:0:NCCAACTG+NTATTGCC

    The first base of every barcode has been replaced with an 'N'. Every read is like that. The barcodes are otherwise correct, so theoretically, I could demultiplex the Undetermined files myself if I allowed a 1-bp mismatch.

    Is this normal? I can't work out if every barcode is supposed to begin with 'N' and this is some sort of wildcarding, or if there's been a mistake at the sequencing facility. Similar threads on SEQanswers weren't helpful, so this is my first post as a longtime lurker. I'm not very familiar with raw FASTQ data, so I'm sorry if this is a dumb question, but I'm baffled.

    Any suggestions would be gratefully received. Thank you!

  • #2
    Why is your facility not doing the demultiplexing? This should be very simple for them to do.

    It is normal for a few reads at the beginning of the file to have some N's, but are you saying that every read in your file has an N at the start of its index sequences? If so, this is not normal, but it is not unheard of either. As long as your indexes are far enough apart in edit distance, you (or preferably they) should be able to demultiplex the data.
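To make the edit-distance point concrete, here is a minimal Python sketch (not from the thread) that computes the smallest pairwise Hamming distance within a set of indexes. The function names are made up for illustration. The example input is the 7 bases that remain readable in the three i7 index reads quoted in the first post; a real check would use the full list of submitted indexes.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("indexes must be the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indexes):
    """Smallest Hamming distance between any two indexes in the set."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))


# The 7 readable bases of the three i7 index reads quoted in the first post.
i7_tails = ["CTGCAAG", "CGGAATG", "CCAACTG"]
print(min_pairwise_distance(i7_tails))
```

If the minimum distance is d, up to (d - 1) // 2 mismatches can be tolerated without a read becoming equally close to two different indexes, so a minimum of 3 is enough for one mismatch.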

    • #3
      The sequencing facility did do the demultiplexing. We got back a handful of reads (a few kilobytes of FASTQ files) for each sample, then two giant ~35 GB fastq.gz files of undetermined reads. Meaning the vast majority were undetermined.

      Looking through those undetermined files, I see that the first base of every index sequence is an N. The remaining 7 bases do match up correctly with the indexes we submitted. It's not clear to me why every single index sequence has the first base replaced with an N.

      Also, it's making me wonder why they weren't able to demultiplex those reads. They're all effectively one base mismatched, so shouldn't they have demultiplexed just fine?

      • #4
        Yes, they should have demultiplexed after allowing for one error. If they did not, then there may be another issue. It would be fair to go back to the facility and ask. If a sequencing issue led to the N's at position 1, then they should address that.
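As a rough illustration of what "allowing for one error" means, the Python sketch below assigns a read to a sample when both of its index reads are within one mismatch of that sample's expected indexes, counting an N as a mismatch. Everything here is an assumption for illustration: the function names, the sample names, and in particular the first base of each expected index, which is never given in the thread. It is not the facility's pipeline.

```python
def parse_indexes(header):
    """Return (i7, i5) from a FASTQ header that ends in ':<i7>+<i5>'."""
    index_field = header.strip().rsplit(":", 1)[1]
    i7, i5 = index_field.split("+")
    return i7, i5


def mismatches(observed, expected):
    """Count differing positions; an 'N' in the read counts as a mismatch."""
    return sum(o != e for o, e in zip(observed, expected))


def assign_sample(header, sample_sheet, max_mismatches=1):
    """Return the unique sample whose indexes match, or None."""
    i7, i5 = parse_indexes(header)
    hits = [
        name
        for name, (exp7, exp5) in sample_sheet.items()
        if len(exp7) == len(i7)
        and len(exp5) == len(i5)
        and mismatches(i7, exp7) <= max_mismatches
        and mismatches(i5, exp5) <= max_mismatches
    ]
    # Ambiguous or missing matches stay undetermined.
    return hits[0] if len(hits) == 1 else None


# Hypothetical sample sheet: the leading base of each index is invented.
sample_sheet = {
    "sample_A": ("ACTGCAAG", "GAGGTGAA"),
    "sample_B": ("TCGGAATG", "CAATGTAG"),
}

header = "@E00593:415:HFLKYCCX2:3:1101:4797:1397 1:N:0:NCTGCAAG+NAGGTGAA"
print(assign_sample(header, sample_sheet))
```

With a tolerance of one mismatch per index read, a leading N on both indexes still leaves the read assignable, which is why reads like the ones quoted would be expected to demultiplex.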

        There is some code here that will allow you to see all indexes (and their read counts) in your data. Run it on your undetermined file and see if there are other issues.
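The linked code is not reproduced on this page. As a stand-in, here is a minimal Python sketch of the same idea: tally the index field of every header line in a FASTQ file. It assumes four-line FASTQ records and headers that end in `:<i7>+<i5>`, as in the examples in the first post; the function names and the file name in the usage note are hypothetical.

```python
import gzip
from collections import Counter
from itertools import islice


def count_indexes(lines):
    """Tally the index field of each FASTQ header (every 4th line)."""
    counts = Counter()
    for header in islice(lines, 0, None, 4):
        header = header.strip()
        if header:
            counts[header.rsplit(":", 1)[-1]] += 1
    return counts


def count_indexes_in_file(path):
    """Same tally for a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_indexes(handle)


# Small in-memory demo; the sequence and quality lines are placeholders.
records = [
    "@E00593:415:HFLKYCCX2:3:1101:4797:1397 1:N:0:NCTGCAAG+NAGGTGAA", "ACGT", "+", "FFFF",
    "@E00593:415:HFLKYCCX2:3:1101:5629:1397 1:N:0:NCGGAATG+NAATGTAG", "ACGT", "+", "FFFF",
    "@E00593:415:HFLKYCCX2:3:1101:4797:1398 1:N:0:NCTGCAAG+NAGGTGAA", "ACGT", "+", "FFFF",
]
demo_counts = count_indexes(records)
for index_pair, n in demo_counts.most_common():
    print(index_pair, n)
```

On a real file this would be called as, for example, `count_indexes_in_file("Undetermined_R1.fastq.gz").most_common(20)`, where the file name is a placeholder. If the top entries are the expected indexes with only the first base replaced by N, the problem is limited to that first cycle; unexpected index pairs near the top would point to a different issue.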
