
  • Client sends wrong index. Run is done.

    We had a client send us a set of 43 multiplexed samples. Although the NextSeq500 run went well, only 54% of the reads passed filter (PF). We normally get in the 93%+ range, so we dug but couldn't find any explanation except that there must have been a bad index (typo) in the sample sheet. Not our error but hey, we want to help.

    Some of the samples had essentially zero reads after demultiplexing. We also found discarded reads where over a million had the same index.

    Question: Can we easily determine which samples need to be re-de-multiplexed? This must have happened before.

    Thanks,
    -pete
    Last edited by hoytpr; 10-24-2017, 02:49 PM. Reason: added instrument

  • #2
    The easiest way is to ask the client which kit they used and re-demultiplex with a new sample sheet listing all indices available for that kit. Also, if it is a dual-index run, the index 2 sequences may have been entered in the wrong orientation (they might need to be reverse complemented, if that has not been done already).
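A minimal sketch of the reverse-complement step in Python, in case the index 2 column needs flipping. The function name and the example sequence are made up for illustration; check against the kit documentation which orientation your instrument and sample sheet expect.

```python
# Translation table for DNA bases; N is kept as N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA index sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical index 2 sequence, not taken from any real kit listing.
print(reverse_complement("ATTACTCG"))  # CGAGTAAT
```

Applying this to every index 2 entry in the sample sheet and re-running the demultiplexing is a quick way to test the orientation hypothesis.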



    • #3
      Some of the samples had essentially zero reads after demultiplexing. We also found discarded reads where over a million had the same index.
      There may be an unbalanced pool or failed samples in addition to the index sequence errors. You can't fix the former, but the latter should be easily fixed by using the correct indexes and redoing the demultiplexing as suggested by @nucacidhunter. You should always use the entire set for demultiplexing when using a corrected samplesheet.



      • #4
        Thanks, seems so logical in retrospect. I appreciate the help and will set it up today.
        There are likely some failed samples also, according to the client. But hopefully we can get most of the 43 samples straightened out.
        -pete



        • #5
          See the file /Reports/html/index.html in the run folder. It includes a list of top10 unknown barcodes (and the known ones). Click on "show barcodes" in the top-right corner when you have opened the html file.
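If the HTML report is not at hand, the same kind of tally can be sketched in Python by counting the index field at the end of each FASTQ header in the undetermined reads. This assumes the usual Illumina header layout, where the index is the last colon-separated field; the function name and the example headers are made up for illustration.

```python
from collections import Counter


def count_indices(fastq_lines, top_n=10):
    """Tally index sequences from FASTQ headers (every 4th line)."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header line of each 4-line FASTQ record
            index = line.rstrip().rsplit(":", 1)[-1]
            counts[index] += 1
    return counts.most_common(top_n)


# Made-up records; real input would be something like
# gzip.open("Undetermined_S0_L001_R1_001.fastq.gz", "rt").
example = [
    "@NS500:1:HXXXX:1:11101:1000:1000 1:N:0:AAAACCCCGGGG", "ACGT", "+", "IIII",
    "@NS500:1:HXXXX:1:11101:1000:1001 1:N:0:AAAACCCCGGGG", "ACGT", "+", "IIII",
    "@NS500:1:HXXXX:1:11101:1000:1002 1:N:0:TTTTGGGGCCCC", "ACGT", "+", "IIII",
]
print(count_indices(example))
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly without loading the whole file into memory.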
          /Jakob



          • #6
            Originally posted by JakobHedegaard View Post
            See the file /Reports/html/index.html in the run folder. It includes a list of top10 unknown barcodes (and the known ones). Click on "show barcodes" in the top-right corner when you have opened the html file.
            /Jakob
            Thanks Jakob, I did NOT see that link up at the top right, and had manually yanked those sequences out from the DemuxSummaryF1Ln.txt files.

            At least three and probably five of the samples have no or VERY few reads. Two might just be bad libraries. Unfortunately based on the percent of each base (A,G,C,T) at each of the twelve base read positions in the 43 index sequences, it looks like the sample sheet was mixed up in several places, and I can't substitute these unknown index reads into the index reads %A, %C, %G, %T to correct for the differences. It was a long shot but I can't figure out anything else to do. I hope they can figure out which are their samples from the assemblies.
            -pete



            • #7
              FYI: I wanted to post back about an error we got running 384 samples. We'd done 224 before, but going over 250 might give you an error like:

              ERROR: bcl2fastq::common::Exception: 2017-Oct-25 17:44:17: Too many open files (24):

              The solution is found here:
              I was trying to demultiplex a MiSeq run with the Illumina utility `bcl2fastq` today and got the error “too many open files (24)”. As it turns out, if you have more than about 250 sample…
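For context, the "(24)" in that message matches errno 24 (EMFILE, "Too many open files"), which usually points at the per-process open file limit (what `ulimit -n` reports). The Python sketch below only inspects and raises that limit for the current process. It is an assumption that this is what the linked post recommends, and the function name is made up; it also relies on the Unix-only `resource` module.

```python
import resource


def raise_open_file_limit(target: int) -> int:
    """Raise the soft open-file limit toward target; return the new soft limit.

    The soft limit is never lowered and never pushed past the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


print("soft limit:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```

The new limit applies only to this process and its children, so bcl2fastq would have to be launched from the same script (for example via `subprocess`) for it to take effect. Running `ulimit -n` with a larger value in the shell before starting bcl2fastq is the more direct route.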


              -pete



              • #8
                @pete: Out of curiosity is this an error purely on client's part? Is it only restricted to having wrong entries in samplesheet? Were samples pooled from multiple submitters (I don't understand what you wrote in post #6)?
                Last edited by GenoMax; 10-26-2017, 08:35 AM.



                • #9
                  Originally posted by GenoMax View Post
                  @pete: Out of curiosity is this an error purely on client's part? Is it only restricted to having wrong entries in samplesheet? Were samples pooled from multiple submitters (I don't understand what you wrote in post #6)?
                  Seems to be entirely an error on the client's end. Apparently the lab uses a lot of undergrads (which I fully support), but there were apparently some students remarking about "problems" and "mistakes" the others were making. So only one submitter with 43 samples (a mix of genomic and mitochondrial DNA from what I understand). We didn't make the libraries. I believe the samples, or the indices, or both, were mixed. Some samples had no reads.

                  As a shot in the dark, I looked in the SAV software, looked at all PF reads, but limited the output to the 12 cycles (BIOO indices) of "Read2" (the single index read). The percent of As, Cs, Gs, and Ts for each cycle, 146 through 157 (the twelve index cycles), is shown. With 43 indices, if the percentage of "A" at cycle 146 was for example 17.5%, then it suggests roughly 8 of the 43 indices had an "A" there. I didn't try to correct for phasing, but this was just me trying to learn. The same was done for C, G, and T, which gave 5 Cs, 11 Gs and 19 Ts, so at cycle 146 there were the expected 43 bases. Then I followed this through the rest of the cycles, and ended up with a 4x12 matrix of how the As, Cs, Gs, and Ts should be distributed across index cycles 146 - 157 (worked surprisingly well). It doesn't tell me exactly which indices were found when the machine was running, but it gives a base distribution approximation.
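The percent-to-count step for a single index position can be sketched in Python as follows. Only the 17.5% figure for A and the counts 8, 5, 11 and 19 come from the post; the C, G and T percentages are made-up values chosen to be consistent with those counts, and the function name is hypothetical.

```python
def percent_to_counts(percents, n_indices):
    """Convert per-base percentages at one cycle into estimated index counts."""
    return {base: round(p / 100 * n_indices) for base, p in percents.items()}


# A = 17.5% is from the post; C, G and T are illustrative values only.
cycle_146 = {"A": 17.5, "C": 12.0, "G": 26.0, "T": 44.5}
counts = percent_to_counts(cycle_146, n_indices=43)
print(counts, sum(counts.values()))
```

Simple rounding will not always add up to the number of indices, so a position whose counts do not sum to 43 is a hint that phasing or uneven pooling is distorting the percentages.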

                  Then I tried to match this up with the index sequences as given in the Sample Sheet. I calculated the same 4x12 matrix distribution. They didn't match up very well, suggesting (to me, an old molecular geneticist) that the indices in the sample sheet and the indices in the run were different.
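A rough Python sketch of building that per-position base matrix from a list of index sequences and comparing two such matrices. The function names, the distance measure (sum of absolute differences) and the short example indices are assumptions for illustration, not what was actually used in the spreadsheet.

```python
BASES = "ACGT"


def base_count_matrix(indices):
    """Count A, C, G and T at each position across equal-length indices."""
    length = len(indices[0])
    if any(len(seq) != length for seq in indices):
        raise ValueError("all index sequences must have the same length")
    matrix = {base: [0] * length for base in BASES}
    for seq in indices:
        for pos, base in enumerate(seq.upper()):
            if base in matrix:  # ignore N or other symbols
                matrix[base][pos] += 1
    return matrix


def matrix_distance(m1, m2):
    """Sum of absolute count differences; 0 means identical composition."""
    return sum(abs(a - b) for base in BASES for a, b in zip(m1[base], m2[base]))


# Made-up 4-base indices just to show the call pattern.
sheet = base_count_matrix(["ACGT", "AAGT"])
observed = base_count_matrix(["ACGT", "CAGT"])
print(matrix_distance(sheet, observed))  # 2
```

A distance of zero only shows that the two sets have the same base composition at every position; it does not prove the sets are identical, and it says nothing about which index belongs to which sample.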

                  Then I took the top 25 "unknown" index sequences, and tried substituting them into the 4x12 matrix to see if they could get the Sample Sheet index matrix to match up with the run data matrix. No joy.

                  This probably sounds stupid considering how phasing could have screwed things up, and it still wouldn't have told me which index went to which sample, but it might have determined which of the "unknown" indices and associated reads were the correct ones. Like I said, just trying to help. If it's still not clear I can send you a spreadsheet.

                  -pete



                  • #10
                    Originally posted by GenoMax View Post
                    @pete: Out of curiosity is this an error purely on client's part? Is it only restricted to having wrong entries in samplesheet? Were samples pooled from multiple submitters (I don't understand what you wrote in post #6)?
                    I wrote a response earlier, but I was timed out and then the message must have gotten lost. I'll write another and try to cut/paste.
                    -pete



                    • #11
                      Originally posted by hoytpr View Post
                      I wrote a response earlier, but I was timed out and then the message must have gotten lost. I'll write another and try to cut/paste.
                      -pete
                      It is there. Needed moderation.



                      • #12
                        Such is the life of a core facility. At least the submitter can't blame you since they did everything.

                        I wonder if the two sets of libraries had distinct insert sizes and one set outcompeted the other (people sometimes think they are being clever and try to save money).

                        As a last ditch effort you could just put Sample_1, Sample_2 against the indices you actually see and demux using that generic scheme. Submitter hopefully has some alternate means of figuring out what is what.
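A small Python sketch of generating that generic scheme. It assumes a minimal bcl2fastq-style `[Data]` section with only `Sample_ID` and `index` columns; the function name and the example indices are made up, and a real sheet may need more columns (for example `Lane` or `index2`).

```python
def generic_sample_sheet(indices):
    """Build a minimal [Data] section naming samples Sample_1, Sample_2, ..."""
    lines = ["[Data]", "Sample_ID,index"]
    for n, idx in enumerate(indices, start=1):
        lines.append(f"Sample_{n},{idx}")
    return "\n".join(lines) + "\n"


# Hypothetical observed indices, e.g. the most frequent ones in the run.
print(generic_sample_sheet(["AAAACCCCGGGG", "TTTTGGGGCCCC"]))
```

Feeding in the indices in order of read count keeps the generic sample numbers roughly sorted by yield, which may help the submitter match them back to libraries.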



                        • #13
                          The poor PF and the wrong indices are two different issues. Bad libraries shouldn't lead to poor PF, unless you were losing all of a certain type of library and ended up with just low diversity? You might check your thumbnails to see if you were overclustered (which can still be the user's fault if their libraries are much smaller than expected).

                          I've attempted to figure out the correct indices when users have told me the wrong ones and generally haven't been able to do it. But I haven't tried that hard.
                          Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.



                          • #14
                            Originally posted by GenoMax View Post
                            It is there. Needed moderation.
                            Thanks. Yes, I learned a lesson today. The run was only slightly overclustered and the 384-index run had 90% PF (all the bad PF were labeled default or unknown). Our PhiX was only 1.2%.

                            After the re-analysis with all 384 indices, the 43 highest numbers of indexed sequences included 7 barcodes not on the sample sheet. There wasn't an obvious falloff of sample read numbers for another 6-7 indices down the list. It's a mess.

                            Note to group: For reference, with 174,615,251 clusters PF (if you factor in that approximately 43 indices were supposed to be there), we had a total of 240,038 "bad" clusters, or ~0.14 percent.
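For anyone checking the arithmetic, here is the percentage worked out in Python from the two cluster counts quoted above (the variable names are just for illustration):

```python
pf_clusters = 174_615_251   # clusters passing filter, from the run
bad_clusters = 240_038      # clusters counted as "bad"

bad_percent = 100 * bad_clusters / pf_clusters
print(f"{bad_percent:.2f}%")  # 0.14%
```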
                            -pete
