SEQanswers

Old 10-24-2017, 03:48 PM   #1
hoytpr
Member
 
Location: Stillwater

Join Date: Dec 2009
Posts: 35
Default Client sends wrong index. Run is done.

We had a client send us a set of 43 multiplexed samples. Although the NextSeq500 run went well, only 54% of the reads passed filter (PF). We normally get in the 93%+ range, so we dug but couldn't find any explanation except that there must have been a bad index (typo) in the sample sheet. Not our error but hey, we want to help.

Some of the samples had essentially zero reads after demultiplexing. We also found discarded reads where over a million had the same index.

Question: Can we easily determine which samples need to be re-de-multiplexed? This must have happened before.

Thanks,
-pete

Last edited by hoytpr; 10-24-2017 at 03:49 PM. Reason: added instrument
Old 10-24-2017, 04:25 PM   #2
nucacidhunter
Senior Member
 
Location: Iran

Join Date: Jan 2013
Posts: 1,074
Default

The easiest way will be to ask the client which kit they used and re-demultiplex with a new sample sheet listing all indices available for that kit. Also, if it is a dual-index kit, the index 2 sequences may have been entered in the incorrect orientation (this might require the reverse complement of index 2, if that has not been done already).
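A minimal sketch of the reverse-complement step mentioned above, in Python. The helper names and the example index pair are hypothetical and only for illustration; whether index 2 actually needs flipping depends on the instrument and kit.

```python
# Illustrative helpers, not part of any Illumina tool.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA index sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def with_index2_flipped(pairs):
    """Given (index1, index2) pairs, reverse-complement only index2."""
    return [(i1, reverse_complement(i2)) for i1, i2 in pairs]


if __name__ == "__main__":
    # Hypothetical index pair, for illustration only.
    print(with_index2_flipped([("ATCACG", "AAGGTT")]))
```

The flipped pairs could then be written into a second sample sheet and the demultiplexing repeated to see which orientation recovers reads.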
Old 10-24-2017, 05:31 PM   #3
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,546
Default

Quote:
Some of the samples had essentially zero reads after demultiplexing. We also found discarded reads where over a million had the same index.
There is a possibility of an unbalanced pool or failed samples in addition to the index sequence errors. You can't fix the former, but the latter should be easily fixed by using the correct indexes and redoing the demultiplexing as suggested by @nucacidhunter. You should always use the entire set for demultiplexing when using a corrected samplesheet.
Old 10-25-2017, 07:05 AM   #4
hoytpr
Member
 
Location: Stillwater

Join Date: Dec 2009
Posts: 35
Default

Thanks, seems so logical in retrospect. I appreciate the help and will set it up today.
There are likely some failed samples also, according to the client. But hopefully we can get most of the 43 samples straightened out.
-pete
Old 10-26-2017, 02:14 AM   #5
JakobHedegaard
Member
 
Location: Aarhus, Denmark

Join Date: Mar 2008
Posts: 39
Default

See the file /Reports/html/index.html in the run folder. It includes a list of top10 unknown barcodes (and the known ones). Click on "show barcodes" in the top-right corner when you have opened the html file.
/Jakob
Old 10-26-2017, 07:59 AM   #6
hoytpr
Member
 
Location: Stillwater

Join Date: Dec 2009
Posts: 35
Default

Quote:
Originally Posted by JakobHedegaard View Post
See the file /Reports/html/index.html in the run folder. It includes a list of top10 unknown barcodes (and the known ones). Click on "show barcodes" in the top-right corner when you have opened the html file.
/Jakob
Thanks Jakob, I did NOT see that link up at the top right, and had manually yanked those sequences out from the DemuxSummaryF1Ln.txt files.
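For anyone pulling these sequences out by hand, here is a rough Python sketch of tallying the unknown indices from the text of a demultiplexing summary file. The section header string and the two tab-separated columns (sequence, count) are assumptions about the file layout, so check them against your own files before relying on this.

```python
def top_unknown_indices(text,
                        marker="### Most Popular Unknown Index Sequences",
                        limit=10):
    """Return up to `limit` (index, count) pairs from the unknown-index section.

    Assumes (unverified) that the section starts at a line beginning with
    `marker` and that each data line is "<sequence><TAB><count>".
    """
    hits = []
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("###"):
            if line.startswith(marker):
                in_section = True
            elif not line.startswith("### Columns"):
                in_section = False  # a different section has started
            continue
        if not in_section or not line:
            continue
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1].isdigit():
            hits.append((fields[0], int(fields[1])))
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits[:limit]


if __name__ == "__main__":
    # Made-up example content, for illustration only.
    sample = ("### Most Popular Unknown Index Sequences\n"
              "### Columns: Index_Sequence Hit_Count\n"
              "AAAAAAAAAAAA\t5\n"
              "CCCCCCCCCCCC\t12\n")
    print(top_unknown_indices(sample))
```

Reading each lane's file with `open(path).read()` and merging the counts would give a run-wide list.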

At least three and probably five of the samples have no or VERY few reads. Two might just be bad libraries. Unfortunately based on the percent of each base (A,G,C,T) at each of the twelve base read positions in the 43 index sequences, it looks like the sample sheet was mixed up in several places, and I can't substitute these unknown index reads into the index reads %A, %C, %G, %T to correct for the differences. It was a long shot but I can't figure out anything else to do. I hope they can figure out which are their samples from the assemblies.
-pete
Old 10-26-2017, 09:13 AM   #7
hoytpr
Member
 
Location: Stillwater

Join Date: Dec 2009
Posts: 35
Default

FYI: I wanted to post back about an error we got running 384 samples. We'd done 224 before, but going over 250 might give you an error like:

ERROR: bcl2fastq::common::Exception: 2017-Oct-25 17:44:17: Too many open files (24):

The solution is found here:
https://erikclarke.net/2016/03/31/op...and-bcl2fastq/
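"Too many open files" (errno 24) generally means the process hit its per-process open-file limit. As a hedged sketch, not taken from the linked post, this is how the limit can be inspected and raised from Python on a Unix system. The function names and the default of 4096 are hypothetical. The change only applies to the current process and anything it launches, which is the same scope as `ulimit -n` in a shell.

```python
import resource


def target_soft_limit(wanted, hard):
    """Soft limit to request: `wanted`, capped at the hard limit if one is set."""
    if hard == resource.RLIM_INFINITY:
        return wanted
    return min(wanted, hard)


def raise_open_file_limit(wanted=4096):
    """Raise the soft open-file limit towards `wanted`; return the new (soft, hard)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target_soft_limit(wanted, hard)
    # Only ever raise the soft limit, never lower it.
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # Report the current limits without changing anything.
    print(resource.getrlimit(resource.RLIMIT_NOFILE))
```

If the hard limit itself is too low, it has to be raised by an administrator; an unprivileged process cannot go past it.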

-pete
Old 10-26-2017, 09:21 AM   #8
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,546
Default

@pete: Out of curiosity is this an error purely on client's part? Is it only restricted to having wrong entries in samplesheet? Were samples pooled from multiple submitters (I don't understand what you wrote in post #6)?

Last edited by GenoMax; 10-26-2017 at 09:35 AM.
Old 10-26-2017, 10:38 AM   #9
hoytpr
Member
 
Location: Stillwater

Join Date: Dec 2009
Posts: 35
Default

Quote:
Originally Posted by GenoMax View Post
@pete: Out of curiosity is this an error purely on client's part? Is it only restricted to having wrong entries in samplesheet? Were samples pooled from multiple submitters (I don't understand what you wrote in post #6)?
Seems to be entirely an error on the client's end. The client apparently uses a lot of undergrads (which I fully support), but there were apparently some students remarking about "problems" and "mistakes" the others were making. So only one submitter with 43 samples (a mix of genomic and mitochondrial DNA from what I understand). We didn't make the libraries. I believe the samples, or the indices, or both, were mixed up. Some samples had no reads.

As a shot in the dark, I looked in the SAV software, looked at all PF reads, but limited the output to the 12 cycles (BIOO indices) of "Read2" (the single index read). The percent of As, Cs, Gs, and Ts is shown for each cycle, 146 through 157 (the twelve index cycles). With 43 indices, if the percentage of "A" at cycle 146 was for example 17.5%, then it suggests there were 8/43 "A". I didn't try to correct for phasing, but this was just me trying to learn. The same was done for C, G, and T, and I got 5 Cs, 11 Gs and 19 Ts, so in cycle 146 there were the expected 43 bases. Then I followed this through the rest of the cycles, and ended up with a 4x12 matrix of how the As, Cs, Gs, and Ts should be distributed in index cycles 146 - 157 (worked surprisingly well). It doesn't tell me exactly which indices were found when the machine was running, but it gives a base distribution approximation.
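The percent-to-count step can be sketched in Python as below. Only the 17.5% figure for A and the resulting counts (8, 5, 11, 19) come from the post; the C, G and T percentages are made-up values chosen to be consistent with those counts, and the function name is hypothetical.

```python
BASES = "ACGT"


def expected_base_counts(percentages, n_indices):
    """Convert per-base percentages at one cycle into whole-index counts."""
    return {b: round(percentages[b] / 100 * n_indices) for b in BASES}


# A = 17.5% is from the post; C, G and T are illustrative assumptions.
cycle_146 = {"A": 17.5, "C": 12.0, "G": 26.0, "T": 44.5}
counts = expected_base_counts(cycle_146, 43)
print(counts, sum(counts.values()))
```

Because each base is rounded independently, the counts at a cycle will not always add up to the number of indices. An uneven pool shifts the percentages as well, so this is only an approximation.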

Then I tried to match this up with the index sequences as given in the Sample Sheet. I calculated the same 4x12 matrix distribution. They didn't match up very well, suggesting (to me, an old molecular geneticist) that the indices in the sample sheet and the indices in the run were different.
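A small Python sketch of building that per-position base matrix from a list of index sequences and scoring how far it is from another matrix. The function names and the toy 3-base indices are hypothetical; the distance measure (sum of absolute count differences) is just one simple choice, not something from the post.

```python
from collections import Counter

BASES = "ACGT"


def base_count_matrix(indices):
    """Per-position base counts: one dict of A/C/G/T counts per index cycle."""
    length = len(indices[0])
    if any(len(ix) != length for ix in indices):
        raise ValueError("all index sequences must have the same length")
    matrix = []
    for pos in range(length):
        column = Counter(ix[pos] for ix in indices)
        matrix.append({b: column.get(b, 0) for b in BASES})
    return matrix


def matrix_distance(expected, observed):
    """Sum of absolute count differences over all positions and bases."""
    return sum(abs(e[b] - o[b])
               for e, o in zip(expected, observed)
               for b in BASES)


if __name__ == "__main__":
    # Toy 3-base indices, for illustration only.
    sheet = base_count_matrix(["ACG", "ACT", "TCG"])
    run = base_count_matrix(["ACG", "ACT", "ACG"])
    print(matrix_distance(sheet, run))
```

A distance of zero means the two sets have the same base composition at every cycle, although that still does not prove they are the same set of indices.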

Then I took the top 25 "unknown" index sequences, and tried substituting them into the 4x12 matrix to see if they could get the Sample Sheet index matrix to match up with the run data matrix. No joy.

This probably sounds stupid considering how phasing could have screwed things up, and it still wouldn't have told me which index went to which sample, but it might have determined which of the "unknown" indices and associated reads were the correct ones. Like I said, just trying to help. If it's still not clear I can send you a spreadsheet.

-pete
Old 10-26-2017, 12:21 PM   #10
hoytpr
Member
 
Location: Stillwater

Join Date: Dec 2009
Posts: 35
Default

Quote:
Originally Posted by GenoMax View Post
@pete: Out of curiosity is this an error purely on client's part? Is it only restricted to having wrong entries in samplesheet? Were samples pooled from multiple submitters (I don't understand what you wrote in post #6)?
I wrote a response earlier, but I was timed out and then the message must have gotten lost. I'll write another and try to cut/paste.
-pete
Old 10-26-2017, 12:26 PM   #11
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,546
Default

Quote:
Originally Posted by hoytpr View Post
I wrote a response earlier, but I was timed out and then the message must have gotten lost. I'll write another and try to cut/paste.
-pete
It is there. Needed moderation.
Old 10-26-2017, 12:31 PM   #12
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,546
Default

Such is the life of a core facility. At least the submitter can't blame you since they did everything.

I wonder if the two sets of libraries had distinct insert sizes and one set outcompeted the other (people sometimes think they are being clever and try to save money).

As a last ditch effort you could just put Sample_1, Sample_2 against the indices you actually see and demux using that generic scheme. Submitter hopefully has some alternate means of figuring out what is what.
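A hedged Python sketch of generating such a generic sample sheet section from the indices actually observed. The `[Data]` header and the `Sample_ID,index` column names are assumptions based on common single-index sample sheets, and the function name is hypothetical, so check the layout your demultiplexing software version expects.

```python
def generic_samplesheet(indices):
    """Build a minimal [Data] section naming samples Sample_1, Sample_2, ...

    The column names are an assumption; adjust them to your sample sheet format.
    """
    lines = ["[Data]", "Sample_ID,index"]
    for n, ix in enumerate(indices, start=1):
        lines.append(f"Sample_{n},{ix}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up index sequences, for illustration only.
    print(generic_samplesheet(["AAAACCCCGGGG", "TTTTGGGGCCCC"]), end="")
```

The submitter would then have to map Sample_1, Sample_2 and so on back to real sample names by some other means.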
Old 10-26-2017, 12:53 PM   #13
thermophile
Senior Member
 
Location: CT

Join Date: Apr 2015
Posts: 201
Default

The poor PF and the wrong indices are two different issues. Bad libraries shouldn't lead to poor PF unless you were losing all of a certain type of library and ended up with just low diversity? You might check your thumbnails to see if you were overclustered (which can still be the user's fault if their libraries are much smaller than expected).

I've attempted to figure out the correct indices when users have told me the wrong ones and generally haven't been able to do it. But I haven't tried that hard.
__________________
Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.
Old 10-26-2017, 01:37 PM   #14
hoytpr
Member
 
Location: Stillwater

Join Date: Dec 2009
Posts: 35
Default

Quote:
Originally Posted by GenoMax View Post
It is there. Needed moderation.
Thanks. Yes, I learned a lesson today. The run was only slightly overclustered and the 384-index run had 90% PF (all the bad PF were labeled default or unknown). Our PhiX was only 1.2%.

After the re-analysis with all 384 indices, the 43 indices with the highest read counts included 7 barcodes not on the sample sheet. There wasn't an obvious falloff in sample read numbers for another 6-7 indices down the list. It's a mess.
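That comparison can be sketched in Python as follows: rank all indices by read count, take as many as there were samples, and report those not on the sample sheet. The function name and the toy counts are hypothetical.

```python
def unexpected_top_indices(read_counts, sheet_indices, n_expected):
    """Return (index, count) pairs among the top `n_expected` not on the sheet."""
    ranked = sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True)
    sheet = set(sheet_indices)
    return [(ix, count) for ix, count in ranked[:n_expected] if ix not in sheet]


if __name__ == "__main__":
    # Toy numbers, for illustration only.
    counts = {"AAA": 100, "CCC": 90, "GGG": 5, "TTT": 80}
    print(unexpected_top_indices(counts, ["AAA", "CCC", "GGG"], 3))
```

With no clear falloff in read counts, the cutoff at `n_expected` is somewhat arbitrary, so it may be worth looking a few places further down the ranking as well.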

Note to group: For reference, with 174,615,251 clusters PF (if you factor in that approximately 43 indices were supposed to be there), we had a total of 240,038 "bad" clusters, or ~0.14 percent.
-pete