**GenoMax** · 04-22-2020, 12:38 PM

Yikes, this is a really low-diversity sample. Do you know how much phiX (if any) was added to this sample? Did you not tell the sequencing provider that these were low diversity? If you did not, it would be hard to make a case for them to re-sequence this sample for free. You may have to pay for a re-run with a significant % of phiX (10-20% or more) if you want to get improved Q-scores.

It is possible that in spite of the bad Q-scores etc your sequence may still be usable. Have you looked at that?

**invu** · 04-22-2020, 02:29 PM

Thanks for your reply, GenoMax!
The sample is a custom set of sequences with well-defined regions (hence those low-diversity regions). I had declined PhiX spike-in to obtain as many valid read lines as possible w/o sacrificing any to PhiX. I hadn't told them about the diversity because I had no idea about this kind of issue before; that being said, my old results for samples similar to this (even though they did have a few degenerate bases at the beginning) didn't have this problem (at least weren't as bad as this). Will I really need PhiX if I get to repeat something like this? Which way will I lose more data -- 10-20% loss by PhiX or less well-defined loss by poor quality reads like this?

I am looking at the data, and a big portion of the lines do seem valid and usable, but again, I'd need more lines to be ideal, and more importantly, even among those lines that apparently look okay, if more base call errors were caused by this issue, then that's a separate problem, which is quite hard to tell just from looking at those other lines.

Do you happen to know if someone looks at the rawer data (e.g., imaging data? if they're preserved? sorry I'm not really familiar with the details of the seq machines..) whether they could correct or improve the base calls throughout the seq data even now? Or is everything done real time by the machine and there's nothing that can be done to improve this?
Also, do you know if this issue caused by low diversity would also cause the tile-dependent quality loss shown in my diagram? (This is something I am having a hard time understanding, and something I'm trying to argue about..)

**GenoMax** · 04-22-2020, 06:09 PM

You really should have asked for phiX to be added. Consider that this run could have failed completely if it had been even a bit overloaded, leaving you with no data. Raw image data is generally not stored nowadays, so there is not much you can do with it afterwards. If you need more data, consider sequencing an additional lane rather than taking a chance like this.

**invu** · 04-22-2020, 06:20 PM

Ha, I see. Lesson learned. Thanks for your help, GenoMax!

**ATϟGC** · 04-23-2020, 04:53 AM

If these are amplicon libraries and you want to minimize the amount of PhiX, you can add "stagger" or "offset" nucleotides between the Illumina sequencing primer region (like the Nextera or TruSeq tail) and your locus-specific primer in order to create diversity of bases. These stagger nucleotides can also be added to restriction-digest adapters to increase base diversity.

I always add staggers to my amplicon primers and sequence multiple amplicons per run to increase diversity, but I still always add 5-12% PhiX just to be sure.

**invu** · 04-23-2020, 05:33 AM

Thanks, ATϟGC, that's a good suggestion.
Looking back, the adapter-primers I used for my older runs, when I didn't have this issue, did have some degenerate bases in between (for a different purpose), and I think that was key in preventing it.

Still, adding a minimal portion of PhiX is a good suggestion, too.
Thanks!!

**cement_head** · 04-23-2020, 08:46 AM

I'd have to agree with GenoMax; it's super-important to have a consultation with the sequencing center about the library composition and ask them what they recommend. You probably should have had a 10% PhiX spike-in added. HiSeqs are terrible at dynamic calibration - MiSeqs are better (to a point).

**invu** · 04-23-2020, 09:30 AM

I see. Next time I will consider PhiX spike-in. Thanks, cement_head!

**ATϟGC** · 04-24-2020, 05:05 AM

I agree that it would be best to discuss these issues with your sequencing provider.

If you do choose to use staggered bases, I recommend making an alignment to check for base diversity in the first 12-20 base pairs of Read 1. This alignment should be made with respect to the Illumina sequencing primer. For my amplicon libraries, this means I anchor it on the left by the Nextera Read 1 sequence. You then only need to consider the base diversity of your staggered and/or unstaggered primers or adapters (I use a mix of both in my round 1 PCR reactions). I do this in Microsoft Excel so that I can calculate and optimize the base diversity of all the amplicons that will be pooled in my run.

Adding stagger bases has the potential to introduce biases in your libraries due to secondary structures or other priming phenomena. If you use the same mix of staggers for all samples the bias should be the same in theory.

I have only sequenced amplicons on MiSeq and NovaSeq, and 5-12% PhiX has been enough for me on those platforms, so I cannot comment on HiSeq.
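
For anyone who would rather script the per-position diversity check than do it in a spreadsheet, here is a rough Python sketch of the same idea. It is not the exact Excel workflow described above. It assumes every construct is written starting at the first base read after the sequencing primer (so the sequences are already left-anchored), that only A/C/G/T are present (no degenerate codes), and that each construct has a relative pooling weight. The function names, the primer, and the stagger sequences are made up for illustration.

```python
from collections import Counter

BASES = "ACGT"

def with_staggers(primer, staggers, weight=1.0):
    """Prepend each stagger to the locus-specific primer.

    Returns (sequence, weight) pairs; an empty stagger gives the unstaggered primer.
    """
    return [(stagger + primer, weight) for stagger in staggers]

def per_cycle_composition(constructs, n_cycles=20):
    """Weighted fraction of each base at each of the first n_cycles positions.

    `constructs` is a list of (sequence, weight) pairs, each sequence starting
    at the first base read after the sequencing primer.
    """
    composition = []
    for cycle in range(n_cycles):
        counts = Counter()
        for seq, weight in constructs:
            if cycle < len(seq):  # skip constructs shorter than this cycle
                counts[seq[cycle].upper()] += weight
        total = sum(counts.values())
        composition.append(
            {base: (counts[base] / total if total else 0.0) for base in BASES}
        )
    return composition

def least_common_fraction(composition):
    """Fraction of the rarest base at each cycle; 0.0 means a base is missing there."""
    return [min(cycle.values()) for cycle in composition]

if __name__ == "__main__":
    # Hypothetical primer and staggers, equal pooling weights.
    pool = with_staggers("GTGACCTATGAACTCAGGAG", ["", "T", "GA", "CAT"])
    comp = per_cycle_composition(pool, n_cycles=12)
    for cycle, fractions in enumerate(comp, start=1):
        print(cycle, fractions)
```

Cycles where one base dominates, or where `least_common_fraction` drops to zero, are the positions to fix by changing the stagger mix or the pooling weights. This only looks at the designed sequences, so it says nothing about the biases that staggers themselves can introduce.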

Poor seq quality due to low diversity sample
