**GenoMax** · 02-18-2013, 09:07 AM

Phillip,

Are these Nextera runs? How many samples were multiplexed and what is the average insert size? Are the two reads expected to overlap? Can you post example FastQC quality profile plots for one sample for the two variations of the runs you have posted above?

We have some really difficult multiplex samples that have major quality value issues (which I suspect are artificial) in spite of using all three workarounds you have listed above. We are continuing to work with Illumina actively.

BTW: Even for the first run the data are well within the published Illumina spec of >75% data at Q30 for a 2 x 250 bp run (except for read 1).
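For anyone who wants to check a run against that ">75% at Q30" figure outside of SAV, here is a minimal sketch. It assumes Phred+33 encoded quality strings (the usual FASTQ convention); the function name and the toy strings are made up for illustration.

```python
def fraction_q30(quality_strings, threshold=30, offset=33):
    """Return the fraction of base calls with Phred quality >= threshold.

    quality_strings: iterable of FASTQ quality lines (assumed Phred+33).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


# Toy quality strings: 'I' = Q40, '?' = Q30, '5' = Q20, '#' = Q2
example = ["IIII??55", "II??55##"]
print(f"{fraction_q30(example):.1%} of bases at or above Q30")
```

In practice the quality lines would come from every fourth line of a FASTQ file, and the result could be computed per read (read 1 vs. the last read) to see which one falls short.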

**pmiguel** · 02-18-2013, 09:51 AM

Originally posted by GenoMax

Phillip,

Are these Nextera runs?

They are V3/V4 loop amplicons indexed using TruSeq Custom Amplicon (TSCA) style (sequence) dual indexes. The customer makes a "fusion primer" combining their locus and the proximal TruSeq adapter up to where the index is. They then reamplify with index-containing primers that overlap that proximal TruSeq adapter sequence, but not the locus-specific primer. Then they purify and pool their samples before passing them to us. We usually do an additional Ampure clean-up to shake loose a little more of the primer dimers.

Originally posted by GenoMax

How many samples were multiplexed and what is the average insert size?

In this case, all 96 TSCA index pairs, plus 3 single index TruSeq samples that carry the genomic libraries ("ballast") to increase sequence diversity. Insert size was about 400-450 bp.

Originally posted by GenoMax

Are the two reads expected to overlap?

Yes. The idea is to merge them so they can be run through a 454 QIIME pipeline by the customer.

Originally posted by GenoMax

Can you post example FastQC quality profile plots for one sample for the two variations of the runs you have posted above?

I can after those get generated for the hard coded run.

Originally posted by GenoMax

We have some really difficult multiplex samples that have major quality value issues (which I suspect are artificial) in spite of using all three workarounds you have listed above. We are continuing to work with Illumina actively.

BTW: Even for the first run the data are well within the published Illumina spec of >75% data at Q30 for a 2 x 250 bp run (except for read 1).

Yes, I felt the run almost made it without requiring hard coding. Also the PANDA merging results looked fine. I would think it was just a QV assignment problem, but I would not expect that to affect the error rate as depicted by SAV.

--
Phillip

**genbio64** · 02-18-2013, 10:09 AM

Phillip,
Could you post a quick diagram of that indexing method please?

**pmiguel** · 02-18-2013, 11:59 AM

Originally posted by genbio64

Phillip,
Could you post a quick diagram of that indexing method please?

The arrow is a cartoon of the TSCA left adapter. The orange box denotes the 8 base "i5" index. The green box, as labelled, is some locus-specific sequence. The 1st PCR primer fuses the locus-specific sequence with 33 bases of the proximate end of the TSCA adapter. The 2nd PCR primer overlaps the first by 20 bases.
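In code form, the two-primer layout can be sketched roughly as follows. Every sequence here is a made-up placeholder, not a real Illumina adapter or index sequence, and the choice of which 20 of the 33 bases are shared (the 20 nearest the index) is an assumption based on the description that the second primer overlaps the adapter portion but not the locus-specific part.

```python
# All sequences are hypothetical placeholders for illustration only.
ADAPTER_DISTAL = "C" * 12                 # adapter portion on the far side of the index
INDEX_I5 = "ACGTACGT"                     # an 8-base index
ADAPTER_PROXIMAL = ("GATTACA" * 5)[:33]   # 33 adapter bases between index and locus
LOCUS_SPECIFIC = "TTGACCGGTAAC"           # locus-specific priming sequence

OVERLAP = 20  # bases shared between the 1st and 2nd PCR primers


def first_round_primer(proximal, locus):
    """1st PCR primer: 33 proximal adapter bases fused to the locus-specific sequence."""
    return proximal + locus


def second_round_primer(distal, index, proximal, overlap=OVERLAP):
    """2nd PCR primer: distal adapter + index + the first `overlap` proximal bases.

    Assumption: the shared bases are the ones adjacent to the index.
    """
    return distal + index + proximal[:overlap]


p1 = first_round_primer(ADAPTER_PROXIMAL, LOCUS_SPECIFIC)
p2 = second_round_primer(ADAPTER_DISTAL, INDEX_I5, ADAPTER_PROXIMAL)

# The 3' end of the 2nd primer should match the 5' end of the 1st primer.
assert p2[-OVERLAP:] == p1[:OVERLAP]
print("1st PCR primer:", p1)
print("2nd PCR primer:", p2)
```

With this layout only the second-round primers carry an index, so one set of locus-specific fusion primers can be reused with every index.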

You would also need the right adapter oligos. Basically the same design but with slightly different lengths.

For 96 indexes, you would want 8 i5 indexes and 12 i7 indexes. For 384 you would want 16 and 24, respectively.
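The arithmetic behind those numbers is just the product of the two index sets, while the number of index oligos to order is only their sum. A small sketch (the index labels are hypothetical placeholders, not real index names):

```python
from itertools import product


def index_pairs(n_i5, n_i7):
    """Enumerate every i5/i7 combination for a combinatorial dual-index design."""
    i5 = [f"i5_{k + 1:02d}" for k in range(n_i5)]  # placeholder labels
    i7 = [f"i7_{k + 1:02d}" for k in range(n_i7)]  # placeholder labels
    return list(product(i5, i7))


for n_i5, n_i7 in [(8, 12), (16, 24)]:
    pairs = index_pairs(n_i5, n_i7)
    print(f"{n_i5} i5 x {n_i7} i7 = {len(pairs)} samples "
          f"from {n_i5 + n_i7} index oligos")
```

The 8 x 12 layout also maps directly onto the rows and columns of a 96-well plate, and 16 x 24 onto a 384-well plate.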

I actually screwed up on the right-side oligos and included the reverse complements of the TSCA i7 indexes. But as long as one puts the right sequences in the sample sheet, everything works out okay.

--
Phillip

**bstamps** · 02-18-2013, 02:37 PM

I'll say that we've had good success using 50% genomic DNA from an organism we needed sequenced anyway (or just felt like getting data on), alongside our amplicon library with a 12 bp random barcode on the front end. We had at least 92 libraries on our run; it didn't make much sense to have fewer than that for the cost (pre-cluster all the libraries in house, and hand over an "amplicon" tube that the center could prep as usual). Our forward read was great, with issues on the reverse. We're working around that now (double barcoding, or something else we're going to try, and perhaps publish on if it works). Either way, we had enough data from the forward read to move ahead. These are 16S rDNA libraries, by the way, with primers from the ARB group's recent publication on designing better universal primers.

**pmiguel** · 02-19-2013, 04:14 AM

Yes, we usually have the problems with the second read. In fact, this was the first time I had seen a problematic 1st read but good 4th read.

--
Phillip

**GenoMax** · 02-19-2013, 04:25 AM

Phillip: Curious to see if the quality patterns changed at all between the two runs for a specific sample. You were going to post quality plots.

I think the new version MCS v.2.1.1.13 has done the most so far to improve the qualities (along with the new batch of kits which are performing well) but we are not there yet.

**pmiguel** · 02-19-2013, 07:47 AM

Originally posted by GenoMax

Phillip: Curious to see if the quality patterns changed at all between the two runs for a specific sample. You were going to post quality plots.

I think the new version MCS v.2.1.1.13 has done the most so far to improve the qualities (along with the new batch of kits which are performing well) but we are not there yet.

Sorry, that is going to take a while longer. Our servers are completely hammered at the moment with a HiSeq run that just came off, and FastQC was hanging, so Rick had to kill off those processes.

Do you usually see differences between FastQC's assessment of the quality of a run and SAV's? I posted the SAV quality heat map.

--
Phillip

**GenoMax** · 02-19-2013, 08:34 AM

Originally posted by pmiguel

Sorry that is going to take a while longer.

No Problem.

Originally posted by pmiguel

Do you usually see differences between FastQC's assessment of the quality of a run and SAV's? I posted the SAV quality heat map.

--
Phillip

SAV shows an average representation of the values across all samples. I am interested to see whether the actual quality values changed from one run to the other for individual sample(s). If you can, pick a sample that had an overall low mean Q-value (based on the demultiplex summary report). OTOH, you may not have any if all your pooled samples look more or less the same.
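To illustrate the point about averages hiding individual samples, here is a minimal sketch that ranks samples by mean Q-value after demultiplexing. It assumes Phred+33 quality strings; the sample names and toy data are invented.

```python
def mean_quality(quality_strings, offset=33):
    """Mean Phred score over all bases in a collection of quality strings."""
    total = sum(ord(ch) - offset for qual in quality_strings for ch in qual)
    n_bases = sum(len(qual) for qual in quality_strings)
    return total / n_bases if n_bases else float("nan")


def rank_samples(per_sample_quals):
    """Return (mean_q, sample_name) tuples sorted from lowest to highest mean Q."""
    return sorted((mean_quality(quals), name)
                  for name, quals in per_sample_quals.items())


# Toy data: 'I' = Q40, '?' = Q30, '5' = Q20, '#' = Q2
per_sample = {
    "sample_A": ["IIII", "II??"],
    "sample_B": ["??55", "55##"],
}

pooled = mean_quality([q for quals in per_sample.values() for q in quals])
print(f"pooled mean Q: {pooled:.2f}")
for mean_q, name in rank_samples(per_sample):
    print(f"{name}: mean Q {mean_q:.2f}")
```

The pooled figure sits between the two samples, which is why a run-level view can look acceptable while one sample is clearly worse.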

**BBthekid007** · 02-20-2013, 09:34 AM

We've been sequencing recombined human antibody genes, which are pretty low diversity, especially at the start of both paired reads. In our case, it's especially critical that we get good quality for most of the read length. The amplicons are about 400bp in length, and we must be able to merge the forward/reverse reads into a single amplicon -- unmerged reads are essentially useless.

We've had the same sort of low-diversity issues that the 16S folks have had, but came up with a different solution. We mostly use off-site sequencing providers, so we wanted our method to be dependent on sample prep as much as possible, to allow us flexibility in selecting providers (some were unwilling to perform the 'hard-core' hack mentioned above). What we did was "offset" the reads by inserting varying numbers of N's between the sequencing primer and the gene-specific amplification primer. It turns out that multiples of 2 N's work best (-NN-, -NNNN-, -NNNNNN-, etc). Not sure why, but my guess is that adjacent clusters that are offset by only a single position can mess with phasing/prephasing calculations. Of course, this method entails making your own fusion primers, but that's something we were willing to do. In combination with other fairly common low-diversity techniques (high PhiX spike-in, lower cluster density), this approach has worked very well.
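A rough sketch of how such a set of offset fusion primers could be laid out, assuming spacers of 2, 4 and 6 N's as in the examples listed. The two sequences passed in are placeholders, not real primer sequences.

```python
def offset_primers(seq_primer_site, gene_primer, max_ns=6, step=2):
    """Build one fusion primer per spacer length: step, 2*step, ... up to max_ns N's.

    Layout: sequencing primer site + N spacer + gene-specific primer.
    """
    return [seq_primer_site + "N" * n + gene_primer
            for n in range(step, max_ns + 1, step)]


# Hypothetical placeholder sequences for illustration only
SEQ_PRIMER_SITE = "ACACGACGCT"
GENE_PRIMER = "GGTCCAGCTGGTA"

for primer in offset_primers(SEQ_PRIMER_SITE, GENE_PRIMER):
    print(primer)
```

Pooling the products of these primers means the same gene position is read at a different cycle in each subset of clusters.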

Here's what the Qscores look like without the offset primers:

And with the offset primers:

**Vinz** · 02-21-2013, 12:40 AM

Phillip, thanks for your post. Do I understand correctly, that you used 50% phiX? That would confirm our observation that phiX spiking is of limited effect with the v2 kits.

When not using hardcoded phasing we see pretty consistently what you are showing: read4 somehow is better than read1. This seems to be connected to the prephasing value. For some unknown reason, prephasing is calculated very high for the forward read and low for the reverse read.
Two non-hardcoded examples with about 6% phiX spike and amplicons (12 different ones):

**Vinz** · 02-21-2013, 12:46 AM

When using hardcoded matrix/phasing we get Q30 success rates of above 75%, usually above 80%.
In contrast to what Illumina is saying we see no positive effect of:
- spiking more than 10% phiX
- reducing cluster density (700 to 1000 seems to be fine)

**pmiguel** · 02-21-2013, 05:10 AM

Originally posted by BBthekid007

We've been sequencing recombined human antibody genes, which are pretty low diversity, especially at the start of both paired reads. In our case, it's especially critical that we get good quality for most of the read length. The amplicons are about 400bp in length, and we must be able to merge the forward/reverse reads into a single amplicon -- unmerged reads are essentially useless.

We've had the same sort of low-diversity issues that the 16S folks have had, but came up with a different solution. We mostly use off-site sequencing providers, so we wanted our method to be dependent on sample prep as much as possible, to allow us flexibility in selecting providers (some were unwilling to perform the 'hard-core' hack mentioned above). What we did was "offset" the reads by inserting varying numbers of N's between the sequencing primer and the gene-specific amplification primer. It turns out that multiples of 2 N's work best (-NN-, -NNNN-, -NNNNNN-, etc). Not sure why, but my guess is that adjacent clusters that are offset by only a single position can mess with phasing/prephasing calculations. Of course, this method entails making your own fusion primers, but that's something we were willing to do. In combination with other fairly common low-diversity techniques (high PhiX spike-in, lower cluster density), this approach has worked very well.

Here's what the Qscores look like without the offset primers:

And with the offset primers:

Yes, your libraries then become effectively diverse by your systematically offsetting them. That is one of the methods Illumina wants you to use.
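A toy simulation shows why the offsets help. It assumes a single synthetic amplicon, random bases in the N spacer, and offsets of 2, 4 and 6; none of this is real run data. For each cycle it reports the fraction of reads showing the most common base, where 1.0 means every cluster shows the same base.

```python
import random
from collections import Counter


def dominant_base_fraction(reads, n_cycles):
    """For each cycle, the fraction of reads that show the most common base."""
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads)
        fractions.append(counts.most_common(1)[0][1] / len(reads))
    return fractions


rng = random.Random(0)  # fixed seed so the toy example is repeatable
amplicon = "".join(rng.choice("ACGT") for _ in range(40))  # synthetic sequence
n_cycles = 20

# Single-amplicon library: every read starts at the same position.
plain = [amplicon] * 300

# Same amplicon, but each read is preceded by a 2, 4 or 6 base random spacer.
staggered = []
for offset in (2, 4, 6):
    for _ in range(100):
        spacer = "".join(rng.choice("ACGT") for _ in range(offset))
        staggered.append(spacer + amplicon)

plain_frac = dominant_base_fraction(plain, n_cycles)
stag_frac = dominant_base_fraction(staggered, n_cycles)
print("plain     mean dominant fraction:", sum(plain_frac) / n_cycles)
print("staggered mean dominant fraction:", sum(stag_frac) / n_cycles)
```

In the unstaggered library the dominant fraction is 1.0 at every cycle, which is the situation the instrument handles poorly.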

If I were making the libraries myself, I would probably employ a method something like that. But, although it is simple enough to understand if you are intimately familiar with this aspect of Illumina instruments, I just feel like I am making the world a worse place to live in every time I try to explain this stuff to a customer. Things are complex enough without added strange work-arounds to avoid bugs in an instrument system design.

The real solution needs to come from Illumina, but they aren't going to bother doing it unless they get enough complaints.

--
Phillip

**pmiguel** · 02-21-2013, 05:23 AM

Originally posted by Vinz

Phillip, thanks for your post. Do I understand correctly, that you used 50% phiX? That would confirm our observation that phiX spiking is of limited effect with the v2 kits.

When not using hardcoded phasing we see pretty consistently what you are showing: read4 somehow is better than read1. This seems to be connected to the prephasing value. For some unknown reason, prephasing is calculated very high for the forward read and low for the reverse read.
2 non hardcoded examples with about 6% phiX spike and amplicons (12 different ones)

Sort of. I don't like to waste sequencing capacity on phiX, so I allow the customers to give us some genomic DNA they want sequenced and construct library(ies) from that.

We have a lot of "worst case" single amplicon projects, so I think we will continue spiking in 50% ballast libraries to help even those out. Also we will use hard coding.

Question: are your amplicons short enough to overlap the reads? For the run we describe above, the amplicons have 450 bp inserts. So for a paired read merge (Rick uses PANDA, but seems like most people use FLASH), one would expect to need high quality sequence over the entire length of both reads to effect a good merge. However, mysteriously, we had very high rates of successful merges even though the quality drops very low past 180 bases for read 1.
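For reference, the expected overlap for these numbers can be worked out with a short sketch. It assumes the insert lengths quoted in the thread (400 to 450 bp) exclude adapters, that both reads are the full 250 bases, and that the insert is at least as long as one read; the function name is made up.

```python
def overlap_interval(read_len, insert_len):
    """Return (overlap_len, first, last), positions 1-based along read 1.

    Read 1 covers insert positions 1..read_len; read 2 covers the last
    read_len positions of the insert. Assumes insert_len >= read_len.
    """
    overlap = 2 * read_len - insert_len
    if overlap <= 0:
        return 0, None, None
    return overlap, insert_len - read_len + 1, read_len


print(overlap_interval(250, 450))  # (50, 201, 250)
print(overlap_interval(250, 400))  # (100, 151, 250)
```

So with a 450 bp insert the 50-base overlap falls at positions 201 to 250 of read 1, entirely inside the region past base 180 where the reported quality is low. By symmetry it is also the last 50 bases of the other read.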

This could simply be a case of the instrument mistakenly assigning low quality values while correctly assigning the base calls. However, as you can see from the graphs above, the phiX-calculated error rates become very high at the point where the quality values become low. My understanding is that these are empirically determined error rates. That is, RTA actually aligns the reads to phiX and calculates the error rate from disagreements with the reference at each base position.
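To make the distinction concrete, here is a sketch of what an empirically determined per-cycle error rate looks like, next to the error probability implied by a Phred score. This is not RTA's actual implementation, which is not described in this thread. It assumes ungapped, forward-strand placements on a reference, and the reference and reads are toy data.

```python
def per_cycle_error_rate(aligned_reads, reference):
    """Mismatch rate at each cycle for reads placed on a reference.

    aligned_reads: list of (start, sequence) with a 0-based reference start.
    Assumes ungapped, forward-strand alignments.
    """
    n_cycles = max(len(seq) for _, seq in aligned_reads)
    mismatches = [0] * n_cycles
    coverage = [0] * n_cycles
    for start, seq in aligned_reads:
        for cycle, base in enumerate(seq):
            coverage[cycle] += 1
            if base != reference[start + cycle]:
                mismatches[cycle] += 1
    return [m / c if c else 0.0 for m, c in zip(mismatches, coverage)]


def phred_to_error(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


reference = "ACGTTGCAAGCT"  # toy stand-in for the control genome
reads = [(0, "ACGTTA"), (2, "GTTGCT"), (4, "TGCAAG"), (6, "CAAGCT")]

print(per_cycle_error_rate(reads, reference))
print(phred_to_error(30), phred_to_error(10))
```

If the observed rate at late cycles tracks the rate implied by the assigned Q-scores, the low scores reflect real miscalls; if the observed rate stays low while Q-scores drop, the scores are merely pessimistic.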

What do you think? Is RTA actually "cheating" and just using quality values to assign the error rate? Something else?

Are you able to merge your forward/reverse reads?

--
Phillip

Sequencing low diversity samples on the MiSeq

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News