SEQanswers

03-30-2019, 04:53 PM   #1
skaloud
Junior Member
 
Location: Prague

Join Date: Mar 2019
Posts: 2
Samples artificially grouped by i5 inline barcodes, in Illumina HiSeq 2500

Hi,
I am using single-digest RAD-Seq to analyse the population structure of a protist species, with dual barcoding by i7 index barcodes and i5 inline barcodes. The first two libraries, consisting of 9 samples, were sequenced on an Illumina MiSeq (2x150 and 2x250), and the samples separated biogeographically as expected. However, in two subsequent libraries (80 samples) sequenced on an Illumina HiSeq 2500 (2x100), the samples were artificially grouped by their i5 barcodes.

I would be very grateful for any help or hint, as I have no idea how this structure could arise. The indexes were, of course, removed prior to analysing the data. The library preparation pipeline is the same for the MiSeq and HiSeq libraries; the sole difference is the number of samples pooled together. Concerning the sequencing, one difference I am aware of is the number of reads. The MiSeq runs used three reads (two sequence reads, one index read), whereas the HiSeq runs used four (R1 and R4 sequence reads, R2 i7 index read, R3 i5 index read; note that we do not have any i5 index barcode).

Attached is a file showing our results in detail. Thank you in advance for any feedback.

Pavel
Attached Files
File Type: pdf Rad-seq enigma.pdf (385.9 KB, 10 views)
03-30-2019, 07:41 PM   #2
nucacidhunter
Jafar Jabbari
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,217

It could be a result of how the HiSeq runs were demultiplexed. The i5 index read needs to be masked during demultiplexing; otherwise the i5 read sequences will be added to the 5' end of read 2 (R4).
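One rough way to check for this is to see whether the read 2 sequences share a fixed-length prefix far more often than chance would allow. The following is only a minimal sketch, assuming the reads have already been loaded as plain strings; the function name `common_prefix_fraction` and the 8-base prefix length are illustrative assumptions, not values taken from the runs discussed here.

```python
from collections import Counter


def common_prefix_fraction(reads, prefix_len=8):
    """Return (most common prefix, fraction of reads that start with it).

    Reads shorter than prefix_len are ignored. prefix_len=8 is an
    illustrative assumption, not a value from the runs in this thread.
    """
    prefixes = Counter(r[:prefix_len] for r in reads if len(r) >= prefix_len)
    if not prefixes:
        return None, 0.0
    prefix, count = prefixes.most_common(1)[0]
    return prefix, count / sum(prefixes.values())


# Toy example: three of four reads start with the same 8 bases.
toy_reads = ["ACGTACGTTTTT", "ACGTACGTGGGG", "ACGTACGTCCCC", "TTTTTTTTAAAA"]
print(common_prefix_fraction(toy_reads))
```

A single prefix dominating read 2 would be consistent with extra bases having been prepended, although it is not proof on its own; a library design that places a fixed sequence at the start of the read would produce the same signal.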
03-31-2019, 01:57 PM   #3
skaloud
Junior Member
 
Location: Prague

Join Date: Mar 2019
Posts: 2

Thank you for the feedback!
In fact, I had already wondered whether the i5 read might influence the data (in particular, after reading this: https://sequencing.qcfail.com/articl...uddle-samples/).
However, both runs were also analysed using the single reads only (the R1 read), and the pattern persists.
04-04-2019, 02:54 PM   #4
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 469

When I read your post I wondered if index hopping would cause the problem, and your attachment says there is an enormous level of hopping. So wouldn't samples that share a P1 inline barcode be swapping reads as well, making all the samples with that inline barcode more similar to each other? This can drive a genetic structure signal if some loci are polymorphic for presence/absence, with the hopped reads completely filling in the missing locus. Remember, if an "absent" index combo has 500,000 reads, then the "good" combo with 1.5M reads is probably 1M actual sample reads and 500k swapped reads. You expected that swapping should occur across all samples, but the classic swap comes from the wrong index primer binding to a template and labelling that fragment as the wrong sample. The inline barcodes cannot be misprimed this way, so any read with a given P1 inline barcode will remain in that group but can pick up the wrong P2 index. This isn't great proof, but look at your table: P1 inline barcode 1 has a sample with 25M reads (P2-4) and has empty combos with 1 to >2M reads. Inline barcode 3 has only poorly sequenced samples, and its empty combos have just 200-300k reads. The rest play out the same way.
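The arithmetic above can be written out as a small sketch. It rests on a crude assumption that is mine, not something measured: within one inline-barcode group, every real index combination receives about as many swapped reads as the average empty combination in that group. The function name `estimate_swapped_reads` and the sample label are hypothetical.

```python
def estimate_swapped_reads(valid_counts, empty_counts):
    """Split each real sample's reads into estimated own and swapped reads.

    valid_counts: dict of sample name -> read count, for ONE inline-barcode group.
    empty_counts: read counts of the non-existent index combos in that group.
    Assumption (crude): each real combo receives as many swapped reads as
    the average empty combo in the same group.
    """
    background = sum(empty_counts) / len(empty_counts) if empty_counts else 0.0
    result = {}
    for name, count in valid_counts.items():
        swapped = min(count, background)  # cannot exceed the sample's own total
        result[name] = {
            "own": count - swapped,
            "swapped": swapped,
            "swapped_fraction": swapped / count if count else 0.0,
        }
    return result


# The example from the post: an empty combo with 500k reads next to a
# real combo with 1.5M reads.
print(estimate_swapped_reads({"sample_A": 1_500_000}, [500_000]))
```

Under that assumption, about one third of the reads in the 1.5M-read sample would come from other samples sharing the same inline barcode.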

So the question to me is less how this clustering by inline barcode is happening (I think you answered that with your hopping analysis) and more why in the world this hopping is happening at such levels. Particularly since this is a HiSeq 2500: index hopping came to attention with the ExAmp Illumina sequencers that use a patterned flow cell. We see a few hundred reads in non-existent dual index combinations (nextRAD, not RAD-Seq) on a 2500, and thousands of reads on a 4000, but usually less than 1% of reads.
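To compare a run against that "less than 1%" figure, the share of reads landing in non-existent combinations can be computed directly from the demultiplexing counts. This is a minimal sketch; the function name `unexpected_combo_fraction`, the `(P1, P2)` tuple keys, and the toy numbers are assumptions for illustration only.

```python
def unexpected_combo_fraction(read_counts, expected_combos):
    """Fraction of all reads assigned to index combos that were never used.

    read_counts: dict mapping (p1_inline_barcode, p2_index) -> read count.
    expected_combos: set of (p1, p2) pairs actually present in the library.
    """
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    unexpected = sum(n for combo, n in read_counts.items()
                     if combo not in expected_combos)
    return unexpected / total


# Toy numbers only: 10 of 1000 reads fall in a combo that was not used.
counts = {("bc1", "P2-1"): 990, ("bc1", "P2-2"): 10}
print(unexpected_combo_fraction(counts, {("bc1", "P2-1")}))
```

This is a lower bound on the hopping rate, since reads that hop into a combination that does exist are counted as legitimate.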

You have some samples with very high read counts (20M instead of 1-5M). Do the non-existent barcode combos group with those samples tightly within an inline bc group? I assume the hopping reads will be mostly from these dominant samples.

You say you checked that your P1 inline barcodes were not cross-contaminated. But you'd get this pattern if the P2s were cross-contaminated. Did you check those? I guess the other possibility relates to amplification: when we first developed RAD-Seq, we amplified pools of samples. How did you amplify your samples? In pools grouped by P1 inline barcode? Individually? How did you remove the P2 primers, and when?
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

Tags
illumina, indexing, rad-seq, structure

All times are GMT -8.