**nucacidhunter** · 08-03-2014, 07:54 PM

Do you know whether the libraries were stranded or non-stranded, and which kit was used for library prep?

**evt8** · 08-03-2014, 08:21 PM

Sorry, the libraries were stranded, prepared with a TruSeq Stranded mRNA LT Sample Prep Kit.

**nucacidhunter** · 08-03-2014, 08:43 PM

Do you know which indices were used to barcode the samples, and which libraries were pooled in the same lane if more than one lane was used for sequencing? If multiple lanes were used, were they on the same flow cell or on different ones? Is this bias seen in all samples, or only in a pool of samples in one particular lane?

**evt8** · 08-04-2014, 01:32 PM

Yes, all samples were run on one lane and the bias occurs across all samples - but only for 2nd pair reads. The most common bias is by far 'AT', but 'TT' and 'GA' occur for some samples as well. I have pasted the index list used for barcoding samples below.
ATCACG
TTAGGC
ACTTGA
GATCAG
TAGCTT
GGCTAC
GTGGCC
GTTTCG
CGATGT
TGACCA
ACAGTG
GCCAAT
CAGATC
CTTGTA
AGTCAA
AGTTCC
CGTACG
GAGTGG
ACTGAT
ATTCCT
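
A 5' bias like the one described above could be tallied directly from the R2 FASTQ. The following is only a minimal sketch: the function name `five_prime_dinucleotides` and the example records are made up for illustration and are not from the actual data set.

```python
from collections import Counter
from itertools import islice


def five_prime_dinucleotides(fastq_lines, k=2):
    """Count the first k bases of every read in FASTQ-formatted lines."""
    counts = Counter()
    # FASTQ records are 4 lines each; the sequence is the 2nd line of a record.
    for seq in islice(fastq_lines, 1, None, 4):
        seq = seq.strip()
        if len(seq) >= k:
            counts[seq[:k]] += 1
    return counts


# Tiny made-up R2 records, for illustration only.
example_r2 = [
    "@read1/2", "ATGGCTAGCT", "+", "IIIIIIIIII",
    "@read2/2", "ATCCGATTAG", "+", "IIIIIIIIII",
    "@read3/2", "TTGACCGTAA", "+", "IIIIIIIIII",
]
counts = five_prime_dinucleotides(example_r2)
print(counts.most_common())
```

For a real file, an open text handle can be passed instead of the list (for gzipped FASTQ, `gzip.open(path, "rt")`).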

**nucacidhunter** · 08-04-2014, 04:12 PM

Thanks for providing enough information to explore the likely causes of this observation. I do not see any biological or technical reason for this:

1) The bias is library specific and is not observed in replicates of the same sample.
2) There is no biochemical reason for such an extreme bias that could be attributed to the reactions or the kit used during library prep, and I have never seen this before. If it were kit specific, you would see the same bias in all or most of the samples.

That leads me to think that these sequences are bases from the index reads which were somehow added to the start of R2 during demultiplexing. This could happen as follows:

1) Your libraries were run on a flow cell where other libraries had dual indices or 8-base indices, so 8 cycles had to be run for the index reads in all lanes of the flow cell.
2) LT indices are 6 bases long but are sequenced for 7 cycles. Normally an LT index is read up to the 7th base; with 8 index cycles it would be read all the way to the 8th base, as listed below (I do not know why they have used 20 different indices for 10 libraries?):

ATCACGAT
TTAGGCAT
ACTTGAAT
GATCAGAT
TAGCTTAT
GGCTACAT
GTGGCCTT
GTTTCGGT
CGATGTAT
TGACCAAT
ACAGTGAT
GCCAATAT
CAGATCAT
CTTGTAAT
AGTCAACT
AGTTCCGT
CGTACGTT
GAGTGGAT
ACTGATAT
ATTCCTTT

If the final two bases of the index reads were added to the start of your read 2 during demultiplexing (assuming 8 index cycles were run), you would see AT, GT, CT or TT, depending on the index used for that particular library. This does not explain the GA bias (are you sure about this?). You may look for a similar explanation for those biases if other bases from the index reads have been added to the start of R2. For example, positions 6 and 7 would result in the addition of GA, CA, AA, TA, CT, GG, AC, CG, GT and TT, and so on.
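
The dinucleotides named above can be checked against the 8-base index reads listed in this post. This is a small sketch for that check only; the helper name `dinucleotide_at` is made up here.

```python
from collections import Counter

# The 8-base index reads listed in the post above.
INDEX_READS = [
    "ATCACGAT", "TTAGGCAT", "ACTTGAAT", "GATCAGAT", "TAGCTTAT",
    "GGCTACAT", "GTGGCCTT", "GTTTCGGT", "CGATGTAT", "TGACCAAT",
    "ACAGTGAT", "GCCAATAT", "CAGATCAT", "CTTGTAAT", "AGTCAACT",
    "AGTTCCGT", "CGTACGTT", "GAGTGGAT", "ACTGATAT", "ATTCCTTT",
]


def dinucleotide_at(index_read, start):
    """Return the two bases beginning at 1-based position `start`."""
    return index_read[start - 1:start + 1]


# Bases 7-8: what would lead R2 if the last two index cycles spilled over.
last_two = Counter(dinucleotide_at(idx, 7) for idx in INDEX_READS)
# Bases 6-7: the alternative offset mentioned in the post.
pos_6_7 = Counter(dinucleotide_at(idx, 6) for idx in INDEX_READS)

print("positions 7-8:", dict(last_two))
print("positions 6-7:", sorted(pos_6_7))
```

With this list, positions 7-8 give AT for 14 of the 20 index reads, which would be consistent with AT being by far the most common bias.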

**GenoMax** · 08-04-2014, 04:36 PM

@nucacidhunter: I am scratching my head trying to imagine how #2 is possible. Since Illumina index reads (1D or 2D) should have been read as reads independent of the real sequence, there should be no way for CASAVA/bcl2fastq to add them to the beginning of R2.

@evt8: Can you ask the sequence provider if this happened to other lanes on the flowcell?

**kcchan** · 08-04-2014, 05:41 PM

It's easy to get this screwed up if you have the wrong settings in bcl2fastq and ignore the errors that pop up. Most likely the run was sequenced with 8 index cycles, yet only 6 were used during the demux. So instead of --use-bases-mask y*,i6n*,y* they used y*,i6,y*. That would end up combining the remaining index bases with the start of R2.
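
The effect described above can be pictured with a toy model. This is only a sketch of the hypothesis, not how bcl2fastq actually parses masks or handles cycles: the run layout (10 + 8 + 10 cycles), the base calls, and the helper `split_cycles` are all made up for illustration.

```python
def split_cycles(cycles, segment_lengths):
    """Split a flat string of base calls into consecutive segments.

    The last segment takes whatever is left, mimicking a trailing 'y*'.
    """
    segments, pos = [], 0
    for length in segment_lengths[:-1]:
        segments.append(cycles[pos:pos + length])
        pos += length
    segments.append(cycles[pos:])
    return segments


# Hypothetical run: 10-cycle R1, 8 index cycles, 10-cycle R2 (made-up bases).
r1, index_read, r2 = "ACGTACGTAC", "ATCACGAT", "GGCCTTAAGG"
cycles = r1 + index_read + r2

# Index treated as 8 cycles (6 used, 2 skipped): R2 comes out intact.
_, idx, skipped, read2 = split_cycles(cycles, [10, 6, 2, None])
# Index treated as only 6 cycles: the 2 leftover index bases lead R2.
_, idx_bad, read2_bad = split_cycles(cycles, [10, 6, None])

print(read2)      # GGCCTTAAGG
print(read2_bad)  # ATGGCCTTAAGG
```

In this toy case the two unaccounted-for index bases (AT) end up at the 5' end of R2, which is the pattern reported in the thread.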

**evt8** · 08-10-2014, 01:55 PM

I have now resolved the issue with our sequencing provider, although unfortunately not with a full explanation. They simply indicated there was a 'Bcl2fastq bug', and fixed the issue by relaunching the program. No other details were given.
However, I suspect the answers received in this thread (which I relayed to our provider) hit on the problem, so thank you all for your insights!

Extreme 5' nucleotide bias in 2nd pair Illumina Hiseq reads