**pmiguel** · 01-26-2017, 05:44 AM

Which library construction kit was used? Some now add a random sequence of known length to each end of the insert. Bioo, for instance, uses this to reduce ligation site bias. But that would produce a different sequence at either end.
I think there are kits that add the same tag on both ends -- which could be used to eliminate chimeric clones. (Although I wouldn't think this would be a big issue for ChIP libraries...)

--
Phillip

**dpryan** · 01-26-2017, 05:46 AM

Some sort of NEB kit, from what I've been told. It's the same kit that's used to construct all of the other ChIPseq libraries, none of which have produced this sort of effect (either prior to this run or since).

**SylvainL** · 01-26-2017, 07:12 AM

Were the libraries prepared using tagmentation?

**dpryan** · 01-26-2017, 07:13 AM

No, this was your standard ChIPseq sort of library prep, no tagmentation.

**SylvainL** · 01-26-2017, 07:18 AM

Just to be sure I understood: the upper case sequence is on the genome (meaning it's really present in read 1), while the problem concerns only read 2, so it's a kind of inverted repeat on the genome, but you don't find this repeat on the genome.

**dpryan** · 01-26-2017, 07:20 AM

Originally posted by SylvainL

Just to be sure I understood: the upper case sequence is on the genome (meaning it's really present in read 1), while the problem concerns only read 2, so it's a kind of inverted repeat on the genome, but you don't find this repeat on the genome.

Yes, exactly.

**SylvainL** · 01-26-2017, 07:28 AM

I would be interested in the explanation, then.

I thought it could be tagmentation followed by a Klenow repair, which would keep the transposase "signature", but even then you wouldn't expect to have exactly the same sequence in each read of a pair...

**GenoMax** · 01-26-2017, 07:49 AM

Originally posted by dpryan

This is happening in a rather large portion of the reads from multiple samples from the same group (all run on the same flow cell and together these samples occupied the entire flow cell).

Don't want to be a conspiracy theorist, but perhaps there is an explanation hidden in whatever the group is doing to prep the samples. Since you are experienced on both sides of the world, perhaps talking with whoever made the preps/libraries may root out a cause.

Is this n=1 (even though for multiple samples) and/or a repeated observation across multiple runs? You could also make Illumina aware by submitting a ticket. Perhaps someone else has reported something to them before.

**dpryan** · 01-26-2017, 10:50 AM

Originally posted by GenoMax

Don't want to be a conspiracy theorist, but perhaps there is an explanation hidden in whatever the group is doing to prep the samples. Since you are experienced on both sides of the world, perhaps talking with whoever made the preps/libraries may root out a cause.

Is this n=1 (even though for multiple samples) and/or a repeated observation across multiple runs? You could also make Illumina aware by submitting a ticket. Perhaps someone else has reported something to them before.

Yeah, one of our guesses would be that something went weird when the group did its IP, but we'll have to wait until the post-doc who did that is back from vacation to ask. Having said that, I'm not even sure how one could get this to happen during an IP (granted, the post-docs do enjoy coming up with new and creative ways of causing problems...).

This was an n=1 occurrence; we've had a few other (unproblematic) projects from this particular post-doc (and many, many more from his lab).

**microgirl123** · 01-27-2017, 08:25 AM

If you Google your capitalized sequence, it comes up as a motif that matches "Pbx3(Homeobox)/GM12878-PBX3-ChIP-Seq/Homer." That means nothing to me, but maybe it does to you or someone else?

**dpryan** · 01-27-2017, 03:29 PM

The example is just a random sequence that I typed in. In the real dataset, it varies by read. It's all mouse DNA and matches wherever read 1 aligns.

**nucacidhunter** · 01-27-2017, 04:54 PM

So far, the information in this thread can be summarised as follows:
1- The initial 8-15 bases of Read2 in some pairs are identical to the beginning of Read1
2- These sequences are from the genome: Read1 maps perfectly to the reference directly, Read2 does so after soft clipping, and the distance between the mates matches the library insert sizes
3- It is not the result of bcl2fastq software settings

Possible explanations:
1- Sequences are present in the library fragments (not known)
2- Sequences were added during sequencing steps (not known)
3- Sequences were generated by RTA software (not known)
4- Sequences were generated by bcl2fastq (ruled out)

I would be interested to know the run setup (read and index cycles). This seems unexplainable, and I would suggest spiking (5%) a couple of the libraries with the highest incidence of this observation into an unrelated library run to check data reproducibility.
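
For anyone wanting to quantify point 1 of the summary before and after a spike-in run, here is a minimal Python sketch. It assumes that "identical" means the first bases of Read2 are the same, in the same orientation, as the first bases of Read1, and that the FASTQ files are plain four-line records with mates in the same order in both files. The function names (`shared_prefix_len`, `read_seqs`, `count_affected_pairs`) and the 8 and 15 base thresholds are illustrative choices taken from the numbers in this thread, not part of any existing tool.

```python
import gzip


def shared_prefix_len(r1, r2, max_len=15):
    """Number of leading bases shared by r1 and r2, capped at max_len.

    An N in read 1 is treated as a mismatch so that runs of no-calls
    are not counted as shared sequence.
    """
    n = 0
    for a, b in zip(r1[:max_len], r2[:max_len]):
        if a != b or a == "N":
            break
        n += 1
    return n


def read_seqs(path):
    """Yield the sequence line of each record in a 4-line FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # second line of every record is the sequence
                yield line.strip()


def count_affected_pairs(r1_path, r2_path, min_len=8, max_len=15):
    """Return (affected, total): pairs whose read 2 starts with at least
    min_len of the same bases that read 1 starts with, and all pairs."""
    total = affected = 0
    for s1, s2 in zip(read_seqs(r1_path), read_seqs(r2_path)):
        total += 1
        if shared_prefix_len(s1, s2, max_len) >= min_len:
            affected += 1
    return affected, total
```

Calling `count_affected_pairs("sample_R1.fastq.gz", "sample_R2.fastq.gz")` (hypothetical file names) gives the number of flagged pairs and the total. For comparison, two unrelated random sequences share an 8-base prefix with probability of roughly 4^-8, about 1.5 in 100,000, so a "rather large portion" of flagged pairs is far above what chance would produce.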

**GenoMax** · 01-28-2017, 07:37 AM

I like the idea of spiking the problem libraries and re-sequencing with a random pool to verify the result.

Sanger sequencing to confirm presence of those bases?

**dpryan** · 01-28-2017, 12:26 PM

We all agreed to do a spike-in of the worst sample on an upcoming run. I'm curious to see what happens. I'll post back when I get some results.

Intermittent inclusion of the beginning of read 1 in read 2
