
**kmcarr** · 03-22-2013, 08:30 AM

What species are you working with, and what is the normal GC content of that species' genome? Your sequence content plot looks perfectly normal if the genome of your species of interest has a GC content of about 40%, as many species' genomes do.

The sequence duplication plot may be entirely expected as well. You are doing a ChIP-Seq experiment. How many total sequences did you generate? How big is the genome of your organism? How big is the total target size of your ChIP enrichment? This plot may simply indicate that the target you were enriching for is not that large and that your ChIP enrichment worked very well. If you sequenced very deeply (e.g. 200 million reads) on such a small target, you are inevitably going to get a lot of duplicate reads.

These plots cannot be properly interpreted without a more thorough understanding of the biology of your system and of the steps that were carried out to generate your sequence data.

**Tobikenobi** · 03-24-2013, 04:09 PM

Originally posted by kmcarr

> What species are you working with and what is the normal GC content of that species' genome? Your Sequence content plot looks perfectly normal if the genome of your species of interest has a GC content of 40%, lots of species do.

I am working in mouse.

> Regarding the Sequence duplication plot that may be entirely expected as well. You are doing a ChIP-Seq experiment. How many total sequences did you generate? How big is the genome of your organism?

The total genome size should be 2,644,093,988 bases. The total number of reads obtained for the data I posted previously is 30,223,517.

> How big is the total target size of your ChIP enrichment?
> This plot may simply indicate that the target size you were enriching for is not that large and your ChIP enrichment worked very well. If you sequenced very deeply (e.g. 200 million reads) on such a small target you are inevitably going to get a lot of duplicate reads.

If by target size you are referring to the size the chromatin was fragmented to, then the answer is around 150 bp.

> These plots can not be properly interpreted without a more thorough understanding of the biology of your system and what steps were carried out to generate your sequence data.

In this ChIP-Seq experiment, I used a Bioruptor to shear the chromatin of mouse neural stem cells to a size of 150 bp after crosslinking. From the subsequent immunoprecipitation I aimed for 2 biological replicates with a yield of 5 nanograms of dsDNA, as determined by PicoGreen assay. This DNA, as well as the input and IgG controls, went into an Illumina TruSeq ChIP Sample Prep Kit, and the libraries were then pooled evenly into a 4-plex for single-end sequencing on one lane of a HiSeq 2000 flow cell. The yield from all libraries was between 1,500 and 2,100 Mbases, with 30,000,000 to 42,000,000 reads.
I am currently unsure how the starting DNA was treated in terms of PCR conditions, as this step and the library prep were carried out by a commercial service, but I am about to find out.
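The yield and read counts above are enough to back out the read length and the depth of coverage. The Python sketch below does that arithmetic; it assumes the low and high ends of the two ranges belong to the same libraries (the post only gives the ranges) and uses the 2,644,093,988-base genome size given earlier.

```python
# Figures quoted in the post above
GENOME_SIZE_BP = 2_644_093_988  # mouse genome size given earlier in the thread

# (total bases, number of reads) at the low and high ends of the reported ranges;
# pairing the ends like this is an assumption, the post only gives the ranges
RUNS = {
    "low end": (1_500_000_000, 30_000_000),
    "high end": (2_100_000_000, 42_000_000),
}


def mean_read_length(total_bases: int, n_reads: int) -> float:
    """Average read length implied by a run's yield and read count."""
    return total_bases / n_reads


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth if all sequenced bases were spread evenly over the genome."""
    return total_bases / genome_size


for label, (bases, reads) in RUNS.items():
    print(f"{label}: ~{mean_read_length(bases, reads):.0f} bp reads, "
          f"{fold_coverage(bases, GENOME_SIZE_BP):.2f}X coverage")
```

Both ends of the range work out to roughly 50 bp reads and well under 1X coverage of the mouse genome.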

Thank you very much again for your help!

Tobias

**kmcarr** · 03-25-2013, 06:31 AM

Originally posted by Tobikenobi

> Originally posted by kmcarr
>
> > What species are you working with and what is the normal GC content of that species' genome? Your Sequence content plot looks perfectly normal if the genome of your species of interest has a GC content of 40%, lots of species do.
>
> I am working in mouse.

The %GC of the mouse genome is 41-42% so your base composition plot looks exactly like you would expect it to look.

> > How big is the total target size of your ChIP enrichment?
> > This plot may simply indicate that the target size you were enriching for is not that large and your ChIP enrichment worked very well. If you sequenced very deeply (e.g. 200 million reads) on such a small target you are inevitably going to get a lot of duplicate reads.
>
> If by target size you are referring to the size the chromatin was fragmented to, then the answer is around 150 bp.

No, the fragment size is not what I was referring to. By target size I mean how many regions your transcription factor binds and what their total length is; that is the target of your enrichment in this ChIP experiment. Is it a general transcription factor, or one that is highly specific to a relatively small number of promoters? As a mental exercise, let's imagine that your transcription factor targets 1,000 genes and that the binding site size is ~100 bp. This means that your target size is 100,000 bp of DNA.

Now, your input was a mouse genome, 2.64 Gbp of DNA. You obtained approximately 1.54 Gbp of DNA sequence data, or < 1X coverage. In an unenriched sample the probability of duplicate reads would be close to 0. Honestly, I am not that familiar with the normal statistics of ChIP enrichment, but it seems to me that your enrichment would have to be off-the-charts fantastic for enrichment efficiency alone to explain the level of duplication you are seeing. I would start to worry that at some point during the ChIP process you ended up with an extremely limiting amount of DNA, and that the subsequent PCR produced a biased, low-diversity sample.
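The argument above can be made quantitative with a simple occupancy model. The sketch below assumes that single-end reads count as duplicates when they share a start position and strand, and that every possible start is equally likely to be sampled; it ignores mappability, GC bias and the fact that only part of a real ChIP library falls in the target. Under those assumptions it compares the 30,223,517 reads spread over the whole mouse genome with the same reads confined to the hypothetical 100,000 bp target from the mental exercise.

```python
import math


def expected_duplicate_fraction(n_reads: int, n_positions: int) -> float:
    """Expected fraction of reads that duplicate an earlier read's start site.

    Assumes each single-end read starts at one of `n_positions` equally likely
    (position, strand) combinations, independently of every other read.
    """
    lam = n_reads / n_positions                 # mean reads per possible start
    distinct = n_positions * -math.expm1(-lam)  # expected occupied starts
    return 1.0 - distinct / n_reads


GENOME_BP = 2_644_093_988  # mouse genome size quoted in the thread
N_READS = 30_223_517       # reads in the library quoted in the thread

# Hypothetical target from the mental exercise: 1,000 sites x ~100 bp
TARGET_BP = 1_000 * 100

# Two strands per base, so twice as many possible read starts as bases
genome_wide = expected_duplicate_fraction(N_READS, 2 * GENOME_BP)
target_only = expected_duplicate_fraction(N_READS, 2 * TARGET_BP)

print(f"reads spread over the whole genome: {genome_wide:.2%} duplicates")
print(f"every read inside the 100 kb target: {target_only:.2%} duplicates")
```

Spread over the genome, the expected duplicate rate is a fraction of a percent; confined to a 100 kb target it exceeds 99%. A real ChIP library sits somewhere in between, depending on how much of it actually falls in the target.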

Have you tried mapping the reads to the mouse genome yet to see where they align?

**Tobikenobi** · 03-25-2013, 04:21 PM

Dear kmcarr,

thank you very much for your help.
In fact, I am not looking at a general TF but at a rather specific one. People have done FLAG-ChIP-Seq on the factor in human cells and identified about 5,500 target genes. Using this figure, the target size would be about 550,000 bp (5,500 × ~100 bp), I guess.

I apologize; I probably should have attached the duplication level plot for my input control as well. This should not be enriched in any way, right? Even though the graph looks different, the duplication level is >80% here as well.
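FastQC's duplication figure is based on reads with identical sequences, not on mapped positions. The sketch below computes a simplified version of that idea: the share of reads whose exact sequence already occurred earlier in a FASTQ file. It is an illustration, not FastQC's implementation, which uses its own sampling and binning, so its numbers will not match exactly; the tiny FASTQ input is made up.

```python
from collections import Counter
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4) from FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def percent_duplicate_reads(seqs: Iterable[str]) -> float:
    """Percentage of reads whose exact sequence was already seen."""
    counts = Counter(seqs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (total - len(counts)) / total


# Tiny made-up FASTQ: r1, r2 and r4 share one sequence, r3 is unique
example = """@r1
ACGTACGTAC
+
IIIIIIIIII
@r2
ACGTACGTAC
+
IIIIIIIIII
@r3
TTGCAATGCA
+
IIIIIIIIII
@r4
ACGTACGTAC
+
IIIIIIIIII
""".splitlines()

print(percent_duplicate_reads(fastq_sequences(example)))  # prints 50.0
```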

I have mapped the reads using bowtie and tried to look at them in the UCSC browser. If the library has low complexity, should I see regions with high numbers of aligned reads next to regions with few or no aligned reads?

Thank you again for your help!
Tobias

Attached file: Input duplication_levels.jpg

**kmcarr** · 03-26-2013, 04:39 AM

Originally posted by Tobikenobi

> I need to apologize, I should have probably attached the duplication level for my input control as well. This should not be enriched in any way, right? Even though the graph looks different, the duplication level is >80% here as well.

For the input control, did you simply sequence some of the starting material, after fragmentation but before any immunoprecipitation? You are saying that this image is NOT from a no-antibody IP control?

If that is the case, then there is something significantly wrong with your input DNA. If you are sequencing random mouse genomic DNA and only collecting ~1.5 Gbp of sequence data (< 1X coverage of the genome), there is no way you should be observing read duplication like that. Did you start out with an extremely limiting amount of input DNA? That can lead to a low-diversity library. If you started with an adequate amount of genomic DNA, then something went wrong with the library prep that drastically reduced the diversity of your sample.

**Tobikenobi** · 03-26-2013, 04:32 PM

This is in fact the input control, i.e. fragmented chromatin that was put aside before the immunoprecipitation.
The amount of starting material was indeed limiting in this experiment, as a specific type of neural stem cell was targeted. After discussion with the service facility that provides library construction and sequencing at our institute, we agreed that 5 nanograms of immunoprecipitated, double-stranded DNA would be sufficient starting material for the TruSeq ChIP-Seq Kit. I assume that a similar amount was used for the input. The yield for the input control was 1.9 Gbp and 38 million reads.
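One way to put a number on "very poor complexity" is to estimate how many distinct molecules the library contained. Picard's estimated library size is based on the Lander-Waterman relation C/X = 1 - exp(-N/X), where N is the number of reads, C the number of unique reads and X the number of distinct molecules. The sketch below solves that relation by bisection. Picard applies it to read pairs, so using it on single-end reads is an assumption, as is treating ">80% duplication" as exactly 20% unique reads out of the 38 million input reads.

```python
import math


def estimate_library_size(n_reads: int, n_unique: int, tol: float = 0.5) -> float:
    """Solve C/X = 1 - exp(-N/X) for X (distinct molecules) by bisection.

    n_reads is N (reads sequenced), n_unique is C (reads left after
    duplicate removal); requires 0 < C < N.
    """
    if not 0 < n_unique < n_reads:
        raise ValueError("need 0 < n_unique < n_reads")

    def expected_unique(x: float) -> float:
        # Distinct molecules expected when N reads sample a library of size x
        return x * -math.expm1(-n_reads / x)

    # expected_unique increases with x towards N, so bracket the root first
    lo = hi = float(n_unique)
    while expected_unique(hi) < n_unique:
        lo, hi = hi, hi * 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_unique(mid) < n_unique:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Illustrative numbers: 38 million input reads, 20% of them unique
n_reads = 38_000_000
n_unique = n_reads // 5
size = estimate_library_size(n_reads, n_unique)
print(f"estimated distinct molecules: {size:,.0f}")
```

Under those assumptions the estimate lands only slightly above the 7.6 million unique reads already observed, meaning that deeper sequencing would mostly re-read the same molecules.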

I guess the bottom line is that I am looking at libraries with very poor complexity. How could that affect later peak calling?

In the meantime I have used bowtie to map the reads to the mm9 reference and filtered for duplicates. I received the following numbers:

- Input: 30,512,219 mapped reads (80%)
- IP: 20,586,367 mapped reads (68%)
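The post does not say which tool did the duplicate filtering. For single-end data, a common convention treats reads as duplicates when they map to the same chromosome, 5' position and strand; the sketch below implements that rule on a hypothetical `Alignment` record. It keeps the first read at each position, whereas dedicated tools such as Picard MarkDuplicates or samtools choose which copy to keep by quality, so treat it as an illustration of the idea rather than a replacement.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    """Hypothetical minimal record for one mapped single-end read."""
    name: str
    chrom: str
    pos: int          # 0-based leftmost reference base of the alignment
    aligned_len: int  # number of reference bases the alignment spans
    reverse: bool     # True if the read mapped to the minus strand


def five_prime_key(aln: Alignment) -> Tuple[str, int, bool]:
    # On the minus strand the read's 5' end is its rightmost aligned base
    start = aln.pos + aln.aligned_len - 1 if aln.reverse else aln.pos
    return (aln.chrom, start, aln.reverse)


def remove_position_duplicates(alns: Iterable[Alignment]) -> List[Alignment]:
    """Keep only the first read seen at each (chrom, 5' position, strand)."""
    seen = set()
    kept = []
    for aln in alns:
        key = five_prime_key(aln)
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept


reads = [
    Alignment("r1", "chr1", 100, 50, False),
    Alignment("r2", "chr1", 100, 48, False),  # same 5' end and strand as r1
    Alignment("r3", "chr1", 100, 50, True),   # 5' end at 149 on minus strand
    Alignment("r4", "chr1", 102, 48, True),   # also 5' end at 149: duplicate
    Alignment("r5", "chr2", 100, 50, False),  # other chromosome: kept
]
print([a.name for a in remove_position_duplicates(reads)])  # ['r1', 'r3', 'r5']
```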

Thank you very much for your Input!

Tobias

**kmcarr** · 03-27-2013, 06:07 AM

Originally posted by Tobikenobi

> I guess the bottomline is that I am looking at libraries with very poor complexity. How could that reflect on later peak calling?

Clearly the input control doesn't represent the true background (the whole mouse genome). Furthermore, you cannot know that the bias in amplifying your IP sample was the same as the bias during amplification of the control. Given these results, I would be skeptical about the validity of any "peaks" observed in your IP sample.

**Tobikenobi** · 03-28-2013, 09:52 PM

That certainly does not make things easier for me.
In any case your help is much appreciated!

**silkiechicken** · 04-03-2013, 10:58 AM

I had a sequence duplication of about 90% once with mouse tissues... to fix it, we now do library size selection after adapter ligation. Good luck.

**Tobikenobi** · 04-03-2013, 04:14 PM

Originally posted by silkiechicken

> I had a sequence duplication of like 90% once with mouse tissues... to fix it we now do library size selection after adapter ligation. Good luck.

Hi!
Could you please explain that in more detail?
What was the size of your libraries before and after the adapters were ligated and which size did you purify?
How much starting material did you use?

Thank you very much!

Tobias

**silkiechicken** · 04-03-2013, 04:33 PM

So I was doing a ChIP-seq with embryonic tissues dissected from mouse. Samples were fixed and sonicated to fragment sizes between 200 and 500 bp.

These samples were then IP'ed, and we were able to recover about 15 ng of total DNA from about 500 µg of starting chromatin.

When we used the Illumina TruSeq kits as described to build the input and ChIP libraries from this DNA, we got low diversity, with over 90% repeat reads randomly distributed (i.e. not adapter dimers and not from the IP). This was after Bioanalyzer results had verified that our product was indeed centered around 275 bp. In the second round, we requested that the gel size selection be done after the adapter ligation and amplification. This gave a similar Bioanalyzer result, and when run on the sequencer, gave us only about 10% non-unique reads.

Does that make more sense? I can be rather confusing.

**Tobikenobi** · 04-03-2013, 04:42 PM

That makes it very clear!
Thank you very much for your input!

**Tobikenobi** · 04-03-2013, 04:56 PM

actually still confused

Sorry, I think I still don't get it. I just went back to the Illumina TruSeq DNA protocol and, if I understand correctly, the gel excision step there is after adapter ligation. How does this differ from your protocol?

**silkiechicken** · 04-03-2013, 05:20 PM

We did the gel extraction as the very last step, so after ligation and PCR amplification. Our guess is that, in the first round, we lost too much DNA during gel purification before PCR, so only a small subset of our sample was amplified.

eta: We didn't gel extract twice, we just moved it to the very last step.
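The explanation above (too much DNA lost before amplification, so PCR copied only a small subset of the sample) can be illustrated with a toy simulation. The sketch assumes PCR copies every surviving molecule equally, so sequencing amounts to drawing reads uniformly with replacement from the surviving molecules; the molecule and read counts are arbitrary illustration values, not figures from the thread.

```python
import random


def simulate_duplicate_rate(n_molecules: int, n_reads: int, seed: int = 0) -> float:
    """Fraction of reads that are duplicates when n_reads are drawn uniformly
    (with replacement) from a library of n_molecules distinct fragments.

    Treats PCR as perfectly uniform copying, so sequencing is equivalent
    to sampling molecules with replacement (a simplifying assumption).
    """
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_reads):
        seen.add(rng.randrange(n_molecules))
    return 1.0 - len(seen) / n_reads


# Same sequencing depth, shrinking pool of molecules that survived before PCR
for molecules in (1_000_000, 100_000, 10_000):
    rate = simulate_duplicate_rate(molecules, n_reads=100_000)
    print(f"{molecules:>9,} molecules -> {rate:.0%} duplicate reads")
```

Holding the read count fixed, the duplicate rate climbs steeply as the number of surviving molecules drops; uneven amplification in a real library would tend to push it higher still.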

Fastqc on Chip-Seq library: confusion
