Dear All,
Our lab used ChIP-seq to study a histone variant that is expected to occur throughout the human genome. I found two problems with our ChIP-seq dataset but could not figure out why they happened.
- I used Picard to mark duplicates and found that the duplicate percentage is 66%. This seems very high to me. What duplicate level is considered acceptable for ChIP-seq data?
- The GC content of our dataset is 56%, much higher than the GC content of the reference genome. The duplication problem does not explain this, since the GC content of the dataset does not decrease after removing duplicates. I saw a post saying that Illumina sequencing preferentially covers GC-rich regions. Has Illumina already fixed this bias?
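In case it helps to make the second point concrete, here is a minimal sketch of the kind of before/after GC comparison I mean. It is only an illustration: the function names and the toy reads are hypothetical, and the sequence-identity deduplication is a simplified stand-in for Picard, which marks duplicates by alignment position rather than by read sequence.

```python
def gc_fraction(reads):
    """Fraction of G/C bases among all unambiguous (A/C/G/T) bases."""
    gc = 0
    total = 0
    for seq in reads:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        # Ignore N and other ambiguity codes in the denominator.
        total += sum(seq.count(base) for base in "ACGT")
    return gc / total if total else 0.0


def drop_identical_reads(reads):
    """Keep the first copy of each read sequence (simplified dedup, not Picard)."""
    return list(dict.fromkeys(reads))


# Toy reads, purely hypothetical; real input would come from FASTQ/BAM files.
reads = ["GGCCGA", "GGCCGA", "AATTGC", "GGCCGA"]
print("GC before dedup:", gc_fraction(reads))
print("GC after dedup: ", gc_fraction(drop_identical_reads(reads)))
```

If the two values are close on the real data, as they are in our case, duplicated reads alone would not account for the elevated GC content.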
I would very much appreciate it if you could suggest some possible reasons for these two problems.
Many thanks,
Nguyen