Syndicated from PubMed RSS Feeds
False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions.
Bioinformatics. 2011 Jun 19;
Authors: Pickrell JK, Gaffney DJ, Gilad Y, Pritchard JK
MOTIVATION: Sequencing-based assays such as ChIP-seq, DNase-seq, and MNase-seq have become important tools for genome annotation. In these assays, short sequence reads enriched for loci of interest are mapped to a reference genome to determine their origin. Here, we consider whether false positive peak calls can be caused by a particular type of error in the reference genome: multi-copy sequences that have been incorrectly assembled and collapsed into a single copy. RESULTS: Using sequencing data from the 1000 Genomes project, we systematically scanned the human genome for regions of high sequencing depth. These regions are highly enriched for erroneously inferred transcription factor binding sites, positions of nucleosomes, and regions of open chromatin. We suggest a simple masking procedure to remove these regions and reduce false positive calls. AVAILABILITY: Files for masking out these regions are available at eqtl.uchicago.edu. CONTACT: [email protected] (J. Pickrell), [email protected] (DJG), [email protected] (YG), [email protected] (J. Pritchard).
PMID: 21690102 [PubMed - as supplied by publisher]
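The abstract does not give the scanning or masking details, so the following Python sketch is only an illustration of the general idea: flag bins whose sequencing depth is far above typical, merge adjacent flagged bins into regions, and drop peak calls that overlap them. The function names, the fixed bin size, and the cutoff of four times the median depth are assumptions made here for illustration and are not taken from the paper.

```python
from statistics import median


def high_depth_regions(depths, bin_size, fold=4.0):
    """Return half-open (start, end) intervals of consecutive bins whose
    depth exceeds fold * median depth.

    The fold-over-median cutoff is an illustrative assumption, not the
    threshold used by the authors.
    """
    cutoff = fold * median(depths)
    regions = []
    start = None
    for i, depth in enumerate(depths):
        if depth > cutoff:
            if start is None:
                start = i * bin_size  # open a new high-depth region
        elif start is not None:
            regions.append((start, i * bin_size))  # close the open region
            start = None
    if start is not None:
        # The last bin was still inside a high-depth region.
        regions.append((start, len(depths) * bin_size))
    return regions


def mask_peaks(peaks, regions):
    """Keep only peaks (start, end) that overlap no masked region."""
    return [
        (p_start, p_end)
        for p_start, p_end in peaks
        if not any(p_start < r_end and r_start < p_end
                   for r_start, r_end in regions)
    ]


# Hypothetical per-bin depths on one chromosome, in 1 kb bins.
depths = [10, 11, 9, 80, 95, 10, 12, 10]
regions = high_depth_regions(depths, bin_size=1000)
# Hypothetical peak calls as half-open (start, end) coordinates.
peaks = [(500, 700), (3200, 3400), (4900, 5100), (5000, 5200)]
kept = mask_peaks(peaks, regions)
```

In practice the mask files distributed by the authors would be used in place of a locally computed threshold; this sketch only shows how such a mask is applied to a list of peak calls.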