I am new to bioinformatics, coming from an applied math background. I need some help understanding a couple of files and interpreting their data.
I have ChIP-seq data from a paper I read. The data and abstract are on GEO.
As you can see, it has 22 "samples", which, if I interpret correctly, are the data from the experiment. These files are peak files from the ChIP-seq experiment for particular histone modifications. If we pick one, let's say GSM721288 H3K4me1_MB, this file is supposed to tell me the peaks of the ChIP-seq experiment. In other words, it's supposed to tell me where the genome is enriched for this particular modification.
If I look at the BED file, it has lines like this:
Code:
chr1 4696042 4696748
chr1 4735272 4735958
chr1 4736192 4736368
chr1 4736438 4736693
This obviously doesn't show me the peak itself. It doesn't show me how high the peak was or the number of tags/reads in the peak. I have two questions:
1) Does the line chr1 4696042 4696748 imply that all of the nucleosomes in that 706 bp region carry this histone modification? If so, do I have any information about the number of reads in this region? Would I have to go back to the raw data for this?
2) Or does the data mean that each line is a read? If so, should I see overlapping regions? Furthermore, taking the first line, how can a read be 706 bp long?
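To make the arithmetic behind both questions concrete, here is a minimal Python sketch of how I am reading these lines. It assumes the standard BED convention of a 0-based start and an exclusive end, so the width of a line is simply end minus start. The names `Interval` and `parse_bed_line` and the overlap check are my own illustration, not anything taken from the paper or from GEO.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed standard BED convention)
    end: int    # exclusive


def parse_bed_line(line: str) -> Interval:
    """Parse the first three whitespace-separated BED columns."""
    chrom, start, end = line.split()[:3]
    return Interval(chrom, int(start), int(end))


# The four lines quoted above from the GSM721288 H3K4me1_MB BED file.
bed_text = """\
chr1 4696042 4696748
chr1 4735272 4735958
chr1 4736192 4736368
chr1 4736438 4736693
"""

intervals = [parse_bed_line(l) for l in bed_text.splitlines() if l.strip()]

# Width of each interval in base pairs (end is exclusive, so no +1).
widths = [iv.end - iv.start for iv in intervals]

# Sort by (chrom, start, end) and collect consecutive pairs that overlap.
ordered = sorted(intervals)
overlaps = [
    (a, b)
    for a, b in zip(ordered, ordered[1:])
    if a.chrom == b.chrom and b.start < a.end
]

for iv, w in zip(intervals, widths):
    print(f"{iv.chrom}:{iv.start}-{iv.end}\t{w} bp")
print("overlapping consecutive pairs:", len(overlaps))
```

On these four lines the widths come out as 706, 686, 176 and 255 bp, and no two intervals overlap.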
I would appreciate it if anyone could explain a simple workflow for what happens after a ChIP-seq experiment, and what exactly the BED files mean.
Thanks.