Hello!
I was wondering whether there are any command-line tools (bedtools, samtools), other than R packages, to extend reads from their original length of 36-50 bp to 200 bp in a strand-specific manner. I don't want to use R, because I would have to load the whole file into the environment, which is time-consuming. I could easily write an awk script, but something may already exist. I want to do this because if I generate a coverage track (bedGraph file) directly by collapsing the BAM file, I get small continuous mountains that more or less all refer to a single gene target. (See the attached picture.)
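A minimal sketch of the strand-specific extension described above, written in Python rather than awk. It assumes the reads have already been converted to BED6 (0-based, half-open coordinates, strand in column 6), for example with `bedtools bamtobed`. The names `extend_read`, `extend_bed_line`, `target_len` and `chrom_size` are hypothetical and chosen only for illustration.

```python
def extend_read(start, end, strand, target_len=200, chrom_size=None):
    """Extend one read to target_len bases in its 3' direction.

    Coordinates are 0-based, half-open (BED convention).
    A "+" read keeps its start and grows to the right;
    a "-" read keeps its end and grows to the left.
    Reads already at least target_len long are returned unchanged.
    """
    if end - start >= target_len:
        return start, end
    if strand == "+":
        new_start, new_end = start, start + target_len
    elif strand == "-":
        new_start, new_end = end - target_len, end
    else:
        raise ValueError("strand must be '+' or '-', got %r" % strand)
    # Clip to chromosome boundaries so no interval runs off either end.
    new_start = max(new_start, 0)
    if chrom_size is not None:
        new_end = min(new_end, chrom_size)
    return new_start, new_end


def extend_bed_line(line, target_len=200, chrom_size=None):
    """Apply extend_read to one tab-separated BED6 line and return the new line."""
    fields = line.rstrip("\n").split("\t")
    start, end, strand = int(fields[1]), int(fields[2]), fields[5]
    new_start, new_end = extend_read(start, end, strand, target_len, chrom_size)
    fields[1], fields[2] = str(new_start), str(new_end)
    return "\t".join(fields)
```

Because each line is processed independently, the function can be mapped over a file one line at a time without loading it into memory. An existing tool that appears to cover the same case is `bedtools slop` with its `-s` (strand-aware) option, which would need checking against the installed version.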
Another question: should ChIP-seq data be normalized (same number of reads in control and sample) before calling peaks in MACS? For me, the mock-IP control always has less than two-thirds the number of reads of the sample. I did a test: after running MACS on the normalized data, the number of positive peaks went down by 30-40% and the number of negative peaks went up by 30%.
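One way to get "the same number of reads in control and sample" is to randomly downsample the larger library to the size of the smaller one. The sketch below uses reservoir sampling so that the reads can be streamed rather than loaded all at once. It is only an illustration of the normalization step, not a statement about whether MACS needs it; the function name `reservoir_sample` and the parameters `k` and `seed` are hypothetical.

```python
import random


def reservoir_sample(items, k, seed=0):
    """Return k items drawn uniformly at random from an iterable of unknown length.

    Only k items are held in memory at any time. If the iterable has
    fewer than k items, all of them are returned. Input order is not
    preserved, so the result must be re-sorted before making coverage tracks.
    """
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Here `items` would be the lines of the larger read file and `k` the read count of the smaller one (the mock-IP control in this case).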
Thanks
Sukhi