In many of our lab's experiments with Illumina reads, we end up having to normalize the data.
Say I have 3 experimental conditions and have sequenced one lane for each. I need a way to compare the counts for my known RNAs (from miRBase) across conditions, where the counts come from my sequence reads mapped to the genome (with MAQ / novoalign).
I've read about people expressing counts as reads per million (RPM) and log-transforming these values to fit a Poisson distribution, but that has raised several questions for me. Would normalization be as simple as dividing the counts for each experiment by:
1) 1 million
2) the total number of reads sequenced
3) the total number of uniquely mapped reads
I'm inclined toward option (3) because it represents the amount of usable sequence data.
Does anybody have a better way of tackling this problem with next-gen data, or is there software that can help?
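To make the three options concrete, here is a minimal Python sketch using made-up counts for one reference RNA in two hypothetical lanes (`cond_A`, `cond_B`). The function names `rpm` and `log2_rpm` and the pseudocount of 1 are illustrative assumptions, not part of any particular tool. Options (2) and (3) both yield reads per million and differ only in which library size goes in the denominator; option (1) divides every lane by the same constant, so on its own it does not adjust for differences in sequencing depth.

```python
import math


def rpm(count: int, library_size: int) -> float:
    """Reads per million: raw count scaled by library size, times 1e6."""
    return count * 1_000_000 / library_size


def log2_rpm(count: int, library_size: int, pseudocount: float = 1.0) -> float:
    """log2(RPM + pseudocount); the pseudocount keeps zero counts finite."""
    return math.log2(rpm(count, library_size) + pseudocount)


# Hypothetical numbers for one reference RNA in two lanes:
# lane -> (raw count for the RNA, total reads sequenced, uniquely mapped reads)
lanes = {
    "cond_A": (500, 10_000_000, 6_000_000),
    "cond_B": (900, 20_000_000, 15_000_000),
}

if __name__ == "__main__":
    for lane, (count, total, unique) in lanes.items():
        print(
            lane,
            f"option1={count / 1_000_000:.6f}",       # same constant for every lane
            f"option2_rpm={rpm(count, total):.2f}",    # total reads as library size
            f"option3_rpm={rpm(count, unique):.2f}",   # uniquely mapped reads
            f"log2_rpm_unique={log2_rpm(count, unique):.3f}",
        )
```

With these made-up numbers, dividing by total reads puts the two lanes close together (50 vs 45 RPM), while dividing by uniquely mapped reads widens the gap (about 83 vs 60 RPM), so the choice of denominator matters whenever mapping rates differ between lanes.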
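One commonly used alternative to dividing by a total is the median-of-ratios size factor described for the DESeq R/Bioconductor package. The sketch below is a plain-Python illustration of that technique, not DESeq's own code; the function name `size_factors`, the input layout (RNA name mapped to a list of per-sample counts), and the example counts are all hypothetical. RNAs with a zero count in any sample are skipped because their geometric mean is zero.

```python
import math
from statistics import median


def size_factors(counts: dict[str, list[int]]) -> list[float]:
    """Median-of-ratios size factors, one per sample.

    counts maps each reference RNA to its raw counts across samples,
    with the same sample order for every RNA.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios: list[list[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue  # geometric mean would be zero
        # Geometric mean across samples, computed in log space.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor = exp(median log ratio) for each sample.
    # statistics.median raises StatisticsError if no RNA passed the filter.
    return [math.exp(median(r)) for r in log_ratios]


if __name__ == "__main__":
    # Hypothetical raw counts: RNA -> [sample 1, sample 2].
    raw = {"mir-a": [100, 200], "mir-b": [50, 100], "mir-c": [0, 7], "mir-d": [30, 60]}
    sf = size_factors(raw)
    print("size factors:", [round(s, 3) for s in sf])
    for name, row in raw.items():
        # Normalized count = raw count / that sample's size factor.
        print(name, [round(c / s, 1) for c, s in zip(row, sf)])
```

The motivation for this choice is that a handful of very abundant RNAs, which are common in small-RNA libraries, can dominate a total-count denominator, whereas the median ratio is less sensitive to them. Whether that holds for a given dataset is worth checking rather than assuming.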
I have:
a) Alignment locations of all reads on a reference genome for each experiment
b) Locations of my reference RNAs on the same genome
I can already count the reads overlapping each reference RNA in each experiment, which gives me raw counts.
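For reference, the raw-count step described above can be sketched as an interval-overlap count. This is a minimal illustration assuming alignments and RNA locations are given as `(chrom, start, end)` tuples in 1-based inclusive coordinates; the function name `count_overlaps` and the input layout are hypothetical. Read starts are sorted per chromosome so that only reads starting within one read length of each RNA are examined.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_overlaps(reads, rnas):
    """Count reads overlapping each reference RNA.

    reads: iterable of (chrom, start, end) alignments, 1-based inclusive.
    rnas:  dict name -> (chrom, start, end), same coordinate convention.
    Returns dict name -> raw count.
    """
    by_chrom = defaultdict(list)
    max_len = 0
    for chrom, start, end in reads:
        by_chrom[chrom].append((start, end))
        max_len = max(max_len, end - start + 1)

    # Sort alignments by start and keep a parallel list of starts for bisect.
    starts = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts[chrom] = [s for s, _ in ivs]

    counts = {}
    for name, (chrom, start, end) in rnas.items():
        ivs = by_chrom.get(chrom, [])
        st = starts.get(chrom, [])
        # An overlapping read has s <= end and e >= start; since
        # e <= s + max_len - 1, its start lies in [start - max_len + 1, end].
        lo = bisect_left(st, start - max_len + 1)
        hi = bisect_right(st, end)
        counts[name] = sum(1 for s, e in ivs[lo:hi] if e >= start)
    return counts
```

Note that in this sketch a read overlapping two annotated RNAs is counted once for each of them.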
I have about 4 experiments, but this varies from study to study.