
**mgolo** · 07-20-2011, 03:44 AM

Hi!

I have the same question as Saurabh. I have 3 biological replicates and I would like to normalize them before I merge their pileup files. Does anyone have a clue how to do this?

Thanks in advance!

Maria

**gringer** · 07-21-2011, 03:16 AM

For a basic merge, 'samtools mpileup' will do this:

Multisample SNP Calling

http://samtools.sourceforge.net/mpileup.shtml

It displays different columns for each sample; I'm not sure if that is what you want.

**mgolo** · 07-21-2011, 03:42 AM

Hi! Thanks for your fast reply.

I don't think this is what I need. I would like to merge the results of 3 biological replicates into one single file: basically, for each count, calculate the mean of the 3 runs. But before doing that I need to normalize the 3 files somehow, as they have different coverages. Maybe I have to do this before creating the pileup files... Any idea?

**gringer** · 07-21-2011, 05:08 AM

mpileup is the only hammer I know (I'm very new to this), so take this with a grain of salt....

mpileup gives raw count data for each run, so you can extract those columns as your raw coverages per sample. I would then normalise by dividing by some statistic from the counts per column (e.g. 75th percentile). Here's a quick R script that does this:

Code:

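# read in mpileup output (tab-separated, no header): three leading columns,
# then count / bases / qualities for each of the 6 samples.
# quote = "" stops quote characters in the base/quality strings from breaking the parse
data.df <- read.delim("mpileup_lane1-6.csv", sep = "\t", header = FALSE, quote = "",
                      col.names = c("isoform", "pos", "flag",
                                    paste(c("count", "seq", "qual"), rep(1:6, each = 3), sep = "_")))
count.cols <- paste("count", 1:6, sep = "_")
pos.counts <- data.df[, c("isoform", "pos", count.cols)]
# rough normalisation: scale each sample by its 75th-percentile count,
# relative to the sample with the smallest 75th percentile
# (this breaks down if any sample's 75th percentile is 0)
quantile.counts <- apply(pos.counts[, count.cols], 2, quantile, probs = 0.75)
quantile.counts <- quantile.counts / min(quantile.counts)
pos.counts[, count.cols] <- t(t(pos.counts[, count.cols]) / quantile.counts)
# per-position mean of the normalised counts
pos.counts$mean <- rowMeans(pos.counts[, count.cols])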
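To see what that scaling does, here is a minimal sketch on made-up counts for three replicates. The numbers and the `normalise.uq` helper are purely illustrative assumptions, not taken from a real run:

```r
# hypothetical helper: upper-quartile scaling of a count matrix (columns = samples)
normalise.uq <- function(counts, probs = 0.75) {
  q <- apply(counts, 2, quantile, probs = probs)
  if (any(q == 0)) {
    stop("a sample has a zero quantile; use a higher 'probs' or drop zero-coverage positions")
  }
  # scale factors relative to the sample with the smallest quantile
  scale.factors <- q / min(q)
  # divide each column by its own scale factor
  sweep(counts, 2, scale.factors, "/")
}

# made-up depths at five positions; rep2 has twice, rep3 three times the depth of rep1
toy <- cbind(rep1 = c(10, 20, 30, 40, 50),
             rep2 = c(20, 40, 60, 80, 100),
             rep3 = c(30, 60, 90, 120, 150))
toy.norm <- normalise.uq(toy)
toy.mean <- rowMeans(toy.norm)
```

Because the toy replicates differ only by a constant depth factor, all three normalised columns come out equal to rep1, and the per-position mean is just rep1 again. Real replicates won't line up this neatly, but the scale factors should still absorb most of the coverage difference.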

A more complex RPKM/FPKM-style normalisation would need to take into account the transcript sizes for each hit, so you'd use something like Cufflinks/DESeq/whatever to get RPKM/FPKM values for each transcript, then divide by that value. I'm not sure going that far is necessary, given that the replicates should be hitting transcripts with a similar relative frequency.
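For reference, RPKM is reads per kilobase of transcript per million mapped reads. A minimal sketch of that arithmetic follows, with a hypothetical `rpkm` helper and made-up numbers; in practice you would take the values from a dedicated tool rather than compute them by hand:

```r
# hypothetical helper: reads per kilobase of transcript per million mapped reads
rpkm <- function(read.counts, transcript.length.bp, total.mapped.reads) {
  read.counts / (transcript.length.bp / 1e3) / (total.mapped.reads / 1e6)
}

# made-up example: 500 reads on a 2,000 bp transcript,
# in a library of 10 million mapped reads
example.rpkm <- rpkm(500, 2000, 1e7)
```

This is where the transcript size comes in: the same 500 reads on a transcript twice as long would give half the value.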

**mgolo** · 07-21-2011, 06:10 AM

Thank you gringer!

I think you are right about the normalization: no need to take transcript sizes into account.


tool to merge pileup
