**mgolo** · 07-20-2011, 03:44 AM

Hi!

I have the same question as Saurabh. I have 3 biological replicates and I would like to normalize them before I merge their pileup files. Does anyone have a clue how to do this?

Thanks in advance!

Maria

**gringer** · 07-21-2011, 03:16 AM

For a basic merge, 'samtools mpileup' will do this:

Multisample SNP Calling

http://samtools.sourceforge.net/mpileup.shtml

It displays different columns for each sample; I'm not sure if that is what you want.

**mgolo** · 07-21-2011, 03:42 AM

Hi! Thanks for your fast reply.

I don't think this is what I need. I would like to merge the results of 3 biological replicates into one single file: basically, for each count, calculate the mean of the 3 runs. But before doing that I need to normalize the 3 files somehow, as they have different coverages. Maybe I have to do this before creating the pileup files... Any ideas?

**gringer** · 07-21-2011, 05:08 AM

mpileup is the only hammer I know (I'm very new to this), so take this with a grain of salt....

mpileup gives raw count data for each run, so you can extract those columns as your raw coverages per sample. I would then normalise by dividing each sample's counts by some statistic of that column (e.g. its 75th percentile). Here's a quick R script that does this:

Code:

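# read in mpileup output (tab-separated, no header); quote = "" stops quote
# characters in the base-quality strings from being parsed as delimiters
data.df <- read.delim("mpileup_lane1-6.csv", sep = "\t", header = FALSE, quote = "",
                      col.names = c("isoform", "pos", "ref",
                                    paste(c("count", "seq", "qual"), rep(1:6, each = 3), sep = "_")))
# keep the reference name, position and the six per-sample count columns
pos.counts <- data.frame(isoform = data.df$isoform, pos = data.df$pos,
                         data.df[, paste("count", 1:6, sep = "_")])
# rough normalisation: scale each sample by its 75th-percentile count,
# relative to the sample with the lowest 75th percentile
quantile.counts <- apply(pos.counts[, 3:8], 2, quantile, probs = 0.75)
quantile.counts <- quantile.counts / min(quantile.counts)
pos.counts[, 3:8] <- t(t(pos.counts[, 3:8]) / quantile.counts)
# mean normalised count per position across samples
pos.counts$mean <- rowMeans(pos.counts[, 3:8])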
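To see what the scaling does with three replicates, the same steps can be run on a tiny made-up table first. This is only a sketch: the counts are invented for illustration, and `toy.counts`, `rep1` to `rep3`, `scale.factors`, `toy.norm` and `toy.mean` are hypothetical names, not anything produced by mpileup.

```r
# toy per-position counts for three replicates (invented numbers, illustration only)
toy.counts <- cbind(rep1 = c(10, 20, 30, 40),
                    rep2 = c(20, 40, 60, 80),
                    rep3 = c(5, 10, 15, 20))
# 75th percentile of each replicate's counts
upper.q <- apply(toy.counts, 2, quantile, probs = 0.75)
# scale factors relative to the replicate with the lowest 75th percentile
scale.factors <- upper.q / min(upper.q)
# divide every column by its own scale factor
toy.norm <- t(t(toy.counts) / scale.factors)
# per-position mean across the normalised replicates
toy.mean <- rowMeans(toy.norm)
print(scale.factors)
print(cbind(toy.norm, mean = toy.mean))
```

Here the three replicates differ only in depth, so the scale factors come out as 2, 4 and 1 and all three normalised columns become identical. Real replicates will not line up this neatly, and a replicate whose 75th percentile is zero would need handling before dividing.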

A more complex *PKM-style (RPKM/FPKM) normalisation would need to take into account the transcript sizes for each hit, so you'd use something like Cufflinks/DESeq/whatever to get *PKM values for each transcript, then divide by that value. I'm not sure going that far is necessary, given that the replicates should be hitting transcripts with similar relative frequencies.
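For reference, the length correction in a *PKM value can be written out by hand. The sketch below uses the usual RPKM definition (reads per kilobase of transcript per million mapped reads); the function name `rpkm`, its argument names and the numbers are made up for illustration, and this is not what any particular tool outputs.

```r
# RPKM = reads on transcript * 1e9 / (transcript length in bp * total mapped reads)
# (hypothetical helper; argument names are illustrative)
rpkm <- function(reads, length.bp, total.mapped) {
  reads * 1e9 / (length.bp * total.mapped)
}
# invented example: 500 reads on a 2,000 bp transcript, 10 million mapped reads in the run
example.rpkm <- rpkm(500, 2000, 1e7)
print(example.rpkm)
```

With these invented numbers the value is 25. Since the transcript length is the same in every replicate, it cancels out when comparing the same position across replicates, which is why a depth-only scaling may be enough here.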

**mgolo** · 07-21-2011, 06:10 AM

Thank you gringer!

I think you are right about the normalization; there's no need to take transcript sizes into account.

tool to merge pileup
