Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| samtools merge | bair | Bioinformatics | 5 | 10-10-2012 12:51 PM |
| samtools merge | frymor | Bioinformatics | 4 | 10-26-2011 04:12 AM |
| Samtools merge | wangzkai | Bioinformatics | 1 | 05-01-2010 12:35 PM |
| samtools merge | bair | Bioinformatics | 4 | 03-05-2010 12:23 AM |
| maq merge | Layla | Bioinformatics | 0 | 05-28-2009 06:06 AM |
#1
Member
Location: Rochester Join Date: Jul 2010
Posts: 11
Hi,
I need to merge pileup files from multiple samples. Is there an open-source tool that can merge multiple pileups?

Thanks,
Saurabh
#2
Member
Location: Denmark Join Date: Apr 2011
Posts: 10
Hi!
I have the same question, Saurabh. I have 3 biological replicates and I would like to normalize them before I merge their pileup files. Does anyone have an idea how to do this?

Thanks in advance!
Maria
#3
David Eccles (gringer)
Location: Wellington, New Zealand Join Date: May 2011
Posts: 289
For a basic merge, 'samtools mpileup' will do this:
http://samtools.sourceforge.net/mpileup.shtml
It displays a separate set of columns for each sample; I'm not sure if that is what you want.
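As a minimal sketch of that layout, assuming mpileup was run on N BAM files: each output line has three fixed columns (sequence name, 1-based position, reference base) followed by one read count / read bases / base qualities triplet per input file, in the order the files were given. The per-sample counts therefore sit in columns 4, 7, 10, and so on. The helper `mpileup_col_names` and the two toy lines below are hypothetical, made up for illustration only.

```r
# Hypothetical helper: column names for an mpileup file with n_samples inputs.
# Layout: chrom, pos, ref, then (count, bases, quals) repeated per sample.
mpileup_col_names <- function(n_samples) {
  c("chrom", "pos", "ref",
    paste(c("count", "seq", "qual"), rep(seq_len(n_samples), each = 3), sep = "_"))
}

# Two made-up mpileup lines for 2 samples (tab-separated), for illustration only.
toy_lines <- c(
  "chr1\t100\tA\t3\t..,\tIII\t2\t.,\tII",
  "chr1\t101\tC\t4\t.,.T\tIIII\t0\t*\t*"
)

# quote = "" because base and quality strings may contain quote characters
toy <- read.delim(text = toy_lines, header = FALSE, quote = "",
                  col.names = mpileup_col_names(2))

# Keep the position columns plus the per-sample raw depth columns
counts <- toy[, c("chrom", "pos", "count_1", "count_2")]
print(counts)
```

Extracting only the `count_*` columns gives one raw-coverage column per sample, which is the starting point for any per-sample normalisation.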
#4
Member
Location: Denmark Join Date: Apr 2011
Posts: 10
I don't think this is what I need. I would like to merge the results of 3 biological replicates into one single file: basically, for each position, calculate the mean count across the 3 runs. But before doing that I need to normalize the 3 files somehow, as they have different coverages. Maybe I have to do this before creating the pileup files... Any idea?
#5
David Eccles (gringer)
Location: Wellington, New Zealand Join Date: May 2011
Posts: 289
mpileup gives raw count data for each run, so you can extract those columns as your raw per-sample coverages. I would then normalise by dividing each sample's counts by some statistic of that sample's column (e.g. the 75th percentile). Here's a quick R script that does this: Code:
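# read in pileup data: 3 fixed columns (sequence name, position, reference base),
# then a count/bases/qualities triplet for each of the 6 samples.
# quote = "" because base and quality strings can contain quote characters.
data.df <- read.delim("mpileup_lane1-6.csv", sep = "\t", header = FALSE, quote = "",
                      col.names = c("isoform", "pos", "ref",
                                    paste(c("count", "seq", "qual"), rep(1:6, each = 3), sep = "_")))
count.cols <- paste("count", 1:6, sep = "_")
pos.counts <- data.df[, c("isoform", "pos", count.cols)]
# rough normalisation of count data: divide each sample by its 75th percentile,
# relative to the smallest 75th percentile across samples
quantile.counts <- apply(pos.counts[, count.cols], 2, quantile, probs = 0.75)
quantile.counts <- quantile.counts / min(quantile.counts)
pos.counts[, count.cols] <- sweep(as.matrix(pos.counts[, count.cols]), 2, quantile.counts, "/")
# mean normalised count per position across the 6 samples
pos.counts$mean <- rowMeans(pos.counts[, count.cols])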
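The normalisation step can also be wrapped as a self-contained function and checked on a small in-memory matrix instead of a file. This is a minimal sketch, assuming a numeric matrix with one column per sample; the function name `normalise_upper_quartile` and the toy counts are hypothetical. Like the script, it divides each sample by its 75th percentile relative to the smallest 75th percentile, then averages across samples.

```r
# Hypothetical helper: upper-quartile scaling of per-position counts.
# counts: numeric matrix, rows = positions, columns = samples.
normalise_upper_quartile <- function(counts, p = 0.75) {
  q <- apply(counts, 2, quantile, probs = p)
  # Guard added in this sketch: a zero quantile would give Inf/NaN below
  if (any(q == 0)) stop("a sample has a zero quantile; choose a different statistic")
  factors <- q / min(q)          # the lowest-coverage sample gets factor 1
  scaled <- sweep(counts, 2, factors, "/")
  list(factors = factors, scaled = scaled, mean = rowMeans(scaled))
}

# Made-up data: replicate B has exactly twice the coverage of A and C.
toy <- cbind(A = c(1, 2, 3, 4, 5), B = c(2, 4, 6, 8, 10), C = c(1, 2, 3, 4, 5))
res <- normalise_upper_quartile(toy)
print(res$factors)
print(res$mean)
```

With this toy input, B's scaling factor is 2, so after scaling all three replicates agree and the per-position mean equals A's counts. The zero-quantile check is an addition of this sketch: with sparse coverage the 75th percentile of a sample can be 0, which would otherwise produce Inf or NaN.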
#6
Member
Location: Denmark Join Date: Apr 2011
Posts: 10
Thank you, gringer!
I think you are right about the normalization; there is no need to take transcript sizes into account.
Tags: ngs, pileup, sam tools