SEQanswers

06-23-2011, 08:24 PM   #1
sbaheti
Member
 
Location: Rochester

Join Date: Jul 2010
Posts: 12
tool to merge pileup

Hi,

I need to merge pileups from multiple samples. Is there an open-source tool that can merge multiple pileup files?

Thanks

Saurabh
07-20-2011, 03:44 AM   #2
mgolo
Member
 
Location: Denmark

Join Date: Apr 2011
Posts: 10

Hi!

I have the same question, Saurabh. I have 3 biological replicates and I would like to normalize them before I merge their pileup files. Does anyone have an idea how to do this?

Thanks in advance!

Maria
07-21-2011, 03:16 AM   #3
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 550

For a basic merge, 'samtools mpileup' will do this:

http://samtools.sourceforge.net/mpileup.shtml

It displays different columns for each sample; I'm not sure if that is what you want.
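
To make that column layout a bit more concrete, here is a rough sketch in R on two made-up lines for two samples (not real output). After the chromosome, position and reference base, each sample contributes three columns: read depth, read bases and base qualities. The column names are my own labels, since mpileup output has no header line.

```r
# Toy two-sample mpileup text (made-up values; tab-separated, no header)
toy <- paste(
  "chr1\t100\tA\t3\t...\tIII\t5\t..,,.\tIIIII",
  "chr1\t101\tC\t2\t.,\tII\t4\t.,.,\tIIII",
  sep = "\n")

n.samples <- 2
# Hypothetical labels: three fixed columns, then depth/bases/qualities per sample
col.names <- c("chr", "pos", "ref",
               paste(c("count", "seq", "qual"),
                     rep(seq_len(n.samples), each = 3), sep = "_"))

# quote = "" matters: base-quality strings can contain quote characters
toy.df <- read.delim(text = toy, header = FALSE, quote = "",
                     comment.char = "", col.names = col.names,
                     stringsAsFactors = FALSE)

# Per-sample read depths, one column per sample
depths <- toy.df[, paste("count", seq_len(n.samples), sep = "_")]
print(depths)
```

The depth columns are the ones you would carry forward if you want one number per position per sample.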
07-21-2011, 03:42 AM   #4
mgolo
Member
 
Location: Denmark

Join Date: Apr 2011
Posts: 10

Quote:
Originally Posted by gringer
For a basic merge, 'samtools mpileup' will do this:

http://samtools.sourceforge.net/mpileup.shtml

It displays different columns for each sample; I'm not sure if that is what you want.

Hi! Thanks for your fast reply.

I don't think this is what I need. I would like to merge the results of 3 biological replicates into one single file: basically, for each count, calculate the mean of the 3 runs. But before doing that I need to normalize the 3 files somehow, as they have different coverages. Maybe I have to do this before creating the pileup files... Any ideas?
07-21-2011, 05:08 AM   #5
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 550

Quote:
I don't think this is what I need. I would like to merge the results of 3 biological replicates into one single file: basically, for each count, calculate the mean of the 3 runs. But before doing that I need to normalize the 3 files somehow, as they have different coverages. Maybe I have to do this before creating the pileup files... Any ideas?

mpileup is the only hammer I know (I'm very new to this), so take this with a grain of salt...

mpileup gives raw count data for each run, so you can extract those columns as your raw coverages per sample. I would then normalise by dividing by some statistic from the counts per column (e.g. 75th percentile). Here's a quick R script that does this:

Code:
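# read in mpileup output (tab-separated, no header): name, position, reference
# base, then depth / bases / qualities for each of the 6 samples
count.cols <- paste("count", 1:6, sep = "_")
# quote = "" stops quote characters in the quality strings from breaking the parse
data.df <- read.delim("mpileup_lane1-6.csv", header = FALSE, quote = "",
                      col.names = c("isoform", "pos", "ref",
                                    paste(c("count", "seq", "qual"), rep(1:6, each = 3), sep = "_")))
pos.counts <- cbind(isoform = data.df$isoform, pos = data.df$pos, data.df[, count.cols])
# rough normalisation: scale each sample by its 75th-percentile depth, relative
# to the smallest one (assumes no sample has a 75th percentile of zero)
quantile.counts <- apply(pos.counts[, count.cols], 2, quantile, probs = 0.75)
quantile.counts <- quantile.counts / min(quantile.counts)
# t(t(x) / v) divides each column of x by the matching element of v
pos.counts[, count.cols] <- t(t(pos.counts[, count.cols]) / quantile.counts)
# mean normalised depth across the samples at each position
pos.counts$mean <- rowMeans(pos.counts[, count.cols])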
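
If you want to try the scaling without an mpileup file at hand, the same idea can be sketched as a small function on made-up depths. This is only an illustration under my own assumptions: `normalise.counts` and the replicate names are hypothetical, and the zero-quantile check is something I added, not part of the script above.

```r
# Scale each sample's per-position depth by its upper quartile, then average.
# 'counts' is a numeric matrix or data frame: rows = positions, columns = samples.
normalise.counts <- function(counts, prob = 0.75) {
  counts <- as.matrix(counts)
  q <- apply(counts, 2, quantile, probs = prob)
  # A zero quantile would make the scale factors meaningless
  if (any(q == 0)) stop("a sample has a zero quantile; choose a higher 'prob'")
  scale.factors <- q / min(q)   # the least-covered sample keeps its raw counts
  norm <- sweep(counts, 2, scale.factors, "/")
  cbind(norm, mean = rowMeans(norm))
}

# Made-up depths for three replicates at four positions
toy.counts <- cbind(rep1 = c(10, 20, 30, 40),
                    rep2 = c(20, 40, 60, 80),
                    rep3 = c(5, 10, 15, 20))
result <- normalise.counts(toy.counts)
print(result)
```

In this toy case the three replicates differ only by overall coverage, so after scaling they line up with the least-covered replicate and the mean equals each scaled column.
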

A more complex *PKM-style (RPKM/FPKM) normalisation would need to take the transcript size for each hit into account, so you'd use something like cufflinks/DESeq/whatever to get *PKM values for each transcript, then divide by that value. I'm not sure if going that far would be necessary, given that the replicates should be hitting transcripts with a similar relative frequency.
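
For reference, RPKM is usually defined as reads per kilobase of transcript per million mapped reads. A minimal sketch with made-up numbers (the function and column names are hypothetical; in practice you would take these values from cufflinks or a similar tool rather than compute them by hand):

```r
# RPKM = reads / (transcript length in kb * total mapped reads in millions)
rpkm <- function(reads, length.bp, total.mapped) {
  reads / ((length.bp / 1e3) * (total.mapped / 1e6))
}

# Hypothetical transcripts from one sample with 2 million mapped reads
toy <- data.frame(transcript = c("tA", "tB"),
                  reads      = c(500, 500),
                  length.bp  = c(1000, 2000))
toy$rpkm <- rpkm(toy$reads, toy$length.bp, total.mapped = 2e6)
print(toy)
```

With the same read count, the transcript that is twice as long ends up with half the RPKM, which is the length effect the quantile scaling ignores.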
07-21-2011, 06:10 AM   #6
mgolo
Member
 
Location: Denmark

Join Date: Apr 2011
Posts: 10

Thank you gringer!

I think you are right about the normalization: there is no need to take transcript sizes into account.

Tags
ngs, pileup, sam tools

All times are GMT -8.