  • sbaheti
    Member
    • Jul 2010
    • 12

    tool to merge pileup

    Hi,

    I need to merge pileup files from multiple samples. Is there an open-source tool that can merge multiple pileups?

    Thanks

    Saurabh
  • mgolo
    Member
    • Apr 2011
    • 10

    #2
    Hi!

    I have the same question, Saurabh. I have 3 biological replicates and I would like to normalize them before I merge their pileup files. Does anyone have any idea how to do this?

    Thanks in advance!

    Maria


    • gringer
      David Eccles (gringer)
      • May 2011
      • 845

      #3
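      For a basic merge, 'samtools mpileup' will do this if you give it the BAM files for all of the samples in a single command.

      The output has a separate set of columns (read count, bases, qualities) for each sample; I'm not sure if that is what you want.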


      • mgolo
        Member
        • Apr 2011
        • 10

        #4
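        Originally posted by gringer
        For a basic merge, 'samtools mpileup' will do this if you give it the BAM files for all of the samples in a single command.
        The output has a separate set of columns (read count, bases, qualities) for each sample; I'm not sure if that is what you want.

        Hi! Thanks for your fast reply.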

        I don't think this is what I need. I would like to merge the results of 3 biological replicates into one single file: basically, for each count, calculate the mean of the 3 runs. But before doing that I need to normalize the 3 files somehow, as they have different coverages. Maybe I have to do this before creating the pileup files... Any ideas?


        • gringer
          David Eccles (gringer)
          • May 2011
          • 845

          #5
          Originally posted by mgolo
          I don't think this is what I need. I would like to merge the results of 3 biological replicates into one single file: basically, for each count, calculate the mean of the 3 runs. But before doing that I need to normalize the 3 files somehow, as they have different coverages. Maybe I have to do this before creating the pileup files... Any ideas?

          mpileup is the only hammer I know (I'm very new to this), so take this with a grain of salt...

          mpileup gives raw count data for each run, so you can extract those columns as your raw coverages per sample. I would then normalise by dividing each sample's counts by some summary statistic of that sample's count column (e.g. the 75th percentile). Here's a quick R script that does this:

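          Code:
          # Read in the mpileup output: 3 position columns (the third is the reference
          # base), then count/seq/qual columns for each of the 6 samples.
          # quote = "" stops quote characters in the base/quality strings breaking the parse.
          data.df <- read.delim("mpileup_lane1-6.csv", sep = "\t", header = FALSE, quote = "",
                                col.names = c("isoform", "pos", "ref",
                                              paste(c("count", "seq", "qual"), rep(1:6, each = 3), sep = "_")))
          count.cols <- paste("count", 1:6, sep = "_")
          pos.counts <- data.df[, c("isoform", "pos", count.cols)]
          # Rough normalisation: scale each sample by its 75th-percentile count,
          # relative to the sample with the smallest 75th percentile.
          quantile.counts <- apply(pos.counts[, count.cols], 2, quantile, probs = 0.75)
          quantile.counts <- quantile.counts / min(quantile.counts)
          pos.counts[, count.cols] <- sweep(pos.counts[, count.cols], 2, quantile.counts, "/")
          # Mean normalised count across the samples at each position.
          pos.counts$mean <- rowMeans(pos.counts[, count.cols])
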
          A more complex *PKM-style (RPKM/FPKM) normalisation would need to take the transcript size for each hit into account, so you'd use something like Cufflinks/DESeq/whatever to get *PKM values for each transcript, then divide the counts by that value. I'm not sure going that far is necessary, given that the replicates should be hitting transcripts with a similar relative frequency.
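
          To make the *PKM idea a bit more concrete, here is a minimal sketch that applies the two RPKM scaling factors (per kilobase of transcript, per million mapped reads) directly to per-position counts. It is a simplification of the suggestion above, not what Cufflinks or DESeq actually do, and the function name pkm.scale, the toy counts, the transcript lengths and the mapped-read totals are all made up for illustration.

          ```r
          # Toy per-position read counts for 3 replicates (hypothetical values).
          pos.counts <- data.frame(isoform = c("tx_A", "tx_A", "tx_B"),
                                   pos     = c(1, 2, 1),
                                   count_1 = c(10, 12, 40),
                                   count_2 = c(20, 24, 80),
                                   count_3 = c(5, 6, 20))
          # Transcript lengths in bases and total mapped reads per sample (both assumed).
          tx.length    <- c(tx_A = 1000, tx_B = 2000)
          mapped.reads <- c(count_1 = 5e6, count_2 = 1e7, count_3 = 2.5e6)

          # RPKM-style scaling: count * 1e9 / (transcript length * mapped reads in sample).
          pkm.scale <- function(counts, lengths, totals) {
            per.kb <- as.matrix(counts) / (lengths / 1e3)  # each row divided by its transcript length in kb
            sweep(per.kb, 2, totals / 1e6, "/")            # each column divided by millions of mapped reads
          }

          count.cols <- paste("count", 1:3, sep = "_")
          scaled <- pkm.scale(pos.counts[, count.cols],
                              tx.length[as.character(pos.counts$isoform)],
                              mapped.reads[count.cols])
          # Mean scaled count across the replicates at each position.
          pos.counts$mean <- rowMeans(scaled)
          ```

          With these toy numbers the three replicates come out identical after scaling, because their raw counts differ only by sequencing depth. Within a single transcript the length factor is the same for every replicate, which fits the point above that going this far may not be necessary for averaging replicates.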


          • mgolo
            Member
            • Apr 2011
            • 10

            #6
            Thank you, gringer!

            I think you are right about the normalization: there is no need to take transcript sizes into account.
