  • tool to merge pileup

    Hi,

    I need to merge pileups from multiple samples. Is there an open-source tool for merging multiple pileup files?

    Thanks

    Saurabh

  • #2
    Hi!

    I have the same question, Saurabh. I have 3 biological replicates and I would like to normalize them before I merge their pileup files. Does anyone have a clue how to do this?

    Thanks in advance!

    Maria


    • #3
      For a basic merge, 'samtools mpileup' will do this:



      It displays different columns for each sample; I'm not sure if that is what you want.
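
As a rough illustration of that column layout (a sketch only, not the command from the reply above): by default, multi-sample mpileup output has three leading columns (sequence name, position, reference base) followed by three columns per input file (depth, read bases, base qualities). The helper name `mpileup_colnames` and the two example lines are made up for illustration.

```r
# Hypothetical helper: build column names for an n-sample mpileup table.
mpileup_colnames <- function(n_samples) {
  per_sample <- paste(c("count", "seq", "qual"),
                      rep(seq_len(n_samples), each = 3), sep = "_")
  c("chrom", "pos", "ref", per_sample)
}

# Two made-up, tab-separated lines for two samples (illustration only).
example_lines <- c("chr1\t100\tA\t3\t...\tIII\t5\t.....\tIIIII",
                   "chr1\t101\tC\t2\t..\tII\t4\t....\tIIII")

# quote = "" and comment.char = "" keep quality characters such as " or #
# from being treated as quoting or comments.
pileup <- read.delim(text = example_lines, header = FALSE, sep = "\t",
                     quote = "", comment.char = "", stringsAsFactors = FALSE,
                     col.names = mpileup_colnames(2))

# Keep only the per-sample depth columns.
depths <- pileup[, grep("^count_", names(pileup)), drop = FALSE]
print(depths)
```

The depth columns picked out this way are the raw per-sample coverages that any later normalisation would start from.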


      • #4
        Originally posted by gringer
        For a basic merge, 'samtools mpileup' will do this:



        It displays different columns for each sample; I'm not sure if that is what you want.

        Hi! Thanks for your fast reply.

        I don't think this is what I need. I would like to merge the results from 3 biological replicates into a single file: basically, for each count, calculate the mean of the 3 runs. Before doing that, though, I need to normalize the 3 files somehow, as they have different coverages. Maybe I have to do this before creating the pileup files... Any ideas?


        • #5
          I don't think this is what I need. I would like to merge 3 biological replicates' results into one single file. Basically for each count calculate the mean of the 3 runs. But before doing that I need to normalize somehow the 3 files, as they have different coverages. Maybe I have to do this before creating the pileup files... Any idea?

          mpileup is the only hammer I know (I'm very new to this), so take this with a grain of salt...

          mpileup gives the raw depth at each position for each run, so you can extract those columns as your raw per-sample coverages. I would then normalise each sample by dividing by a summary statistic of its counts (e.g. the 75th percentile). Here's a quick R script that does this:

          Code:
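          # Read the multi-sample mpileup output (3 leading columns, then 3 per sample).
          # quote = "" stops quote characters in the base/quality strings from breaking the parse.
          data.df <- read.delim("mpileup_lane1-6.csv", sep = "\t", header = FALSE, quote = "",
                                col.names = c("isoform", "pos", "flag",
                                              paste(c("count", "seq", "qual"), rep(1:6, each = 3), sep = "_")))
          count.cols <- paste("count", 1:6, sep = "_")
          pos.counts <- data.df[, c("isoform", "pos", count.cols)]
          # Rough normalisation: scale each sample by its 75th-percentile depth,
          # relative to the sample with the lowest 75th percentile
          # (this assumes that lowest value is not zero).
          quantile.counts <- apply(pos.counts[, count.cols], 2, quantile, probs = 0.75)
          quantile.counts <- quantile.counts / min(quantile.counts)
          pos.counts[, count.cols] <- t(t(pos.counts[, count.cols]) / quantile.counts)
          # Per-position mean of the normalised counts across the six samples.
          pos.counts$mean <- rowMeans(pos.counts[, count.cols])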
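
To see what the scaling step in the script above does, here is a toy version with three replicates. The depths are made up for illustration, and `sweep()` is used in place of the double transpose; it should give the same result.

```r
# Toy depths for three replicates; the numbers are made up for illustration.
counts <- data.frame(count_1 = c(10, 20, 30, 40),
                     count_2 = c(20, 40, 60, 80),
                     count_3 = c(5, 10, 15, 20))

# Upper-quartile value per replicate, expressed relative to the shallowest one.
uq <- apply(counts, 2, quantile, probs = 0.75)
scale_factors <- uq / min(uq)

# sweep() divides each column by its own factor (same effect as t(t(x) / f)).
normalised <- sweep(as.matrix(counts), 2, scale_factors, "/")

# Per-position mean across the normalised replicates.
normalised_mean <- rowMeans(normalised)
print(scale_factors)
print(normalised_mean)
```

Here the factors come out as 2, 4 and 1, so all three replicates end up on the scale of the shallowest one before the mean is taken.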
          A more complex *PKM-style (RPKM/FPKM) normalisation would need to take into account the transcript sizes for each hit, so you'd use something like Cufflinks/DESeq/whatever to get *PKM values for each transcript, then divide by that value. I'm not sure going that far is necessary, given that the replicates should be hitting transcripts with similar relative frequencies.
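
For reference, the usual RPKM definition is the number of reads mapped to a transcript, divided by the transcript length in kilobases and by the total mapped reads in millions. A minimal R sketch, assuming you already have per-transcript read counts and lengths; the function name `rpkm`, its arguments and the numbers are illustrative, not output from any of the tools mentioned above.

```r
# Hypothetical helper: RPKM from per-transcript read counts and lengths (bp).
# total_reads defaults to the sum of the supplied counts, which is an
# assumption; in practice it would be the total mapped reads in the library.
rpkm <- function(read_counts, lengths_bp, total_reads = sum(read_counts)) {
  read_counts * 1e9 / (lengths_bp * total_reads)
}

# Made-up example: tx2 is three times longer and has three times the reads.
read_counts <- c(tx1 = 500, tx2 = 1500)
lengths_bp  <- c(tx1 = 1000, tx2 = 3000)

values <- rpkm(read_counts, lengths_bp)
print(values)
```

In this made-up case both transcripts get the same value, because the longer one collects proportionally more reads; that length effect is what the extra normalisation step would correct for.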


          • #6
            Thank you, gringer!

            I think you are right about the normalization: there is no need to take transcript sizes into account.
