  • mpileup and multiple input files

    Hi All,

    Is there any reason why the order in which .bam files are given to mpileup would affect the number of sites in the mpileup file?

    I have 8 .bam files and I need to have an mpileup file with the samples in a particular order in the columns. I have run mpileup twice with the following settings:

    Code:
    samtools mpileup -d 1000000 -I -f in_genome.fasta 1st.bam 2nd.bam ... > out.mpileup

    When I run it again with the input .bam files in a different order, I get a massively different mpileup file, which then affects how many SNPs are called/analysed downstream.

    I am using samtools version 0.1.19.

  • #2
    I've no idea, but you should use samtools-1.2 as 0.1.19 is pretty ancient now.

    • #3
      Thanks jkbonfield. I'll try updating in case it's just a version issue. Still seems like odd behaviour though.

      • #4
        I have now tried with the newer version of samtools (v. 1.2). This at least gives me mpileup files that are the same size. Hopefully this was just a weird version bug. We'll see what the downstream analyses turn up.
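
Matching file sizes do not by themselves show that the two runs cover the same sites, so a direct comparison of positions may be worth doing. The following is only a minimal sketch, assuming the usual tab-separated mpileup layout in which the first two columns are the reference sequence name and the 1-based position. The function names (`read_sites`, `compare_sites`, `compare_files`) are made up for illustration and are not part of samtools.

```python
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int]  # (reference sequence name, 1-based position)


def read_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect the (chromosome, position) pair from each mpileup line.

    Assumes tab-separated text with the sequence name in column 1 and
    the position in column 2; the per-sample columns are ignored, so the
    order of the input .bam files should not matter here.
    """
    sites: Set[Site] = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        sites.add((fields[0], int(fields[1])))
    return sites


def compare_sites(lines_a: Iterable[str], lines_b: Iterable[str]) -> Dict[str, object]:
    """Report how many sites are shared and which are unique to each input."""
    a = read_sites(lines_a)
    b = read_sites(lines_b)
    return {
        "shared": len(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


def compare_files(path_a: str, path_b: str) -> Dict[str, object]:
    """Convenience wrapper that reads two mpileup files from disk."""
    with open(path_a) as fa, open(path_b) as fb:
        return compare_sites(fa, fb)
```

Calling `compare_files("order1.mpileup", "order2.mpileup")` (hypothetical file names) would return the number of shared sites plus the sites present in only one of the two files. Both site sets are held in memory, so for a large genome it may be more practical to run this per chromosome.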
