Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • mpileup with -r option reports different number of reads at a site depending on regio

    Hi,

    I am using mpileup (samtools v0.1.16 (r963:234)) with the -r option to specify the size of a region for which the pileup should be generated. Thereby, I noticed that the number of reads covering a position of interest changes with increasing size of the specified region.

    For example if I'm interested in the bases and qualities at position 119121582 on chromosome 12 I get for

    a single position:
    Code:
    samtools mpileup -f reference-genome.fa -r chr12:11912[COLOR="DarkSlateGray"]1582[/COLOR]-11912[COLOR="DarkSlateGray"]1582[/COLOR] input.bam | grep 119121582 | awk '{print $4}'
    > 8007 reads covering that site


    a region of 10 bases:
    Code:
    samtools mpileup -f reference-genome.fa -r chr12:11912[COLOR="DarkSlateGray"]1580[/COLOR]-11912[COLOR="DarkSlateGray"]1590[/COLOR] input.bam | grep 119121582 | awk '{print $4}'
    > 7584 reads covering that site


    region of 1000 bases:
    Code:
    samtools mpileup -f reference-genome.fa -r chr12:11912[COLOR="DarkSlateGray"]1000[/COLOR]-11912[COLOR="DarkSlateGray"]2000[/COLOR] input.bam | grep 119121582 | awk '{print $4}'
    > 6661 reads covering that site

    This ultimately also means that when running mpileup without the -r option on the whole bam file, 6661 reads at that site are being reported while in total 8007 seem to cover it.
    I would have assumed that no matter which size of a region I choose around a site, this particular site will always be covered by the same number of reads which are reported in the pileup format. Could anyone explain to me why this is not the case?

    Thanks a lot in advance

  • #2
    Hi rjp,

    The original author(s) may give you the definite answer. It may have something to do with whether to include reads only partially aligned to the specified region. You may try 100 bases and 500 bases. My guess is that once the region gets big enough (say twice of the read length), the read count will stay the same. Give it a try and let us know!

    Douglas

    Comment


    • #3
      That is because of BAQ and a technical issue when loading the reference. If you just do counting, use "-ABQ0" or better the "depth" command.

      Comment


      • #4
        I tried -ABQ0 and also the depth command, but both give me the same results as described above. Can it be that there is something wrong with my bam file?

        I'm interested in the nucleotides and qualities at a position so only having the depth is not enough.
        Last edited by rjp; 06-09-2011, 10:38 PM.

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Essential Discoveries and Tools in Epitranscriptomics
          by seqadmin




          The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
          04-22-2024, 07:01 AM
        • seqadmin
          Current Approaches to Protein Sequencing
          by seqadmin


          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
          04-04-2024, 04:25 PM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, Yesterday, 11:49 AM
        0 responses
        15 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-24-2024, 08:47 AM
        0 responses
        16 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-11-2024, 12:08 PM
        0 responses
        61 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-10-2024, 10:19 PM
        0 responses
        60 views
        0 likes
        Last Post seqadmin  
        Working...
        X