Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • mpileup with -r option reports different number of reads at a site depending on regio

    Hi,

    I am using mpileup (samtools v0.1.16 (r963:234)) with the -r option to specify the size of a region for which the pileup should be generated. Thereby, I noticed that the number of reads covering a position of interest changes with increasing size of the specified region.

    For example if I'm interested in the bases and qualities at position 119121582 on chromosome 12 I get for

    a single position:
    Code:
    samtools mpileup -f reference-genome.fa -r chr12:11912[COLOR="DarkSlateGray"]1582[/COLOR]-11912[COLOR="DarkSlateGray"]1582[/COLOR] input.bam | grep 119121582 | awk '{print $4}'
    > 8007 reads covering that site


    a region of 10 bases:
    Code:
    samtools mpileup -f reference-genome.fa -r chr12:11912[COLOR="DarkSlateGray"]1580[/COLOR]-11912[COLOR="DarkSlateGray"]1590[/COLOR] input.bam | grep 119121582 | awk '{print $4}'
    > 7584 reads covering that site


    region of 1000 bases:
    Code:
    samtools mpileup -f reference-genome.fa -r chr12:11912[COLOR="DarkSlateGray"]1000[/COLOR]-11912[COLOR="DarkSlateGray"]2000[/COLOR] input.bam | grep 119121582 | awk '{print $4}'
    > 6661 reads covering that site

    This ultimately also means that when running mpileup without the -r option on the whole bam file, 6661 reads at that site are being reported while in total 8007 seem to cover it.
    I would have assumed that no matter which size of a region I choose around a site, this particular site will always be covered by the same number of reads which are reported in the pileup format. Could anyone explain to me why this is not the case?

    Thanks a lot in advance

  • #2
    Hi rjp,

    The original author(s) may give you the definite answer. It may have something to do with whether to include reads only partially aligned to the specified region. You may try 100 bases and 500 bases. My guess is that once the region gets big enough (say twice of the read length), the read count will stay the same. Give it a try and let us know!

    Douglas

    Comment


    • #3
      That is because of BAQ and a technical issue when loading the reference. If you just do counting, use "-ABQ0" or better the "depth" command.

      Comment


      • #4
        I tried -ABQ0 and also the depth command, but both give me the same results as described above. Can it be that there is something wrong with my bam file?

        I'm interested in the nucleotides and qualities at a position so only having the depth is not enough.
        Last edited by rjp; 06-09-2011, 10:38 PM.

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Recent Advances in Sequencing Analysis Tools
          by seqadmin


          The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
          05-06-2024, 07:48 AM
        • seqadmin
          Essential Discoveries and Tools in Epitranscriptomics
          by seqadmin




          The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
          04-22-2024, 07:01 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 05-07-2024, 06:57 AM
        0 responses
        12 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 05-06-2024, 07:17 AM
        0 responses
        16 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 05-02-2024, 08:06 AM
        0 responses
        22 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 04-30-2024, 12:17 PM
        0 responses
        24 views
        0 likes
        Last Post seqadmin  
        Working...
        X