  • mpileup with & without -q: odd result?

    Hello everyone,

    I'm running mpileup twice on the same BAM alignment of RNA-Seq reads against the same reference genome, once with mapping-quality filtering (-q 5) and once without, and I'm getting what I think is an odd result.

    I checked the same position on the same chromosome between the filtered and unfiltered output.
    In both cases, 7997 reads overlapped that position, but the counts of reads that matched the reference (,.) versus those that were mismatches (Aa) were drastically different between the two. Specifically, there were far more mismatches at that position in the filtered output than in the unfiltered output.

    If I'm understanding -q correctly, it should simply disregard alignments with a MAPQ score lower than 5 (if I use -q 5), right? So I would understand if the number of reads overlapping the position were reduced, but I don't understand why the count of overlapping reads would stay the same while the proportion of matches to mismatches changes.

    Can anyone help me make sense of this?

    Thanks,
    Alex

    P.S.
    samtools 0.1.18
    Linux OS
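
    For anyone wanting to reproduce the comparison described above, here is a minimal Python sketch that tallies matches (`.` and `,`) and mismatches (base letters) in the read-bases column of a single mpileup line. The function name `count_pileup_bases` is hypothetical, and the parsing assumes the standard pileup encoding (`^` plus one quality character at a read start, `$` at a read end, `+N`/`-N` followed by N inserted or deleted bases); it is an illustration, not code from the original post.

```python
import re


def count_pileup_bases(bases: str) -> dict:
    """Count reference matches and mismatches in one pileup read-bases string."""
    counts = {"match": 0, "mismatch": 0}
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read-start marker: the following character encodes MAPQ, skip both.
            i += 2
            continue
        if c == "$":
            # Read-end marker carries no base information.
            i += 1
            continue
        if c in "+-":
            # Indel: sign, a decimal length N, then N bases that are not
            # separate read observations at this position.
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:
                i += 1
                continue
            i += 1 + len(m.group()) + int(m.group())
            continue
        if c in ".,":
            counts["match"] += 1
        elif c in "ACGTNacgtn":
            counts["mismatch"] += 1
        # Anything else (for example '*' for a deleted base) is ignored.
        i += 1
    return counts


if __name__ == "__main__":
    print(count_pileup_bases("..,^F.A$a+2AG,-1c.*"))
```

    Running it on the bases column of the same position from the filtered and unfiltered outputs gives the two match/mismatch tallies to compare.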

  • #2
    Samtools has a maximum depth cut-off to limit the amount of memory required, which leads to some odd behavior when you hit that maximum. I think the max is 8,000, and it looks like you are close to that. Try supplying the -d flag with a really large number (like 10 million) and see if the results are closer to what you expect.

    Justin
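
    One way to picture why a depth cap could keep the read count the same while changing the match/mismatch mix is the toy Python model below. It assumes, as a simplification that is not confirmed in this thread, that low-MAPQ reads are dropped before the cap is applied and that the cap simply keeps the first `max_depth` remaining reads. The function `pileup_with_cap` and the read data are invented for illustration and are not samtools code.

```python
def pileup_with_cap(reads, min_mapq=0, max_depth=8000):
    """Toy model: filter by MAPQ first, then keep at most max_depth reads.

    `reads` is a list of (mapq, base) tuples in file order. This ordering of
    filter and cap is an assumption made for the sketch.
    """
    kept = [base for mapq, base in reads if mapq >= min_mapq]
    return kept[:max_depth]


# Hypothetical position: 5000 MAPQ-0 reads matching the reference ('.'),
# followed by 9000 MAPQ-50 reads carrying a mismatch ('A').
reads = [(0, ".")] * 5000 + [(50, "A")] * 9000

unfiltered = pileup_with_cap(reads)              # like running without -q
filtered = pileup_with_cap(reads, min_mapq=5)    # like running with -q 5

if __name__ == "__main__":
    for name, column in (("unfiltered", unfiltered), ("filtered", filtered)):
        print(name, len(column), column.count("."), column.count("A"))
```

    In this model both runs report the capped depth, but the filtered run is filled entirely by reads that the unfiltered run never reached, so the proportions differ. Raising the cap above the true depth makes the filtered depth drop as expected.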

    • #3
      That seems to have done the trick. Thanks.
