
  • question about mpileup

    Hi all! This is my first post on this website, which has helped me many times before!

    I have a question about mpileup. I took an example from http://samtools.sourceforge.net/pileup.shtml:

    seq2 156 A 11 .$......+2AG.+2AG.+2AGGG <975;:<<<<<

    If I understood correctly, column 4 represents the number of reads covering the site, but it seems to me that there are really 14 reads covering it, not 11. I mean, if you are looking for mutations such as indels at this position, you could miss some interesting mutation sites after filtering by coverage, couldn't you?

    My second question: I saw .^] in the read bases (column 5), and in every case it corresponds to the start of a read. Is that normal? (There is no base before it, so why is there a quality?)

    Thanks

  • #2
    Looking at the example you posted, the 4th column is the number of reads, in this case 11. I'm not sure what you mean by 14 reads in reality. I don't see that anywhere in the example, but I only skimmed. Maybe I missed something.

    To your second question: the ] character that follows ^ is not a base call quality score. It encodes the mapping quality of the read, which is a measure of how well the read aligns to the reference. I think this link explains it better than I can: https://www.biostars.org/p/8371/
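As a small illustration of that point, here is a minimal sketch of how the character after ^ can be decoded, assuming the encoding described on the samtools pileup page (ASCII code of the character minus 33). The function name is made up for this example and is not part of samtools.

```python
def mapq_from_pileup_char(ch: str) -> int:
    """Decode the single character that follows '^' in the read bases column.

    Assumes the pileup convention: mapping quality = ASCII code - 33.
    """
    return ord(ch) - 33


# ']' is ASCII 93, so the read starting here has mapping quality 60.
print(mapq_from_pileup_char("]"))
```

So ^] marks the start of a read whose mapping quality is 60; it says nothing about the quality of any individual base.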



    • #3
      Thank you for the answer. I can explain in more detail, still using the same example:

      seq2 156 A 11 .$......+2AG.+2AG.+2AGGG <975;:<<<<<

      We have: 9 . + 2 G = 11
      But the 3 reads with the AG insertion are not counted, which is why I am saying there are really 14 reads covering this position. I am asking because if a variant caller uses column 4 to filter on the coverage depth of a position, it cannot find indels...

      OK, I understand the meaning of ^ better now. Thank you again!



      • #4
        My understanding of the read bases format is that the AG insertion exists on three of the 11 reads that have already been counted. The insertion lies between this reference position (156) and the next position in the reference sequence (157). The reads carrying those insertions are counted here; in the example they all match the reference (dots). I read them as ".+2AG", which is a match to the reference plus a 2 bp insertion consisting of AG. Treating that read as "." and "+2AG" is double counting it.

        If you're saying that filtering on read depth would potentially cause you to toss out some indels, I'd agree with that statement. Filtering can always cost you the ability to see a novel variant, but it's the tradeoff for less noisy data. That's not the same as the variant caller not being able to find indels at all, though.
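To make the counting rule concrete, here is a rough sketch of a parser for the read bases column. It is only an illustration under the reading described in this post: ^ plus the following character and $ are read start/end markers, +N or -N plus the next N letters is an indel annotation attached to the preceding read, and every other character is counted as one read. The function name and the returned keys are invented for this example.

```python
import re


def count_pileup_reads(bases: str) -> dict:
    """Count reads and indel annotations in a pileup read bases string."""
    reads = insertions = deletions = 0
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read start marker: skip '^' and the mapping quality character.
            i += 2
        elif c == "$":
            # Read end marker: belongs to the previous read, not a new one.
            i += 1
        elif c in "+-":
            # Indel annotation: sign, length, then that many bases.
            length = re.match(r"\d+", bases[i + 1:]).group()
            if c == "+":
                insertions += 1
            else:
                deletions += 1
            i += 1 + len(length) + int(length)
        else:
            # Any other character is the call for one read at this position.
            reads += 1
            i += 1
    return {"reads": reads, "insertions": insertions, "deletions": deletions}


result = count_pileup_reads(".$......+2AG.+2AG.+2AGGG")
print(result)  # {'reads': 11, 'insertions': 3, 'deletions': 0}
```

For the example line this gives 11 reads, matching column 4, with 3 of those 11 reads carrying the AG insertion, rather than 14 separate reads.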
