  • The pileup result from samtools doesn't match the read data

    Hi,

    I am trying to obtain the base alleles at a specific (polymorphic) position directly from the read data. I also used samtools pileup to get the base information for that position, and the two results don't match.

    Here is an example (the data are from the 1000 Genomes Project, low coverage). I am looking at sample NA12716 at position 154402806 on chromosome X.
    These are the reads covering that position:

    ERR000573.12285810 pPr1 X 154402756 60 9S42M = 154402571 -226
    GCGAAGAAGTCAATTAGAAAGTCTTTTCAAGTTATCCAAGCAGGAGGTCTC
    27=AA=8A;<?A?@>@>@@@=;>=<>>?A@>@>@:?>AA>?@>??>>=?9?
    X0:i:1 X1:i:0 XC:i:42 MD:Z:42 RG:Z:ERR000573 AM:i:37 NM:i:0 SM:i:37 MQ:i:60 XT:A:U
    BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    ERR000569.10381384 pPr1 X 154402769 60 51M = 154402592 -227
    CTTTTCAAGTTATCCAAGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGT
    >6?=8>9>=;1<<:7A@=>$$=@><=%::@@=>;6>><==;>=@>?8>@>?
    X0:i:1 X1:i:0 MD:Z:37C13 RG:Z:ERR000569 AM:i:37 NM:i:1 SM:i:37 MQ:i:60 XT:A:U
    BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    ERR000569.428911 pPR2 X 154402785 60 35M16S = 154402963 214
    AGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGTGACAGTGGACATTTAA
    ;??@?=>=;;=@>>:<%=;=>@>>>:@%:::?.99=8=?9$<34=6%<#79
    X0:i:1 X1:i:0 XC:i:35 MD:Z:21C13 RG:Z:ERR000569 AM:i:37 NM:i:1 SM:i:37 MQ:i:60 XT:A:U
    BQ:Z:@@@@@@@@@@@@@@@@@@@@AC@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    ERR000572.854678 pPr2 X 154402794 60 15S36M = 154402599 -230
    TATCCAAGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGTGACAGTGGAC
    ?A75:@A7:=?:A?>;<>?=@:?>===@>=>=>?;@<?@:@=>?@>A?>A<
    X0:i:1 X1:i:1 XC:i:36 MD:Z:12C23 RG:Z:ERR000572 AM:i:23 NM:i:1 SM:i:23 MQ:i:60 XT:A:U
    BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@C@@@@@@@@@@@@@@@@@@@@@@@

    ERR000572.8002077 pPr1 X 154402804 60 15S36M = 154402586 -253
    GGAGGTCTCACGTGGCCTGGTCTAGAGTAGTGACAGTGGACATTGAAAGAA
    =:B<;<;>3?%;=?93=@>=>;>@>@<>@=@==@@=?>>9>@@@=AAA>A?
    X0:i:1 X1:i:0 XC:i:36 MD:Z:2C33 RG:Z:ERR000572 AM:i:37 NM:i:1 SM:i:37 MQ:i:60 XT:A:U
    BQ:Z:@@@@@@@@@@@@@@@@@C@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@CD

    ERR000574.8852864 pPR2 X 154402806 60 39M12S = 154402984 222
    TGGTCTAGAGTAGTGACAGTGGACATTGAAAGAAAGAGATTTCAGAGATAC
    =@???@>>>>?>>?>?=A>?>=>=@?@3#@?<8@@<>=:?@A?9<>=>A?<
    X0:i:1 X1:i:0 XC:i:39 MD:Z:0C38 RG:Z:ERR000574 AM:i:37 NM:i:1 SM:i:37 MQ:i:60 XT:A:U
    BQ:Z:WJ@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
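Each record also carries an MD tag, which spells out the reference base at every mismatch, counted from POS along the aligned bases only. A minimal sketch of reading it, assuming the MD strings contain only matches and mismatches (no `^` deletions, which holds for every read above); `md_mismatches` is a hypothetical helper:

```python
import re

def md_mismatches(pos, md):
    """Return (reference position, reference base) for each mismatch in an MD tag.

    pos is the 1-based POS (first aligned base). Deletions ('^' runs) are not
    handled; none of the reads in this thread have one.
    """
    result = []
    ref_pos = pos
    # MD alternates a count of matching bases with one mismatched reference base.
    for run, base in re.findall(r"(\d+)([A-Z]?)", md):
        ref_pos += int(run)
        if base:
            result.append((ref_pos, base))
            ref_pos += 1
    return result

for name, pos, md in [
    ("ERR000573.12285810", 154402756, "42"),
    ("ERR000569.10381384", 154402769, "37C13"),
    ("ERR000569.428911", 154402785, "21C13"),
    ("ERR000572.854678", 154402794, "12C23"),
    ("ERR000572.8002077", 154402804, "2C33"),
    ("ERR000574.8852864", 154402806, "0C38"),
]:
    print(name, md_mismatches(pos, md))
```

For the five reads with a mismatch, the reported position is 154402806 with reference base C in every case; the first read has no mismatch at all.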

    And this is the pileup result:
    X 154402806 C T 9 67 60 5 tTtt^]T <===&

    So, from the read data, the bases at position 154402806 should be CTTAAT (three reference alleles), but the pileup result is TTTTT (all reference alleles). Also, the pileup reports only 5 reads, but there should be 6 reads covering the position.
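A sketch for reading that pileup line, assuming it is the 10-column consensus output of the old `samtools pileup -c` (chromosome, position, reference base, consensus base, consensus quality, SNP quality, RMS mapping quality, read depth, read bases, base qualities). In the read-base column a base matching the reference is written as `.` (forward strand) or `,` (reverse strand), and a mismatch is written as the read's base, uppercase on the forward strand and lowercase on the reverse. `parse_read_bases` is a hypothetical helper, not part of samtools:

```python
def parse_read_bases(bases):
    """Split a pileup read-base column into one symbol per read.

    '^' marks a read start and is followed by one mapping-quality character,
    '$' marks a read end, and +N/-N introduce an indel of N bases; none of
    these are bases at this position, so they are skipped.
    """
    out = []
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":        # read start: skip '^' and its mapping-quality character
            i += 2
        elif c == "$":      # read end marker
            i += 1
        elif c in "+-":     # indel: sign, length digits, then that many bases
            j = i + 1
            while bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        else:               # '.', ',' (reference match), a base letter, or '*'
            out.append(c)
            i += 1
    return out


line = "X 154402806 C T 9 67 60 5 tTtt^]T <===&"
fields = line.split()
ref_base, depth = fields[2], int(fields[7])
calls = parse_read_bases(fields[8])
non_ref = [b for b in calls if b not in ".,"]
print(calls)                                   # ['t', 'T', 't', 't', 'T']
print(len(calls) == depth == len(fields[9]))   # True
print(f"{len(non_ref)} of {depth} reads differ from the reference {ref_base}")
```

Read this way, the five `T`/`t` symbols are mismatches against the reference base `C`, and the depth, base and quality columns all agree on 5 reads.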

    I checked the default read filters in samtools:
    minimum mapping quality for an alignment to be used [0]
    minimum base quality for a base to be considered [13]

    The base qualities of these bases (CTTAAT) are:
    63
    60
    64
    65
    66
    61
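The numbers above appear to be the raw ASCII codes of the quality characters (for example, 63 is `?`, the last character of the first read's quality string). SAM stores base qualities as Phred+33, so the Phred score is the character code minus 33. A minimal sketch; `phred` is a hypothetical helper and 13 is the default threshold quoted above:

```python
MIN_BASE_QUALITY = 13  # default minimum base quality quoted in the post

def phred(qual_char):
    """Phred base quality of one SAM quality character (Phred+33 encoding)."""
    return ord(qual_char) - 33

# The ASCII codes listed above, turned back into characters and then into Phred scores.
for code in [63, 60, 64, 65, 66, 61]:
    q = phred(chr(code))
    print(chr(code), q, q >= MIN_BASE_QUALITY)
```

All six come out between 27 and 33, so the base-quality filter on its own would not remove them.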

    Does anyone have any idea what is going wrong?

    Thank you very much!!

    Anney

  • #2
    Okay, here's your first read and its CIGAR string:

    GCGAAGAAGTCAATTAGAAAGTCTTTTCAAGTTATCCAAGCAGGAGGTCTC

    9S42M

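    That means the first 9 bases were soft-clipped and the next 42 are aligned, starting at the reported position 154402756. So the aligned part ends at 154402797, the read never reaches 154402806, and that last base isn't being counted.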
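The arithmetic can be checked with a short sketch. It assumes the SAM convention that POS is the 1-based position of the first aligned base and that only M, D, N, = and X operations consume reference positions, so soft clips (S) never extend the span. `aligned_span` is a hypothetical helper:

```python
import re

# CIGAR operations that advance along the reference (S, I, H and P do not).
REF_CONSUMING = set("MDN=X")

def aligned_span(pos, cigar):
    """Return (first, last) 1-based reference positions covered by the aligned bases."""
    ref_len = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
                  if op in REF_CONSUMING)
    return pos, pos + ref_len - 1

first, last = aligned_span(154402756, "9S42M")
print(first, last)                  # 154402756 154402797
print(first <= 154402806 <= last)   # False
```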

    GGAGGTCTCACGTGGCCTGGTCTAGAGTAGTGACAGTGGACATTGAAAGAA

    15S36M

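    The first 15 bases are soft-clipped, so there goes the A: once the clipped bases are skipped, the base at 154402806 in this read is a T. I think you'll find the same explains the other alternate bases. Soft-clipped bases aren't counted because the aligner didn't deem them to match well enough, and the reported position refers to the first aligned base, so counting from the start of the read lands on the wrong base.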
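To take the base at a given reference position from a read, the CIGAR has to be walked from POS so that soft-clipped and inserted bases are skipped. A minimal sketch, with `base_at` as a hypothetical helper and the positions, CIGARs and sequences copied from the first post:

```python
import re

def base_at(pos, cigar, seq, target):
    """Return the read base aligned to reference position target, or None.

    pos is the 1-based POS of the first aligned base. Soft clips (S) and
    insertions (I) consume read bases only; deletions (D/N) consume reference only.
    """
    ref = pos  # current reference position
    i = 0      # current index into seq
    for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(n)
        if op in "M=X":
            if ref <= target < ref + n:
                return seq[i + (target - ref)]
            ref += n
            i += n
        elif op in "SI":
            i += n
        elif op in "DN":
            if ref <= target < ref + n:
                return None  # target falls in a deletion or skipped region
            ref += n
        # H and P consume neither read nor reference
    return None

reads = [
    (154402756, "9S42M", "GCGAAGAAGTCAATTAGAAAGTCTTTTCAAGTTATCCAAGCAGGAGGTCTC"),
    (154402769, "51M", "CTTTTCAAGTTATCCAAGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGT"),
    (154402785, "35M16S", "AGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGTGACAGTGGACATTTAA"),
    (154402794, "15S36M", "TATCCAAGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGTGACAGTGGAC"),
    (154402804, "15S36M", "GGAGGTCTCACGTGGCCTGGTCTAGAGTAGTGACAGTGGACATTGAAAGAA"),
    (154402806, "39M12S", "TGGTCTAGAGTAGTGACAGTGGACATTGAAAGAAAGAGATTTCAGAGATAC"),
]
print([base_at(p, c, s, 154402806) for p, c, s in reads])
# [None, 'T', 'T', 'T', 'T', 'T']
```

The first read stops short of 154402806 (its C sits at 154402797), and the other five all carry a T there, which matches the five bases in the pileup line.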



    • #3
      This is your alignment:

      gcgaagaagTCAATTAGAAAGTCTTTTCAAGTTATCCAAGCAGGAGGTCTC-----------------------------------------------------------
      ----------------------CTTTTCAAGTTATCCAAGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGT-------------------------------------
      --------------------------------------AGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGTgacagtggacatttaa---------------------
      --------------------------------tatccaagcaggaggTCTCAAGTGGCCTGGTCTAGAGTAGTGACAGTGGAC---------------------------
      ------------------------------------------ggaggtctcacgtggCCTGGTCTAGAGTAGTGACAGTGGACATTGAAAGAA-----------------
      -----------------------------------------------------------TGGTCTAGAGTAGTGACAGTGGACATTGAAAGAAAGAGAtttcagagatac


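      Soft clipping actually changes the mapping position: the reported position is that of the first aligned base, not the first base of the read (the soft-clipped bases are in lowercase above).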
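The layout above can be rebuilt with a short sketch: each row starts at POS minus any leading soft clip, and soft-clipped bases are printed in lowercase. It assumes CIGARs made only of M and S operations, as in these six reads; `unclipped_start` and `render` are hypothetical helpers:

```python
import re

def unclipped_start(pos, cigar):
    """Reference position of the first read base, counting a leading soft clip."""
    m = re.match(r"(\d+)S", cigar)
    return pos - int(m.group(1)) if m else pos

def render(pos, cigar, seq, origin):
    """One layout row: dashes up to the read start, soft-clipped bases in lowercase.

    Assumes the CIGAR contains only M and S operations.
    """
    pieces = []
    i = 0
    for n, op in re.findall(r"(\d+)([MS])", cigar):
        chunk = seq[i:i + int(n)]
        pieces.append(chunk.lower() if op == "S" else chunk)
        i += int(n)
    return "-" * (unclipped_start(pos, cigar) - origin) + "".join(pieces)

reads = [
    (154402756, "9S42M", "GCGAAGAAGTCAATTAGAAAGTCTTTTCAAGTTATCCAAGCAGGAGGTCTC"),
    (154402769, "51M", "CTTTTCAAGTTATCCAAGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGT"),
    (154402785, "35M16S", "AGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGTGACAGTGGACATTTAA"),
    (154402794, "15S36M", "TATCCAAGCAGGAGGTCTCAAGTGGCCTGGTCTAGAGTAGTGACAGTGGAC"),
    (154402804, "15S36M", "GGAGGTCTCACGTGGCCTGGTCTAGAGTAGTGACAGTGGACATTGAAAGAA"),
    (154402806, "39M12S", "TGGTCTAGAGTAGTGACAGTGGACATTGAAAGAAAGAGATTTCAGAGATAC"),
]
origin = min(unclipped_start(p, c) for p, c, _ in reads)
for read in reads:
    print(render(*read, origin))
```

The leading dash counts (0, 22, 38, 32, 42, 59) come out the same as in the alignment drawn above.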
      But I don't understand the base qualities given in the pileup.
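One likely explanation, assuming the BQ:Z tag on these reads is the base alignment quality (BAQ) offset defined in the SAM optional-fields specification: at each read base, BAQ = Q - (BQ - 64), where Q is the Phred base quality and BQ is the character code of the matching BQ:Z character. The sketch below applies that formula to the QUAL and BQ:Z characters at 154402806 in the five counted reads (located after skipping soft-clipped bases); `baq_char` is a hypothetical helper:

```python
def baq_char(qual_char, bq_char):
    """Apply a BQ:Z offset to one base: BAQ = Q - (BQ - 64), as a Phred+33 character."""
    q = ord(qual_char) - 33          # Phred base quality from the QUAL string
    baq = q - (ord(bq_char) - 64)    # subtract the BQ offset ('@' means no change)
    return chr(baq + 33)

# (QUAL character, BQ:Z character) at 154402806 in each counted read, in pileup order:
# ERR000569.10381384, ERR000569.428911, ERR000572.854678,
# ERR000572.8002077, ERR000574.8852864
bases = [("<", "@"), ("@", "C"), ("@", "C"), ("@", "C"), ("=", "W")]
print("".join(baq_char(q, bq) for q, bq in bases))   # <===&
```

The result matches the `<===&` quality string in the pileup line, which suggests the pileup is reporting BAQ-adjusted qualities rather than the raw QUAL values.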



      • #4
        Thank you very much for the reply!! This helps me a lot.
